
News feed

20220331 / Cascade power outage

An S92 chassis burned out its power supply unit, which caused the main power unit to trip.

S92node[01-04,09-12] went down, taking their running jobs with them…
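
The bracket notation above is the compact node list format used by Slurm. As a minimal sketch, the hypothetical helper `expand_nodelist` below (not part of any Slurm tool, and limited to a single bracket group) spells out which nodes the expression covers. On a cluster, `scontrol show hostnames` performs the same expansion.

```python
import re


def expand_nodelist(expr):
    """Expand a single-bracket node list such as 'node[01-03,07]' into names."""
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if match is None:
        return [expr]  # no bracket group: treat as a single node name
    prefix, body = match.groups()
    names = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a lone index like '07' is a range of one
        width = len(start)  # preserve the zero padding of the index
        for index in range(int(start), int(end) + 1):
            names.append(f"{prefix}{index:0{width}d}")
    return names


nodes = expand_nodelist("S92node[01-04,09-12]")
print(len(nodes), nodes[0], nodes[-1])  # prints: 8 S92node01 S92node12
```

Eight nodes were therefore affected: 01 to 04 and 09 to 12.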

2022/03/31 12:36 · ltaulell

20220318 / Upgrade news

  • login nodes (for Debian 11 tests and builds)
    • x5570comp[1-2] (→ minimum build tune: -mtune=generic -O2 -msse4a)
    • E5 partition: c82gluster1
    • Lake partition: c6420node171
    • Cascade partition: s92node01
  • We had to give back epycomp1 (epyc login node)

Use sinfo and squeue to see which resources are available. Use the web forms to request Slurm access.
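
For scripted checks, both commands can be wrapped as in the sketch below. The helper names `slurm_commands` and `slurm_summary` are hypothetical; `sinfo --summarize` and `squeue --user` are standard Slurm options. The script assumes nothing about the local partitions and simply reports when Slurm is not installed on the host.

```python
import os
import shutil
import subprocess


def slurm_commands(user):
    """Return the two commands: a partition summary, then the user's jobs."""
    return [
        ["sinfo", "--summarize"],
        ["squeue", "--user", user],
    ]


def slurm_summary(user):
    """Run both commands and return their outputs, or None without Slurm."""
    if shutil.which("sinfo") is None or shutil.which("squeue") is None:
        return None
    outputs = []
    for command in slurm_commands(user):
        result = subprocess.run(command, capture_output=True, text=True, check=False)
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Assumption: the login name is available in the USER environment variable.
    report = slurm_summary(os.environ.get("USER", "unknown"))
    print("Slurm not found on this host" if report is None else "\n".join(report))
```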

2022/03/18 08:18 · ltaulell

20220317 / small power outage

A power distribution unit went down this afternoon on the Lake cluster. Some jobs crashed. Sorry.

2022/03/17 16:31 · ltaulell

20220314 / Massive Upgrade coming

Hi all,

We have a Massive Upgrade coming this spring!

  • From Debian 9 to Debian 11
    • to support both new hardware and new software
    • a new cluster with the upgrade is already available (Cascade)
    • every binary must be rebuilt
    • new ways of sharing programs
    • new ways of installing software
  Use the web forms when software is missing. We will find the most suitable way to install it.
  • See below new documentation about R, Python, Perl…
  • From SGE to Slurm
    • GridEngine has served us well for years, but it cannot keep up as we continue to grow (over 30k cores this year)
    • We chose Slurm because it is used at national and international centers
    • better reservation, cleaner exit
    • Slurm offers more freedom and more responsibilities!
  • We will have to physically reorganize the clusters (E5 and Cascade) to double Cascade and halve E5.
  • We will offer meetings and courses to ease the transition, group by group
  • Cons:
    • all scratch spaces will have to be destroyed (we go from v3 to v8, so the data will not be readable and we must start again from zero)
    • E5 and Cascade will have to be turned off for 2 to 4 weeks

2022/03/14 09:42 · ltaulell

20220121 / Network problems

We are experiencing major network problems of unknown origin. Everything is slow or unresponsive.

EDIT 12:20: things “seem” to be back to normal. Origin still uncertain.

2022/01/21 11:07 · ltaulell
news/blog.txt · Last modified: 2020/08/25 15:58 by 127.0.0.1