Subscribe to the news feed (RSS)
A planned electrical power outage for maintenance will take place on Saturday, 18 October 2025 (12 hours of maintenance).
After “many adventures”, a new scratch/Lake is coming online (new hardware planned for October 2025).
Expect full availability within a few tens of minutes (it is being mounted on each node, one after another).
PSMN will resume normal operation shortly.
EDIT 15:00: Call me paranoid. Temperatures are rising again. Clusters are at minimum frequencies, partitions half-open.
The cooling system is down, and it is worse than last time. Idle nodes have been drained and powered off.
Working nodes might follow, if the situation doesn't evolve well.
EDIT 16:30: all partitions DOWN, 230+ nodes powered OFF. All running jobs continue, at minimum frequencies. No new jobs are starting.
EDIT 21:30: Temperatures are back to normal under the current reduced load. PSMN stays in DRAIN mode until further notice.
EDIT morning: I forgot → half of the login nodes and visu nodes are powered OFF (no need for residual heat).
Bad news: scratch/Lake is dead. For good. (5 of its 12 disks died during my vacation.) The data is lost; a rebuild is impossible.
Fortunately, the new hardware we planned to put into production in October has been delivered. A new scratch/Lake, with better performance, will be available in a few weeks.
Good news: the UPS is back online and the heat wave (“cramicule”) is over. The frequency schedule will resume tonight (minimum frequency by day, maximum frequency at night).