Beginning at 14:30 today:
EDIT:20180115@17:05
We experienced our first power outage today (about 1 min without power). As there is currently no active supervision in the new datacenter (still in testing mode), we are… slightly in the dark.
WE ARE IN TESTING MODE. REDUCED PRODUCTION. FULL ACCESS.
Two security vulnerabilities were announced last week: Spectre and Meltdown.
The fix is a kernel upgrade, so each server must be rebooted once the upgrade is applied.
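For anyone wanting to check that a node is running a patched kernel after its reboot, here is a minimal sketch. It assumes a Linux kernel that reports mitigation status under /sys/devices/system/cpu/vulnerabilities (mainline 4.15 and later, plus many distribution backports). The function name and its directory parameter are illustrative and not part of any PSMN tooling.

```python
from pathlib import Path

# Assumption: the kernel exposes one status file per known CPU vulnerability
# in this sysfs directory. Older, unpatched kernels do not provide it at all.
SYSFS_DIR = "/sys/devices/system/cpu/vulnerabilities"


def read_mitigation_status(directory=SYSFS_DIR):
    """Return {vulnerability name: status line} for every file in `directory`.

    Returns an empty dict when the directory does not exist.
    """
    base = Path(directory)
    if not base.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(base.iterdir())
        if entry.is_file()
    }


# Example use on a node:
#   for name, line in read_mitigation_status().items():
#       print(f"{name}: {line}")
```

An empty result suggests the running kernel predates the fix, or that the node has not yet been rebooted into the new kernel.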
The glusterfs filesystem (which serves /scratch on the clusters) is beyond repair. There were so many misuses, issues, and errors inherited from the previous system that the autoheal/autorepair included in the new software version cannot do much.
After analysing ~6 485 000 000 files (yep, billions), it found more than 440 000 files in error or in split-brain that can be neither auto-healed nor auto-repaired.
As a consequence, the E5 cluster will be restarted with an empty /scratch next Monday.
All data in both /scratch filesystems (E5 and x55) will be lost. If you need this data and can still access it, copy it elsewhere before next Monday.
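If you prefer to script the copy, a minimal sketch follows. The function name and the example paths are hypothetical, so substitute your own scratch directory and a destination outside /scratch. It relies only on the Python standard library (3.8 or later for `dirs_exist_ok`).

```python
import shutil
from pathlib import Path


def backup_scratch(source, destination):
    """Copy the directory tree at `source` into `destination`.

    Files already present in `destination` are overwritten. Symlinks are
    followed, so their targets are copied as regular files. Returns the
    number of files found under `destination` after the copy.
    """
    src = Path(source)
    dst = Path(destination)
    if not src.is_dir():
        raise FileNotFoundError(f"source directory not found: {src}")
    # dirs_exist_ok=True lets an interrupted copy be re-run into the same target.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())


# Example use (hypothetical paths):
#   backup_scratch("/scratch/your_login/project", "/home/your_login/project_backup")
```

Unreadable files (for instance, those in split-brain) will make the copy raise an error listing them, so check the output before assuming everything was saved.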
WE ARE IN TESTING MODE. REDUCED PRODUCTION. FULL ACCESS.
Please report any problems/questions with the appropriate form.
WE ARE IN TESTING MODE. REDUCED PRODUCTION. REDUCED ACCESS.
Today's menu:
allo-psmn.psmn.ens-lyon.fr (the new one) is up & running, qstat -g c
),