When you log in to a node, a MOTD (Message Of The Day) is displayed. Please pay attention to it, so you don't waste my time and yours.
cl6226comp1 is undergoing maintenance, DO NOT USE IT.
EDIT: Maintenance is done, cl6226comp1 can be used again.
We are upgrading the gateway server for iRods. Expect the connection to be unavailable for the day.
EDIT: Upgrade done, all services are working. Users of this service, please read the updated documentation in /data/psmn/ about the configuration file format, which has changed.
While the crew was knocked half-senseless by COVID, nobody thought to check the little SubnetManager daemon, which was OFF…
Cascade is back to NORMINAL state.
We have a problem on two scratch servers of the Cascade cluster: one serving /scratch/Cral, one serving /scratch/Cascade.
Both have a dead InfiniBand network card. We are waiting for replacement parts to repair them.
Symptoms: files and/or directories are unavailable on either /scratch/Cral or /scratch/Cascade.
EDIT: We found out that both InfiniBand cables are dead.
Something went wrong, we don't know what yet. On it.
EDIT: The master switch rebooted unexpectedly, killing all network connections between the nodes and SIDUS-master ('/' on nodes) → general reboot (in progress) of all nodes, comp & visu.
All running jobs are lost.
EDIT2: Expect some delay before everything is back to normal…
EDIT3: Except for a few nodes, everything is back to norminal.
EDIT4: WATCH YOUR JOBS! A large number of jobs have been requeued (“REQUEUE”) by Slurm. This may result in “unexpected behaviors”.
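For those who want to spot their requeued jobs, here is a minimal sketch, not an official PSMN tool. It assumes you have saved the text printed by `scontrol show job` and that each job record carries a `Restarts=` counter, as Slurm normally prints it (the exact layout may vary between Slurm versions). The function name `requeued_jobs` and the sample records are made up for illustration.

```python
import re


def requeued_jobs(scontrol_text):
    """Return {job_id: restarts} for jobs whose Restarts counter is above zero.

    Hypothetical helper: expects text in the style of `scontrol show job`,
    with job records separated by blank lines.
    """
    result = {}
    for record in re.split(r"\n\s*\n", scontrol_text.strip()):
        job = re.search(r"\bJobId=(\S+)", record)
        restarts = re.search(r"\bRestarts=(\d+)", record)
        if job and restarts and int(restarts.group(1)) > 0:
            result[job.group(1)] = int(restarts.group(1))
    return result


# Invented sample records, trimmed to the fields the sketch reads.
SAMPLE = """JobId=101 JobName=sim_a
   JobState=RUNNING Reason=None
   Requeue=1 Restarts=1 BatchFlag=1 Reboot=0 ExitCode=0:0

JobId=102 JobName=sim_b
   JobState=PENDING Reason=Priority
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
"""

print(requeued_jobs(SAMPLE))
```

A job listed here was restarted at least once, so check that it did not overwrite or append to the partial output of its first run.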