We are encountering strange network behavior, and nodes are crashing one after another.
We might need to perform a global reboot of all nodes…
EDIT [09:50]
One of our main NFS servers (/applis/PSMN) had been stuck in a loop since yesterday evening, blocking all / access.
Things should be back to normal (no global reboot \o/). Jobs may have been blocked doing nothing all night.
Lake-flix and Cascade-flix are open to everybody for short, small parallel and sequential jobs (no longer than 2 days is best, but the standard walltime applies). Jobs may be requeued when high-priority jobs arrive (see documentation).
example:
#SBATCH --partition=Lake,Lake-flix # or #SBATCH --partition=Cascade,Cascade-flix
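A fuller batch script built around that partition line might look like the sketch below. Only the partition names come from this announcement; the job name, time limit, task count and the commented-out program are hypothetical placeholders. It also assumes that requesting `--requeue` explicitly is acceptable on these partitions (the site may already enforce it). `SLURM_RESTART_COUNT` is set by Slurm when a job has been requeued, so the script can tell a first run from a restart.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=flix-example
# Partition list from the announcement (or: Cascade,Cascade-flix).
#SBATCH --partition=Lake,Lake-flix
# Assumed value: 1 day, below the suggested 2-day limit.
#SBATCH --time=1-00:00:00
# Assumed: a small sequential job.
#SBATCH --ntasks=1
# Allow Slurm to put the job back in the queue if it is preempted.
#SBATCH --requeue

# Report whether this is a first run or a requeued one.
# SLURM_RESTART_COUNT is unset on a first run, so default it to 0.
run_label() {
    local restarts="${SLURM_RESTART_COUNT:-0}"
    if [ "$restarts" -gt 0 ]; then
        echo "requeued run (restart ${restarts})"
    else
        echo "first run"
    fi
}

echo "Job ${SLURM_JOB_ID:-local}: $(run_label)"

# Hypothetical executable; it should be able to resume from its own checkpoints.
# ./my_program
```

Since a requeued job starts again from the top of the script, the program itself has to save and reload its state if work done before the preemption is to be kept.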
The /Xnfs/abc volume will be moved to a new server on Thursday, November 16th, in the morning.
Any nf (Nextflow) run active at that time may need to be restarted if it crashes.
A disk on data8 (main CRAL fileserver) was… not well. After a good hammer blow, all exports were restarted.
Homes and exports may have been unavailable for a few moments.