News feed

20231211 / Cascade partially down

A large number of Cascade nodes went down (from 16:15 onward), probably due to a power spike (large jobs aren't good).

The problem will be handled tomorrow, as no one is on site today.

EDIT 12/12/2023: 4 PSUs died, with cascading effects on both nodes and network. Back to nominal.

2023/12/11 16:18 · ltaulell

20231206 / global breakdown

We are encountering strange network behavior, and nodes are crashing one after another.

We might need to perform a global reboot of all nodes…

EDIT [09:50]

One of our main NFS servers (/applis/PSMN) had been stuck in a loop since yesterday evening, blocking all / access.

Things should be back to normal (no global reboot \o/). Jobs may have been blocked doing nothing all night.

2023/12/06 08:31 · ltaulell

20231116 / flix partitions

  • Did you know you can use flix partitions?

Lake-flix and Cascade-flix are open to everybody for short, small parallel and sequential jobs (no longer than 2 days is best, but the standard walltime applies), with requeue in case of high-priority jobs (see documentation).

example:

#SBATCH --partition=Lake,Lake-flix
# or
#SBATCH --partition=Cascade,Cascade-flix
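
As a minimal sketch of a complete job script built around the first partition line above: everything except the `--partition` line is an assumption for illustration (job name, task count, duration, and the explicit `--requeue` flag), so check the documentation for what the flix partitions actually require. The script only reports whether Slurm has requeued the job, using the `SLURM_RESTART_COUNT` variable that Slurm sets for restarted jobs.

```shell
#!/bin/bash
# Hypothetical job name, for illustration only
#SBATCH --job-name=flix-demo
# Partition list taken from the example above
#SBATCH --partition=Lake,Lake-flix
# Assumption: a small sequential job
#SBATCH --ntasks=1
# Assumption: 1 day, below the suggested 2-day limit
#SBATCH --time=1-00:00:00
# Assumption: explicitly allow Slurm to requeue this job
#SBATCH --requeue

# Report whether this is a first run or a requeued one.
# SLURM_RESTART_COUNT is unset on a first run, so default to 0.
describe_run() {
    local restarts="${SLURM_RESTART_COUNT:-0}"
    if [ "$restarts" -gt 0 ]; then
        echo "requeued job, restart number ${restarts}"
    else
        echo "first run of this job"
    fi
}

describe_run
# The actual workload would go here. A job that may be requeued
# should be able to start over (or resume) safely.
```

Submitted with `sbatch`, such a script could land on either listed partition. The restart check is only useful if the workload can start over or pick up from its own checkpoints.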

  • The /Xnfs/abc volume has been moved and is usable.

2023/11/16 16:25 · ltaulell

20231114 / Xnfs/abc

The /Xnfs/abc volume will be moved to a new server on Thursday, November 16th, in the morning.

Any nf (Nextflow) run in progress at that time might need to be restarted if it crashes.

2023/11/14 13:53 · ltaulell

20231030 / CRAL mounts (and homes)

A disk on data8 (main CRAL fileserver) was… not well. After a good hammer blow, all exports were restarted.

Homes and exports may have been unavailable for a few moments.

2023/10/30 11:14 · ltaulell

news/blog.txt · Last modified: 2020/08/25 15:58 by 127.0.0.1