
News feed

20240215 / Maintenance on login-nodes

When you log in to a node, a MOTD (Message Of The Day) is displayed. Please pay attention to it and don't waste my time and your time.

cl6226comp1 is undergoing maintenance, DO NOT USE IT.

EDIT: Maintenance is done, cl6226comp1 can be used again.

2024/02/15 11:34 · ltaulell

20240214 / upgrade ongoing on iRODS

We are upgrading the gateway server for iRODS. Expect connections to be unavailable for the day.

EDIT: Upgrade done, all services are working. Users of this service, please read the updated documentation in /data/psmn/: the configuration file format has changed.

2024/02/14 08:45 · ltaulell

20240123 / scratches on Cascade

While the crew was half-brained by COVID, nobody thought to verify the little SubnetManager daemon that was OFF…

Cascade is back to NORMINAL state.

2024/01/23 11:18 · ltaulell

20240118 / scratches on Cascade

We have a problem with 2 of the scratch servers on the Cascade cluster: one serving /scratch/Cral, one serving /scratch/Cascade. Both have a dead InfiniBand network card. We are waiting for replacement parts to carry out the repair.

Symptoms: Files and/or directories are unavailable on /scratch/Cral or /scratch/Cascade.

EDIT: We found out that both InfiniBand cables are dead.

2024/01/18 10:30 · ltaulell

20240110 / network down, HPC down

Something went wrong, don't know what yet. On it.

EDIT: Master switch rebooted unexpectedly, killing all network connections between nodes and SIDUS-master ('/' on nodes) → general reboot (in progress) of all nodes, comp & visu.

All running jobs are lost.

EDIT2: expect some delay before everything is back to normal…

EDIT3: except for a few nodes, back to norminal.

EDIT4: WATCH YOUR JOBS! A large number of jobs have been requeued (“REQUEUE”) by Slurm. This may result in “unexpected behaviors”.
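
One possible way to spot which of your jobs were requeued is to look at the accounting records. The sketch below is only an illustration, not an official PSMN tool: it assumes Slurm accounting is enabled and that `sacct` is available on the login nodes, and the helper names `requeued_job_ids` and `fetch_sacct` are hypothetical. It parses pipe-delimited `sacct --parsable2` output and lists the job IDs that have a record in the REQUEUED state.

```python
import subprocess


def requeued_job_ids(sacct_output):
    """Return the IDs of jobs marked REQUEUED in `sacct --parsable2` output.

    The first non-empty line is expected to be the header row and must
    contain the JobID and State columns.
    """
    lines = [line for line in sacct_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    id_col = header.index("JobID")
    state_col = header.index("State")
    found = []
    for line in lines[1:]:
        fields = line.split("|")
        if len(fields) <= max(id_col, state_col):
            continue  # skip malformed rows
        if fields[state_col].startswith("REQUEUED") and fields[id_col] not in found:
            found.append(fields[id_col])
    return found


def fetch_sacct(since="2024-01-10"):
    """Run sacct for the current user (hypothetical helper, needs a Slurm host).

    --duplicates asks sacct to also print earlier records of a job,
    --allocations hides the individual job steps.
    """
    cmd = [
        "sacct", "--allocations", "--duplicates", "--parsable2",
        "--starttime", since, "--format", "JobID,JobName,State",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On a login node, `print(requeued_job_ids(fetch_sacct()))` would then list the affected job IDs, which you can inspect further with `scontrol show job <jobid>`. If a job must never be restarted automatically, submitting it with `sbatch --no-requeue` is one option to consider.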

2024/01/10 09:38 · ltaulell
news/blog.txt · Last modified: 2020/08/25 15:58 by 127.0.0.1