====== 20240110 / network down, HPC down ======

Something went wrong, don't know what yet. On it.

**EDIT:** The master switch rebooted unexpectedly, killing all network connections between the nodes and SIDUS-master ('/' on nodes) -> **general reboot** (in progress) of all nodes, compute and visualization. **All running jobs are lost.**

**EDIT2:** Expect some delay before everything is back to normal...

**EDIT3:** Except for a few nodes, back to nominal.

**EDIT4:** **WATCH YOUR JOBS!** A large number of jobs have been requeued ("REQUEUE") by Slurm. This may result in "unexpected behaviors".

{{tag> monday }}