20200710 / Post-Mortem
The correct assumption is “Shitstorm hit the fan.” (I stand corrected)
We are not done yet:
ssh.psmn is under attack from a botnet, hence the “maximum authentication attempts exceeded” errors,
master LDAP server is down. We are running from slave1 (backup from yesterday),
Almost all scratch spaces are back (except for some nodes on E5 and X5),
/homes and /Xnfs should be OK everywhere (“should”, as in “remount is ongoing”).
EDIT 13:00: master LDAP server is back online \o/ !
EDIT 13:50: Cluster X5 is fully up & running.
EDIT 14:05: Clusters E5 and Lake up & running.