Check spares for EPYC servers / order some disks, fans, and power supplies
Finish the Mellanox switch IP configuration to move it onto the new Sinf subnet: IP 10.5.1.148/24 / GW 10.5.1.1 / DNS 10.130.0.11, 10.130.1.11 (sketch below)
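A minimal sketch of the remaining CLI steps, assuming the switch runs MLNX-OS/Onyx and is managed over mgmt0; verify the exact syntax against the installed firmware release:

    enable
    configure terminal
    no interface mgmt0 dhcp
    interface mgmt0 ip address 10.5.1.148 /24
    ip default-gateway 10.5.1.1
    ip name-server 10.130.0.11
    ip name-server 10.130.1.11
    configuration write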
Test and configure BeeGFS on the EPYC 48 TB storage (sanity checks below)
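Useful sanity checks once the BeeGFS services are up (these are standard BeeGFS utilities; node IDs and targets will differ on our setup):

    beegfs-check-servers                                 # reachability of mgmtd/meta/storage services
    beegfs-df                                            # capacity and free space per storage target
    beegfs-ctl --listnodes --nodetype=storage --details  # registered storage nodes
    beegfs-net                                           # which interfaces/protocols clients actually use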
Configure the storage InfiniBand network (IPoIB sketch below)
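If the storage network also needs IPoIB on top of native RDMA, a sketch with NetworkManager; the connection name and the 10.10.0.0/24 addressing are placeholders, not our actual plan:

    nmcli connection add type infiniband con-name storage-ib ifname ib0 ipv4.method manual ipv4.addresses 10.10.0.10/24
    nmcli connection up storage-ib
    ibstat        # check link state and rate
    ib_send_bw    # bandwidth test between two hosts (perftest package)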
Create a SLURM "Test" partition: either use GPU sharding (gres/shard) on Disco, or put all 3 NVIDIA RTX GPUs in the current Rumba Dell 7920 and host the test partition on it (config sketch below)
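A hedged slurm.conf/gres.conf sketch covering both variants; the node name, GPU type, shard count, and time limit are illustrative values only:

    # gres.conf on the GPU node (requires GresTypes=gpu,shard in slurm.conf)
    Name=gpu Type=rtx File=/dev/nvidia[0-2]
    Name=shard Count=12                      # e.g. 4 shards per physical GPU

    # slurm.conf
    NodeName=rumba Gres=gpu:rtx:3,shard:12 State=UNKNOWN
    PartitionName=test  Nodes=rumba Default=YES MaxTime=04:00:00 State=UP
    PartitionName=dance Nodes=...   Default=NO  State=UP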
Change the creation script so that the "Test" partition is the default when a researcher arrives on ISC Compute, then assign the "Dance" partition once they are ready for production runs (see the sketch below)
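One possible mechanism (an assumption, not necessarily how the current script is built): keep "test" as the cluster-wide Default=YES partition, restrict "dance" with AllowGroups, and have the creation script only create the base account; promotion then becomes a one-line group change:

    # slurm.conf: PartitionName=dance ... AllowGroups=dance-users
    usermod -aG dance-users alice     # run when the researcher is ready for "dance"
    scontrol reconfigure              # refresh slurmctld's cached group membership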
Apply data quotas to all ISC Compute users; this is not yet the case for everyone
Check the BeeGFS quota mechanism to migrate the EXT4 quotas from the current Disco/Chacha to the new EPYC storage (migration sketch below)
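A rough migration sketch, assuming the EXT4 quotas are readable with repquota and the BeeGFS side is set with beegfs-ctl --setquota (both real tools); the mount point, pool id, and output parsing are assumptions to verify before any real run:

    #!/usr/bin/env python3
    """Sketch: replay per-user EXT4 block quotas onto BeeGFS. Verify before use."""
    import subprocess

    EXT4_MOUNT = "/data"   # assumption: current EXT4 mount on Disco/Chacha
    POOL_ID = "1"          # assumption: target BeeGFS storage pool id

    report = subprocess.run(["repquota", "-u", EXT4_MOUNT], check=True,
                            capture_output=True, text=True).stdout

    for line in report.splitlines():
        fields = line.split()
        # data rows look like: "alice -- 4100 500000 550000 134 0 0"
        if len(fields) < 5 or fields[1] not in ("--", "-+", "+-", "++"):
            continue  # skip headers, separators, and the grace-time line
        user, hard_kib = fields[0], int(fields[4])
        if user == "root" or hard_kib == 0:
            continue  # skip root and users without a hard block limit
        # repquota reports 1 KiB blocks; pass the limit to BeeGFS in bytes
        subprocess.run(["beegfs-ctl", "--setquota", f"--uid={user}",
                        f"--sizelimit={hard_kib * 1024}",
                        "--inodelimit=unlimited",
                        f"--storagepoolid={POOL_ID}"], check=True)

Note: quota enforcement must also be enabled in the BeeGFS server/client configs for the limits to take effect.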
Script a wrapper around Apptainer that checks the execution context and refuses to run bare on the node; do the same for python and other executables, to prevent runs outside of SLURM (wrapper sketch below)
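A minimal wrapper sketch in Python, installed ahead of the real binary in PATH; it assumes the real binary is relocated to a path like /usr/libexec/apptainer.real (hypothetical location) and that being inside a SLURM job is detectable via the SLURM_JOB_ID environment variable:

    #!/usr/bin/env python3
    """Wrapper installed in place of apptainer: refuse to run outside a SLURM job."""
    import os
    import sys

    REAL_BINARY = "/usr/libexec/apptainer.real"  # hypothetical: relocated real binary

    # SLURM sets SLURM_JOB_ID inside srun/sbatch/salloc allocations
    if "SLURM_JOB_ID" not in os.environ:
        sys.exit("apptainer must be run inside a SLURM job (use srun/sbatch); "
                 "direct execution outside SLURM is disabled.")

    # Replace this process with the real apptainer, preserving all arguments
    os.execv(REAL_BINARY, [REAL_BINARY] + sys.argv[1:])

The same pattern (rename the real binary, drop in the wrapper) would apply to python and any other executable we want to fence off.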
Migrate the current NVMe data disks from Disco/Chacha to BeeGFS once it has been tested and is ready on EPYC as well
Automate file deletion for Standard (Premium too?) researchers so the scratch partition does not accumulate old test files / Decide in a meeting what TTL we want: 2 weeks standard TTL? More for Premium? (cleanup sketch below)
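A cleanup sketch to run from cron; the scratch path is an assumption and the 14-day TTL is the pending meeting decision, not a settled value. It ages files by mtime (atime would mean "last read" instead):

    #!/usr/bin/env python3
    """Sketch: delete scratch files older than a TTL (14 days assumed for Standard)."""
    import os
    import time

    SCRATCH_ROOT = "/scratch"        # assumption: scratch mount point
    TTL_SECONDS = 14 * 24 * 3600     # 2-week TTL, pending the meeting decision
    cutoff = time.time() - TTL_SECONDS

    for dirpath, dirnames, filenames in os.walk(SCRATCH_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:  # lstat: don't follow symlinks
                    os.remove(path)
            except OSError:
                pass  # file vanished or permission issue; log this in a real version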
Rename the filesystems: datasets → workspace? local_workspace? chacha_workspace? local_scratch? / shared → network_workspace? remote_workspace? remote_scratch?
Migrate Prometheus from Chacha to the new EPYC server / Add alerting on common checks: disks, jobs outside of SLURM, etc. (example rule below)
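An example alert rule built on standard node_exporter metrics; the threshold, group name, and labels are placeholders, and the "jobs outside SLURM" check would need a small custom exporter on top:

    groups:
      - name: isc-compute
        rules:
          - alert: DiskAlmostFull
            expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) < 0.10
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "{{ $labels.instance }} {{ $labels.mountpoint }} is more than 90% full"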
Move NVMe disks? The 7 TB from Disco to Chacha? / The 3 TB NVMe from Calypso storage to Disco? Format them as BeeGFS
Check for a Modules installation? Or is Apptainer already fine? Install Lmod to allow dynamic library loading: where do we put the terabytes of libraries for Dance? (example modulefile below)
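If Lmod is chosen, modulefiles are small Lua files; a hypothetical example for a shared library tree (the paths are placeholders, and the question of where to host the terabytes for Dance stays open):

    -- /shared/modulefiles/cuda/12.4.lua  (hypothetical path)
    help([[CUDA 12.4 runtime and libraries for the Dance partition]])
    family("cuda")
    prepend_path("PATH", "/shared/libs/cuda/12.4/bin")
    prepend_path("LD_LIBRARY_PATH", "/shared/libs/cuda/12.4/lib64")
    setenv("CUDA_HOME", "/shared/libs/cuda/12.4")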