What still needs to be fixed
Apptainer
- Make sure everyone exports the following (after adapting the paths, of course):
  export APPTAINER_CACHEDIR=/scratch/gpfs/$USER/APPTAINER_CACHE
  export APPTAINER_TMPDIR=/tmp
  ✔ Done: created /data/apptainer/user.name/.apptainer, migrated the existing users' .apptainer directories and added a symlink in each home.
- To prevent quota explosion: should we enforce quotas at the filesystem level? Everyone who said they would use less than 100 GB is using more than announced (e.g. martin.barry at 350 GB). Installed the quota package on disco and chacha. Applied a quota on the / filesystem, not yet on the datasets.
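The usage comparison mentioned in the quota item could be scripted roughly as follows. This is a minimal sketch under assumptions, not the tooling actually deployed: it assumes per-user directories sit directly under one base directory, and the function names `dir_size_bytes` and `users_over_limit` are hypothetical.

```python
import os

GIB = 1024 ** 3


def dir_size_bytes(path):
    """Sum the apparent size of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable: ignore it in this sketch.
                pass
    return total


def users_over_limit(base_dir, limit_bytes):
    """Return {user: size} for each direct subdirectory of base_dir above limit_bytes."""
    offenders = {}
    for entry in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            size = dir_size_bytes(path)
            if size > limit_bytes:
                offenders[entry] = size
    return offenders
```

Calling `users_over_limit("/data/apptainer", 100 * GIB)` would list the users above the 100 GB figure quoted above; the base path is an assumption taken from the Apptainer item and should be adapted.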
Content
- Explain to PA how to create a proper structure.
- Where do we put the content of this file, given that some of the information is not intended for the general public?
- How can we make animations on the Wiki using JS + SVG and stuff ? Snow ?
- Create a proper structure, starting with the infrastructures ✔ done, after discussion merge into docs, for both groups:
  - With a section for students
  - With a section for teachers
- Make a limited-access location with the critical information ✔ done
- Construct the tools for the teacher sections with the existing information
Monitoring
- Alerting and messaging on various metrics (disk space, CPU usage, …) for the various computational resources (chacha, disco: done; calypso & others)
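As an illustration of the kind of check this item refers to, here is a minimal sketch of threshold-based alerts on disk space and CPU load. It is not the monitoring setup actually in place: the thresholds, the function names and the choice of the 1-minute load average are assumptions, and delivering the messages (mail, chat, ...) is deliberately left out.

```python
import os
import shutil


def disk_alert(path, max_used_percent):
    """Return an alert string if the filesystem holding path is too full, else None."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    if used_percent > max_used_percent:
        return f"disk {path}: {used_percent:.1f}% used (limit {max_used_percent}%)"
    return None


def load_alert(max_load_per_cpu):
    """Return an alert string if the 1-minute load per CPU exceeds the limit, else None."""
    load1, _load5, _load15 = os.getloadavg()  # Unix only
    per_cpu = load1 / (os.cpu_count() or 1)
    if per_cpu > max_load_per_cpu:
        return f"load: {per_cpu:.2f} per CPU (limit {max_load_per_cpu})"
    return None


def collect_alerts(paths, max_used_percent=90, max_load_per_cpu=1.5):
    """Run all checks and keep only those that triggered."""
    alerts = [disk_alert(p, max_used_percent) for p in paths]
    alerts.append(load_alert(max_load_per_cpu))
    return [a for a in alerts if a is not None]
```

Run periodically on each machine, `collect_alerts(["/"])` returns an empty list when everything is below the thresholds, so a wrapper only needs to send a message when the list is non-empty.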
Server room
- Make something nice there. Posters on the walls, screens, stuff.
- Why is there still a server box in the networking lab room? Because we need to keep at least one for sending hardware back in case of support / we needed one for the network labs to hide the CTF network setup
- Rename the networking lab room and change the substitute (remplaçant) for Darko as well
- Why only 10 GB for the fiber?
- I don't want a patch panel inside the server rack but outside of it. Space will soon be at a premium there and we don't know where to put the server rack:
  - Search and buy a patch panel to put in the room. Done.
- Find a proper layout for the server room to accommodate a water-cooled rack and maybe another one in a couple of months
- If we have Rumba running there, we need some UPS solution.
- Draw a schematic of the future rack, notably to define a proper Rumba failover policy
- Choose new R630 and R730 / R740 for RUMBA main. Budget 3 kFr
- Do we need a file server from the guys downstairs (baignoire)?
- Remove the big oven again: check with Hervé Girard about storing it in 23N322. This is where a student last used it (but nicely returned it to N307), so why not keep it there?
  - EDIT: the RoL of N322 is Thomas Sterren, I sent him a message about the oven. (Rémi)
  - EDIT2: the answer is “we share the room so it will stay in 307. period.” We need to find a place to store it ourselves.
  - Moved to 23N321. They want it back in N307, but for no valid reason, so it will stay in N321 where it is less annoying; or if it leaves, it leaves our rooms for good.
  - EDIT 2025-04-03: the oven will go to 23N111. The RoL is Cedric (Clivaz?); he should contact us to install the oven in his lab.
Slurm on chacha or disco
- Make both GPUs available in the gres/slurmd confs ✔ done
- Make emails work for start/end of jobs, use an emailer ✔ done
- Find out how to do the resource partitioning with billing credits by user / account ✔ done (but still needs tests and real jobs to see how to tweak)
- Discuss how to allocate credits for users: what about students?
- Note everywhere to either remove sshfs for VScode and give links to configure it properly, or no VScode at all: noted on runjob and started a script to check for .vscode in homedirs; auto-rm in crontab directly? Done
- For the future jump server, we need to test how to restrict ssh access to the other servers: via SLURM, users might recreate their authorized_keys by running a job that writes a .ssh/authorized_keys on the server where the job runs. (Change the .ssh/ permissions so that they cannot chmod this dir?)
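The homedir check mentioned in the VScode item could look roughly like the sketch below. It is not the script referred to above: the function name, the `/home` layout and the prefix match (which also catches `.vscode-server`, the directory created by VS Code remote sessions) are assumptions. It only reports; any automatic removal from a crontab would be a separate decision.

```python
import os


def find_vscode_dirs(home_root, prefix=".vscode"):
    """List directories whose name starts with prefix directly inside each home under home_root."""
    found = []
    for user in sorted(os.listdir(home_root)):
        home = os.path.join(home_root, user)
        if not os.path.isdir(home):
            continue
        try:
            entries = sorted(os.listdir(home))
        except PermissionError:
            # Home not readable by the account running the check: skip it.
            continue
        for name in entries:
            candidate = os.path.join(home, name)
            if name.startswith(prefix) and os.path.isdir(candidate):
                found.append(candidate)
    return found
```

For example, `find_vscode_dirs("/home")` returns the list of offending directories, which can then be reported to the users concerned.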
Calypso
- Reinstall Slurm by compiling it with all the necessary plugins, then package it using debuild (https://slurm.schedmd.com/quickstart_admin.html#debuild), then deploy the .deb with Ansible
Rumba
- Turn on Rumba and install a proper env for us, mainly based on docker as a limited number of members will use it
- Test backup and replicate ISC / Learn on Rumba
- Migrate the wiki there
- Migrate ISC / Learn there ? TBD
- Have VPS and cloud coder there, please.
Hannibal
- Back up DokuWiki: ✔ Done already, Hannibal has /srv/www completely backed up on the Synology NAS DS923
- Add the Ingegamez website on WordPress
Site
- Proper CSS for title, also for the alignment which is ugly (look at this page!)
- Editor with no tabs
- Why is there a search box with the same text ?
- Rights done properly for every ISC member
