SLURM configuration
This section details the current SLURM configuration for the ISC Computational Center.
Installation
SLURM version 24.11.0 has been installed from the tarball.
All official plugins installed :
- libnvidia-ml
- TODO
Install / upgrade process TODO
Architecture
Chacha
- Client (slurm-smd-client)
- Worker (slurm-smd,slurm-smd-slurmd)
- Controller (slurm-smd-slurmctld)
- Accounting DB (slurm-smd-slurmdbd)
Disco
- Client (slurm-smd-client)
- Worker (slurm-smd,slurm-smd-slurmd)
Schema
TODO
Partitions
Dance
This is the default partition. It is currently composed of Chacha and Disco.
Chacha and Disco
These partitions can be used to restrict an account to a single server.
Accounts
Accounts are organized into 2 groups :
- Premium Researchers (premium_rs) :
- Standard Researchers (standard_rs) : All users who cannot contribute financially to the project. Students are also part of this group.
There are 2 other groups : Test and temp. Test is for administration purposes only. Temp is a locked group, used either to migrate a user from another account (an account cannot be deleted while a user has it as their default account) or to prevent a user from running jobs (MaxSubmitJobs=0).
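The migration use of the temp group described above can be sketched as a sequence of sacctmgr calls. This is only an illustration: the user name `alice` and the account name `old_project` are hypothetical, the account name `temp` is assumed to match the group named here, and the commands are built and printed, not executed.

```python
import shlex


def migration_commands(login: str, old_account: str, temp_account: str = "temp") -> list[list[str]]:
    """Build the sacctmgr calls that move a user out of old_account via the temp account.

    Order matters: the default account must change before the old
    association can be deleted.
    """
    return [
        # 1. Give the user an association in the temp account.
        ["sacctmgr", "--immediate", "add", "user", login, f"account={temp_account}"],
        # 2. Make the temp account the user's default account.
        ["sacctmgr", "--immediate", "modify", "user", "where", f"name={login}",
         "set", f"defaultaccount={temp_account}"],
        # 3. Remove the association with the old account.
        ["sacctmgr", "--immediate", "delete", "user", f"name={login}", f"account={old_account}"],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for argv in migration_commands("alice", "old_project"):
        print(shlex.join(argv))
```

Printing the commands first leaves room to review them before running anything against the accounting database.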
QOS and Limits
Current limits on QOS and accounts applied to each project account :
- premium_rs :
  - MaxCPUs=44
  - MaxNodes=2
  - MaxTRES=gres/gpu=1,gres/shard=96,cpu=44,mem=500G
  - GrpTRES=gres/gpu=1,gres/shard=96,cpu=44,mem=500G
  - GrpWall=08:00:00
  - MaxWall=08:00:00
- standard_rs :
  - MaxCPUs=24
  - MaxNodes=2
  - MaxTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G
  - GrpTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G
  - GrpWall=04:00:00
  - MaxWall=04:00:00
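To estimate before submitting whether a job request fits within the TRES limits listed above, the limit strings can be parsed and compared with the request. The sketch below is not how SLURM enforces limits (the controller does that itself). The function names `parse_tres` and `exceeded` are hypothetical, and memory is assumed to be counted in megabytes with 1G = 1024M.

```python
# Multipliers for memory suffixes, expressed in megabytes (assumption: 1G = 1024M).
UNITS = {"M": 1, "G": 1024, "T": 1024 * 1024}

PREMIUM_TRES = "gres/gpu=1,gres/shard=96,cpu=44,mem=500G"
STANDARD_TRES = "gres/gpu=1,gres/shard=96,cpu=24,mem=256G"


def parse_tres(spec: str) -> dict[str, int]:
    """Parse a TRES string such as 'cpu=44,mem=500G' into {name: integer amount}."""
    limits = {}
    for item in spec.split(","):
        name, _, value = item.partition("=")
        name, value = name.strip(), value.strip()
        suffix = value[-1].upper()
        if suffix in UNITS:
            limits[name] = int(value[:-1]) * UNITS[suffix]
        else:
            limits[name] = int(value)
    return limits


def exceeded(request: dict[str, int], limit_spec: str) -> list[str]:
    """Return the sorted names of requested resources that exceed the limits."""
    limits = parse_tres(limit_spec)
    return sorted(
        name for name, amount in request.items()
        if name in limits and amount > limits[name]
    )


if __name__ == "__main__":
    # Hypothetical request: 32 CPUs and 128 GB of memory.
    request = {"cpu": 32, "mem": 128 * 1024}
    print(exceeded(request, STANDARD_TRES))  # ['cpu']
    print(exceeded(request, PREMIUM_TRES))   # []
```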
Scheduling
Fairshare is one way of prioritizing jobs in the job queue.
- premium_rs :
  - Fairshare : 750
- standard_rs :
  - Fairshare : 250
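As a rough illustration of what these raw share values mean, the sketch below normalizes them into fractions. It assumes, hypothetically, that premium_rs and standard_rs are the only sibling accounts holding shares under the same parent. Other accounts at the same level (for example test or temp) would change the figures, and the actual job priority also depends on recorded usage and on the priority plugin settings.

```python
def normalized_shares(raw_shares: dict[str, int]) -> dict[str, float]:
    """Turn raw fairshare values of sibling accounts into fractions of the total."""
    total = sum(raw_shares.values())
    return {name: shares / total for name, shares in raw_shares.items()}


if __name__ == "__main__":
    print(normalized_shares({"premium_rs": 750, "standard_rs": 250}))
    # {'premium_rs': 0.75, 'standard_rs': 0.25}
```

Under this assumption, premium_rs is entitled to three quarters of the resources over time, and standard_rs to one quarter.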
TODO
User creation
Users fill in the form on the ISC Computational Center page. Their information is then used to create their SSH access and their SLURM user in its project account, according to the SLA / QOS we can provide : Premium or Standard.
Several users can share the same project account to work as a team. Maximum limits apply to each user individually, and group limits apply to the account as a whole.
Currently, Rémi creates and configures user accounts.
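The SLURM side of user creation could look like the sketch below, which builds (but does not run) the sacctmgr commands for a new project account and its first user. The project name `proj_demo` and the login `alice` are hypothetical. Attaching the project account as a child of premium_rs or standard_rs is an assumption about how the account tree is organized, not a description of the actual procedure.

```python
import shlex

# Groups named in this document; the parent/child layout is an assumption.
GROUPS = {"premium_rs", "standard_rs"}


def account_command(project: str, group: str) -> list[str]:
    """Build the sacctmgr call that creates a project account under a group."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group}")
    return ["sacctmgr", "--immediate", "add", "account", project, f"parent={group}"]


def user_command(login: str, project: str) -> list[str]:
    """Build the sacctmgr call that creates a user in its project account."""
    return ["sacctmgr", "--immediate", "add", "user", login,
            f"account={project}", f"defaultaccount={project}"]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for argv in (account_command("proj_demo", "premium_rs"),
                 user_command("alice", "proj_demo")):
        print(shlex.join(argv))
```

A second team member would only need the `user_command` step with the same project account.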
