infra:compute:slurmconfig (current revision: 2025/11/10 12:48 by marc; previous revision: 2025/02/18 18:37 by remi)
These partitions can be used to restrict an account to a single server.
  
More details on partition usage: TODO
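As a sketch of how such a restriction can be expressed (the partition, node, and account names below are hypothetical, not taken from the actual configuration), a partition in slurm.conf can be limited to one account and one node with ''AllowAccounts'':

```
# slurm.conf -- hypothetical partition restricted to one account and one node
PartitionName=projectx_only Nodes=node01 AllowAccounts=projectx State=UP
```

After an ''scontrol reconfigure'', only jobs submitted under the projectx account can use that partition.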
  
===== Accounts =====
There are two other groups, Test and temp. Test is only for administration purposes; temp is a locked group used either to migrate someone from one account to another (an account cannot be deleted while it is someone's default account) or to prevent someone from running jobs (MaxSubmitJobs=0).
  
TODO
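The temp mechanics described above can be sketched with ''sacctmgr'' (the user name ''alice'' is hypothetical, and this assumes the temp account already exists in the accounting database):

```shell
# Block job submission for every association under the "temp" account
sacctmgr -i modify account temp set MaxSubmitJobs=0

# Move a user's default account to "temp", so their old account can be deleted
sacctmgr -i modify user alice set DefaultAccount=temp

# Check the result
sacctmgr show assoc where account=temp format=Account,User,MaxSubmit
```

The ''-i'' flag skips the interactive confirmation prompt.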
  
  
    * MaxTRES=gres/gpu=1,gres/shard=96,cpu=44,mem=500G
    * GrpTRES=gres/gpu=1,gres/shard=96,cpu=44,mem=500G
    * GrpWall=3-00:00:00
    * MaxWall=3-00:00:00
  
  * **standard_rs :**
    * MaxCPUs=24
    * MaxNodes=1
    * MaxTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G
    * GrpTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G
    * GrpWall=1-00:00:00
    * MaxWall=1-00:00:00
  
**TODO :** complete / adjust these limits until they are finalized
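Assuming premium_rs and standard_rs are QOS entries in the accounting database, the limits above can be inspected and adjusted with ''sacctmgr'', for example (a sketch, not necessarily the exact commands used here):

```shell
# Show the current limits of both QOS
sacctmgr show qos premium_rs,standard_rs \
    format=Name,MaxTRES,GrpTRES,GrpWall,MaxWall

# Example: apply the standard_rs limits listed above (1-00:00:00 = 1 day)
sacctmgr -i modify qos standard_rs set \
    MaxTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G \
    GrpTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G \
    GrpWall=1-00:00:00 MaxWall=1-00:00:00
```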
  
===== Scheduling =====
    * Fairshare : 250
  
Within each account, every user has a fairshare of 100. SLURM then computes each user's effective share percentage from both the user's share inside the account and the parent account's premium / standard share.
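The resulting shares can be inspected with ''sshare'', which prints the raw shares per account and user together with the normalized share and fairshare factor that SLURM derives from them:

```shell
# Show the whole fairshare tree: raw shares per account/user and the
# normalized share and fairshare factor computed by SLURM
sshare -a --format=Account,User,RawShares,NormShares,FairShare
```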
  
  
Several users can share the same project account to work as a team. (Limits apply both per user, for the maximum limits, and to the whole account, for the group limits.)
  
The [[infra:staff|staff]] creates and configures user accounts.

[[infra:compute:tooling:adduser|User creation process]]
===== Backups =====

==== Cluster configuration ====

Currently, the whole SLURM configuration (files and database) is backed up manually.

TODO : automate this and push the backups to the backup server once it is ready.
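Until that is automated, a manual backup boils down to copying the configuration files and dumping the accounting database. A minimal sketch, assuming a default MySQL/MariaDB slurmdbd setup with database ''slurm_acct_db'' (all paths and names below are assumptions):

```shell
#!/bin/sh
set -eu

# Destination for today's backup (path is an assumption)
BACKUP_DIR=/root/slurm-backup/$(date +%F)
mkdir -p "$BACKUP_DIR"

# 1. Configuration files (slurm.conf, slurmdbd.conf, gres.conf, ...)
cp -a /etc/slurm "$BACKUP_DIR/etc-slurm"

# 2. Accounting database used by slurmdbd
mysqldump --single-transaction slurm_acct_db > "$BACKUP_DIR/slurm_acct_db.sql"
```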
==== User data ====

For user data, compute users currently have to request storage space on the filer01.hevs.ch server from the Sinf: there is a [[ https://hessoit.sharepoint.com/sites/VS-Intranet-SInf | request form in "Demande de service" ]] under **"Comptes et accès > Obtention d'accès réseau"**. More documentation is available from the [[ https://servicedesk.hevs.ch/hesso_portal?sys_kb_id=4811f7f18759da1077bbc9140cbb35a7&id=kb_article_view&sysparm_rank=1&sysparm_tsqueryId=b40006138751ae10b87e43740cbb355e | Sinf here ]]; for researchers, the filesystem to request is generally **fs_projets**.
  
  
Consider all storage on the Compute center to be short-lived: it is not meant to store data or backups, only temporary files used to run computations and retrieve results.