infra:compute:slurmconfig (current revision: 2025/11/10 12:48 by marc; previous revision: 2025/02/18 18:37 by remi)
These partitions can be used to restrict an account to a single server.
  
More details on partition usage: TODO
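As a sketch of how such a restriction can be expressed (the partition, node, and account names below are hypothetical, not taken from the actual configuration), a partition in slurm.conf can be limited to one account and one node with ''AllowAccounts'':

```
# slurm.conf -- hypothetical partition restricted to one account and one node
PartitionName=projectx_only Nodes=node01 AllowAccounts=projectx State=UP
```

After an ''scontrol reconfigure'', only jobs submitted under the projectx account can use that partition.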
  
===== Accounts =====
There are two other groups, Test and temp. Test is only for administration purposes; temp is a locked group used either to migrate someone from one account to another (an account cannot be deleted while it is someone's default account) or to prevent someone from running jobs (MaxSubmitJobs=0).
  
TODO
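The temp mechanics described above can be sketched with ''sacctmgr'' (the user name ''alice'' is hypothetical, and this assumes the temp account already exists in the accounting database):

```shell
# Block job submission for every association under the "temp" account
sacctmgr -i modify account temp set MaxSubmitJobs=0

# Move a user's default account to "temp", so their old account can be deleted
sacctmgr -i modify user alice set DefaultAccount=temp

# Check the result
sacctmgr show assoc where account=temp format=Account,User,MaxSubmit
```

The ''-i'' flag skips the interactive confirmation prompt.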
  
  
    * MaxTRES=gres/gpu=1,gres/shard=96,cpu=44,mem=500G
    * GrpTRES=gres/gpu=1,gres/shard=96,cpu=44,mem=500G
    * GrpWall=3-00:00:00
    * MaxWall=3-00:00:00
  
  * **standard_rs :**
    * MaxCPUs=24
    * MaxNodes=1
    * MaxTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G
    * GrpTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G
    * GrpWall=1-00:00:00
    * MaxWall=1-00:00:00
  
**TODO :** complete / adjust these limits until they are finalized
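Assuming premium_rs and standard_rs are QOS entries in the accounting database, the limits above can be inspected and adjusted with ''sacctmgr'', for example (a sketch, not necessarily the exact commands used here):

```shell
# Show the current limits of both QOS
sacctmgr show qos premium_rs,standard_rs \
    format=Name,MaxTRES,GrpTRES,GrpWall,MaxWall

# Example: apply the standard_rs limits listed above (1-00:00:00 = 1 day)
sacctmgr -i modify qos standard_rs set \
    MaxTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G \
    GrpTRES=gres/gpu=1,gres/shard=96,cpu=24,mem=256G \
    GrpWall=1-00:00:00 MaxWall=1-00:00:00
```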
  
===== Scheduling =====
    * Fairshare : 250
  
Within each account, every user has a fairshare of 100. SLURM then computes each user's effective share percentage from both the user's share inside the account and the parent account's premium / standard share.
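The resulting shares can be inspected with ''sshare'', which prints the raw shares per account and user together with the normalized share and fairshare factor that SLURM derives from them:

```shell
# Show the whole fairshare tree: raw shares per account/user and the
# normalized share and fairshare factor computed by SLURM
sshare -a --format=Account,User,RawShares,NormShares,FairShare
```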
  
  
Several users can share the same project account to work as a team. (Limits apply both per user, for the maximum limits, and to the whole account, for the group limits.)
  
The [[infra:staff|staff]] creates and configures user accounts.

[[infra:compute:tooling:adduser|User creation process]]
===== Backups =====

==== Cluster configuration ====

Currently, the whole SLURM configuration (files and database) is backed up manually.

TODO : automate this and push the backups to the backup server once it is ready.
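Until that is automated, a manual backup boils down to copying the configuration files and dumping the accounting database. A minimal sketch, assuming a default MySQL/MariaDB slurmdbd setup with database ''slurm_acct_db'' (all paths and names below are assumptions):

```shell
#!/bin/sh
set -eu

# Destination for today's backup (path is an assumption)
BACKUP_DIR=/root/slurm-backup/$(date +%F)
mkdir -p "$BACKUP_DIR"

# 1. Configuration files (slurm.conf, slurmdbd.conf, gres.conf, ...)
cp -a /etc/slurm "$BACKUP_DIR/etc-slurm"

# 2. Accounting database used by slurmdbd
mysqldump --single-transaction slurm_acct_db > "$BACKUP_DIR/slurm_acct_db.sql"
```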
==== User data ====

For user data, compute users currently have to request storage space on the filer01.hevs.ch server from the Sinf: there is a [[ https://hessoit.sharepoint.com/sites/VS-Intranet-SInf | request form in "Demande de service" ]] under **"Comptes et accès > Obtention d'accès réseau"**. More documentation is available from the [[ https://servicedesk.hevs.ch/hesso_portal?sys_kb_id=4811f7f18759da1077bbc9140cbb35a7&id=kb_article_view&sysparm_rank=1&sysparm_tsqueryId=b40006138751ae10b87e43740cbb355e | Sinf here ]]; for researchers, the filesystem to request is generally **fs_projets**.
  
  
Consider all storage on the Compute center to be short-lived: it is not meant to store data or backups, only temporary files used to run computations and retrieve results.