galaxyproject.slurm
Slurm
Overview
This role installs and configures Slurm (the Slurm Workload Manager) to build a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers.
Role Variables
All variables are optional. If you don't set any, the role installs the Slurm client and the munge authentication service, and creates a basic slurm.conf with a localhost node and a debug partition. See the role defaults and the example playbooks below for more details.
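For example, a minimal sketch of a playbook that applies the role with its defaults (the play name is illustrative):
# Minimal sketch: run the role with all defaults, which installs the Slurm
# client and munge and writes a basic localhost-only slurm.conf.
- name: Slurm with role defaults
  hosts: all
  become: true
  roles:
    - galaxyproject.slurm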
Each Slurm node can fill one or more roles. You can assign roles either by placing hosts in the inventory groups below or by listing them in the slurm_roles variable (see the sketch after this list):
- Controller nodes: group slurmservers, or set slurm_roles: ['controller']
- Execution (compute) nodes: group slurmexechosts, or set slurm_roles: ['exec']
- Database (slurmdbd) nodes: group slurmdbdservers, or set slurm_roles: ['dbd']
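As a sketch, assigning roles by inventory group could look like the following YAML inventory; only the group names come from the role, the hostnames are placeholders:
# Illustrative inventory: group names are those recognized by the role,
# hostnames are placeholders.
all:
  children:
    slurmservers:
      hosts:
        head01:
    slurmexechosts:
      hosts:
        node01:
        node02:
    slurmdbdservers:
      hosts:
        head01:
Alternatively, set slurm_roles in host or group variables, e.g. slurm_roles: ['controller', 'dbd'] for a combined controller/database host.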
General configuration options for slurm.conf go in the slurm_config hash, using the Slurm configuration option names as keys.
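A small sketch (the values are illustrative, not the role's defaults):
slurm_config:
  ClusterName: mycluster          # illustrative cluster name
  SlurmctldHost: head01           # illustrative controller hostname
  SelectType: "select/cons_res"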
Partitions and nodes are defined with slurm_partitions and slurm_nodes, both lists of hashes. The only required key in each hash is name, which becomes the PartitionName or NodeName; any other Slurm options for that partition or node can be included alongside it.
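For example (node names and hardware figures are made up for illustration):
slurm_nodes:
  - name: "compute[01-04]"        # becomes NodeName
    CPUs: 16
    RealMemory: 64000
slurm_partitions:
  - name: main                    # becomes PartitionName
    Default: YES
    MaxTime: UNLIMITED
    Nodes: "compute[01-04]"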
Additional configuration files can be managed as well: slurm_acct_gather_config and slurm_cgroup_config (both hashes) populate acct_gather.conf and cgroup.conf, and slurm_gres_config (a list of hashes) populates gres.conf.
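A brief sketch with illustrative values (see the second example playbook below for a fuller set):
slurm_cgroup_config:              # rendered into cgroup.conf
  ConstrainCores: yes
slurm_gres_config:                # rendered into gres.conf, one hash per line
  - Name: gpu
    NodeName: gpu01               # illustrative node name
    File: /dev/nvidia0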
Set slurm_upgrade to true to upgrade the installed Slurm packages.
Use slurm_user (a hash) and slurm_create_user (a boolean) to pre-create a Slurm user so that its user ID matches across nodes (see the sketch below and the second example playbook).
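For instance, a sketch using the same arbitrary UID/GID of 888 as the second example playbook:
slurm_create_user: yes
slurm_user:
  name: slurm
  group: slurm
  uid: 888                        # pick a UID/GID that is consistent across hosts
  gid: 888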
Since this role requires root access, enable become either globally in your playbook or for this role specifically, as shown in the examples below.
Dependencies
None.
Example Playbooks
Here is a basic setup with all services on a single node:
- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - role: galaxyproject.slurm
      become: True
A more detailed example:
- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_cgroup_config:
      CgroupMountpoint: "/sys/fs/cgroup"
      CgroupAutomount: yes
      ConstrainCores: yes
      TaskAffinity: no
      ConstrainRAMSpace: yes
      ConstrainSwapSpace: no
      ConstrainDevices: no
      AllowedRamSpace: 100
      AllowedSwapSpace: 0
      MaxRAMPercent: 100
      MaxSwapPercent: 100
      MinRAMSpace: 30
    slurm_config:
      AccountingStorageType: "accounting_storage/none"
      ClusterName: cluster
      GresTypes: gpu
      JobAcctGatherType: "jobacct_gather/none"
      MpiDefault: none
      ProctrackType: "proctrack/cgroup"
      ReturnToService: 1
      SchedulerType: "sched/backfill"
      SelectType: "select/cons_res"
      SelectTypeParameters: "CR_Core"
      SlurmctldHost: "slurmctl"
      SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
      SlurmctldPidFile: "/var/run/slurmctld.pid"
      SlurmdLogFile: "/var/log/slurm/slurmd.log"
      SlurmdPidFile: "/var/run/slurmd.pid"
      SlurmdSpoolDir: "/var/spool/slurmd"
      StateSaveLocation: "/var/spool/slurmctld"
      SwitchType: "switch/none"
      TaskPlugin: "task/affinity,task/cgroup"
      TaskPluginParam: Sched
    slurm_create_user: yes
    slurm_gres_config:
      - File: /dev/nvidia[0-3]
        Name: gpu
        NodeName: gpu[01-10]
        Type: tesla
    slurm_munge_key: "../../../munge.key"
    slurm_nodes:
      - name: "gpu[01-10]"
        CoresPerSocket: 18
        Gres: "gpu:tesla:4"
        Sockets: 2
        ThreadsPerCore: 2
    slurm_partitions:
      - name: gpu
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "gpu[01-10]"
    slurm_roles: ['exec']
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888
License
MIT
Installation
ansible-galaxy install galaxyproject.slurm
Author Information
The Galaxy Project (https://galaxyproject.org/)