galaxyproject.slurm
Overview
This Ansible role installs and configures the Slurm Workload Manager on RHEL/CentOS or Debian/Ubuntu servers.
Role Variables
All variables are optional. If none are set, the role installs the Slurm client programs and the munge authentication service, and generates a basic slurm.conf with a single localhost node and a debug partition. See the role defaults and the example playbooks below for details.
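In the simplest case, then, the role can be applied with no variables at all; a minimal sketch:

- hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True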
Each Slurm node can play one or more roles. Assign roles either by placing hosts in the corresponding inventory group or by adding entries to the slurm_roles list (see the example after this list):
- For controller nodes, use group slurmservers or set slurm_roles: ['controller']
- For execution nodes, use group slurmexechosts or set slurm_roles: ['exec']
- For database nodes, use group slurmdbdservers or set slurm_roles: ['dbd']
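For example, a minimal sketch that marks a group of hosts as execution hosts via slurm_roles (the "compute" group name is hypothetical):

- name: Slurm execution hosts
  hosts: compute          # hypothetical inventory group
  vars:
    slurm_roles: ['exec']
  roles:
    - role: galaxyproject.slurm
      become: True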
General configuration options for slurm.conf go in the slurm_config hash, using the Slurm configuration option names as keys.
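For instance, a short sketch (the values are illustrative, taken from the fuller example below; any valid slurm.conf option name can be used as a key):

slurm_config:
  ClusterName: cluster
  SlurmctldHost: slurmctl        # illustrative controller hostname
  SelectType: "select/cons_res"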
Partitions and nodes are defined with slurm_partitions and slurm_nodes, each a list of hashes. The only required key in each hash is name, which becomes the PartitionName or NodeName; any other partition or node options can be added as additional keys.
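A minimal sketch mirroring the default localhost node and debug partition (the CPUs value is illustrative):

slurm_nodes:
  - name: localhost
    CPUs: 1                 # illustrative
slurm_partitions:
  - name: debug
    Default: YES
    Nodes: localhost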
Settings for the additional configuration files acct_gather.conf, cgroup.conf, and gres.conf can be given in slurm_acct_gather_config, slurm_cgroup_config (both hashes), and slurm_gres_config (a list of hashes).
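For example, a sketch that constrains cores via cgroups and declares a single GPU gres line (the device path and node name are illustrative):

slurm_cgroup_config:
  ConstrainCores: yes
slurm_gres_config:
  - Name: gpu
    Type: tesla
    File: /dev/nvidia0      # illustrative device
    NodeName: gpu01         # illustrative node name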
Set slurm_upgrade to true to upgrade Slurm packages.
Use slurm_user (a hash) and slurm_create_user (a boolean) to have the role create the Slurm user itself, for example so that its UID matches across hosts.
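For example, a sketch that upgrades the Slurm packages and creates the slurm user with a fixed UID/GID (888 is an illustrative value, matching the fuller example below):

slurm_upgrade: true
slurm_create_user: yes
slurm_user:
  name: slurm
  group: slurm
  uid: 888                  # illustrative, pick a UID consistent across hosts
  gid: 888
  home: "/var/lib/slurm"
  shell: "/usr/sbin/nologin"
  comment: "Slurm Workload Manager"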
Since this role requires root access, ensure you enable become globally in your playbook or for this role specifically as shown in the examples below.
Dependencies
None.
Example Playbooks
Here is a basic setup with all services on a single node:
- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - role: galaxyproject.slurm
      become: True
A more detailed example:
- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
      vars:
        slurm_cgroup_config:
          CgroupMountpoint: "/sys/fs/cgroup"
          CgroupAutomount: yes
          ConstrainCores: yes
          TaskAffinity: no
          ConstrainRAMSpace: yes
          ConstrainSwapSpace: no
          ConstrainDevices: no
          AllowedRamSpace: 100
          AllowedSwapSpace: 0
          MaxRAMPercent: 100
          MaxSwapPercent: 100
          MinRAMSpace: 30
        slurm_config:
          AccountingStorageType: "accounting_storage/none"
          ClusterName: cluster
          GresTypes: gpu
          JobAcctGatherType: "jobacct_gather/none"
          MpiDefault: none
          ProctrackType: "proctrack/cgroup"
          ReturnToService: 1
          SchedulerType: "sched/backfill"
          SelectType: "select/cons_res"
          SelectTypeParameters: "CR_Core"
          SlurmctldHost: "slurmctl"
          SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
          SlurmctldPidFile: "/var/run/slurmctld.pid"
          SlurmdLogFile: "/var/log/slurm/slurmd.log"
          SlurmdPidFile: "/var/run/slurmd.pid"
          SlurmdSpoolDir: "/var/spool/slurmd"
          StateSaveLocation: "/var/spool/slurmctld"
          SwitchType: "switch/none"
          TaskPlugin: "task/affinity,task/cgroup"
          TaskPluginParam: Sched
        slurm_create_user: yes
        slurm_gres_config:
          - File: /dev/nvidia[0-3]
            Name: gpu
            NodeName: gpu[01-10]
            Type: tesla
        slurm_munge_key: "../../../munge.key"
        slurm_nodes:
          - name: "gpu[01-10]"
            CoresPerSocket: 18
            Gres: "gpu:tesla:4"
            Sockets: 2
            ThreadsPerCore: 2
        slurm_partitions:
          - name: gpu
            Default: YES
            MaxTime: UNLIMITED
            Nodes: "gpu[01-10]"
        slurm_roles: ['exec']
        slurm_user:
          comment: "Slurm Workload Manager"
          gid: 888
          group: slurm
          home: "/var/lib/slurm"
          name: slurm
          shell: "/usr/sbin/nologin"
          uid: 888
License
MIT
Installation
ansible-galaxy install galaxyproject.slurm