Ansible Role: marvel-nccr.slurm
This Ansible role installs the Slurm workload manager on Ubuntu (tested on versions 16.04, 18.04, and 20.04).
Important: This role also configures the machine as a compute node, automatically detecting the number of CPUs and other resources. It is not currently designed to create a multi-node cluster. For that, look into tools like Elasticluster.
Features of the Role:
- Installs Slurm packages.
- Configures Slurm (/etc/slurm-llnl/slurm.conf) to use the available resources on the machine automatically (like hostname and number of CPUs), setting up one node (called $HOSTNAME) and one partition (called slurm_partition_name).
- Adds a script and service (slurm-resources) to manage platform resources efficiently (this is useful for creating VM images with varying resources).
- Starts Slurm services.
Check Services Status
To see if the services are running, use:
$ systemctl --type=service
You should see something like:
slurmctld.service loaded active running Slurm controller daemon
slurmd.service loaded active running Slurm node daemon
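For scripted checks, the listing above can be filtered rather than read by eye. The following is a minimal sketch and is not part of the role: the function name check_slurm_services is hypothetical, and it assumes only the two unit names and the "loaded active running" columns shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the role): reads the output of
# `systemctl --type=service` on stdin and reports, for each Slurm daemon,
# whether it is listed as "loaded active running".
check_slurm_services() {
    listing=$(cat)
    status=0
    for unit in slurmctld.service slurmd.service; do
        if printf '%s\n' "$listing" |
            grep -Eq "${unit}[[:space:]]+loaded[[:space:]]+active[[:space:]]+running"; then
            echo "$unit: running"
        else
            echo "$unit: NOT running"
            status=1
        fi
    done
    # Non-zero exit status if at least one daemon is missing from the listing.
    return "$status"
}

# Usage on a provisioned machine:
#   systemctl --type=service | check_slurm_services
```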
Check Node and Partition
To check the Slurm node and partition:
$ scontrol show node
$ scontrol show partition
These should match the resources listed in the output of lscpu.
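The comparison can also be automated. The sketch below assumes that the node record printed by scontrol contains a CPUTot=<n> field, and it uses nproc as a stand-in for the CPU count reported by lscpu. The helper names slurm_cpu_total and compare_cpu_counts are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extracts the CPUTot value from `scontrol show node`
# output supplied on stdin (assumes the usual KEY=VALUE layout).
slurm_cpu_total() {
    grep -o 'CPUTot=[0-9][0-9]*' | head -n 1 | cut -d= -f2
}

# Hypothetical helper: compares the CPU count known to Slurm ($1) with the
# CPU count reported by the operating system ($2).
compare_cpu_counts() {
    if [ "$1" = "$2" ]; then
        echo "match: $1 CPUs"
    else
        echo "mismatch: Slurm reports $1, system reports $2"
        return 1
    fi
}

# Usage on a provisioned machine (nproc is part of GNU coreutils):
#   compare_cpu_counts "$(scontrol show node | slurm_cpu_total)" "$(nproc)"
```

Note that nproc honours CPU affinity restrictions, so its value can be lower than the CPU(s) line of lscpu in restricted environments.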
Enable/Disable the Service
To enable the slurm-resources service (substitute disable for enable to turn it off):
$ systemctl enable slurm-resources
Update Resource Settings
To change the resource settings directly, you can run:
$ slurm-resources -e restart_on_change=true -e slurm_max_cpus=2
This will set the maximum CPUs for the partition to 2 and restart the services if any changes were made.
Installation
You can install this role with:
ansible-galaxy install marvel-nccr.slurm
Role Variables
Check the file defaults/main.yml for variable information.
Example Playbook
Here’s a simple example of how to use this role in a playbook:
- hosts: servers
  roles:
    - role: marvel-nccr.slurm
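To try the role on a single machine with a variable overridden, a playbook can be generated and run from the shell. This is a sketch under assumptions: slurm_partition_name is taken to be overridable because the feature list names it, while the partition name jobs, the localhost target, the use of become, and the function name write_slurm_playbook are illustrative choices that do not come from the role's documentation.

```shell
#!/bin/sh
# Hypothetical helper: writes a playbook to the path given as $1. The playbook
# applies the role to localhost and overrides one variable.
# Assumptions: slurm_partition_name is overridable (see defaults/main.yml);
# "jobs" is an example value; become is set because the role installs packages.
write_slurm_playbook() {
    cat > "$1" <<'EOF'
- hosts: localhost
  connection: local
  become: true
  roles:
    - role: marvel-nccr.slurm
      vars:
        slurm_partition_name: jobs
EOF
}

# Usage (requires Ansible and the installed role):
#   write_slurm_playbook playbook.yml
#   ansible-playbook playbook.yml
```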
Development and Testing
This role uses Molecule and Docker for testing.
After installing Docker, clone the repository into a folder named marvel-nccr.slurm:
git clone https://github.com/marvel-nccr/ansible-role-slurm marvel-nccr.slurm
cd marvel-nccr.slurm
Then, run these commands:
pip install -r requirements.txt # This installs Molecule
molecule test # This runs the tests
You can also use Tox (refer to tox.ini):
pip install tox
tox
Code Style
Code formatting is applied and checked with pre-commit.
To use it, run:
pip install pre-commit
pre-commit run --all-files
Deployment
Deployment to Ansible Galaxy is automated through GitHub Actions. Simply tag a release as vX.Y.Z to start the CI and release process. Please note, the release will only be finalized if all CI tests pass.
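Since the release process depends on the tag format, it can help to validate a tag before creating it. The sketch below is illustrative only: the helper name is_release_tag, the version number v1.2.3, and the remote name origin are assumptions, as is the premise that pushing the tag is what starts the workflow.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 has the form vX.Y.Z,
# where X, Y, and Z are non-negative integers.
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Usage from a clean checkout (version and remote name are examples):
#   is_release_tag v1.2.3 && git tag v1.2.3 && git push origin v1.2.3
```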
License
MIT License
Contact
For questions about Quantum Mobile and its associated Ansible roles, please reach out to the AiiDA mailing list.