Ansible Role: marvel-nccr.slurm

This Ansible role installs the Slurm workload manager on Ubuntu (tested on versions 16.04, 18.04, and 20.04).

Important: This role also configures the machine as a compute node, automatically detecting the number of CPUs and other resources. It is not currently designed to set up a multi-node cluster; for that, see tools such as Elasticluster.

Features of the Role:

  • Installs Slurm packages.
  • Configures Slurm (/etc/slurm-llnl/slurm.conf) to use the resources detected on the machine (such as hostname and number of CPUs), setting up one node (named $HOSTNAME) and one partition (named by the slurm_partition_name variable).
  • Adds a script and service (slurm-resources) that updates the Slurm resource configuration to match the platform's resources (useful for VM images that may run with varying resources).
  • Starts Slurm services.

Check Services Status

To see if the services are running, use:

$ systemctl --type=service

You should see something like:

slurmctld.service                  loaded active running Slurm controller daemon
slurmd.service                     loaded active running Slurm node daemon
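
The full service listing can be long, so a narrower check may be more convenient. The following is a minimal sketch, not part of the role: the function name check_slurm_services and the SYSTEMCTL override variable are hypothetical, introduced only so that the check can be exercised without a running systemd. It relies on the standard `systemctl is-active --quiet` subcommand.

```shell
# Report whether the two Slurm daemons listed above are active.
# SYSTEMCTL is a hypothetical override (an assumption of this sketch);
# by default the real systemctl binary is used.
check_slurm_services() {
    rc=0
    for svc in slurmctld slurmd; do
        if "${SYSTEMCTL:-systemctl}" is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_slurm_services after the role has run prints one line per daemon and returns a non-zero status if either is not active.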

Check Node and Partition

To check the Slurm node and partition:

$ scontrol show node
$ scontrol show partition

These should match the resources listed in the output of lscpu.
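
The comparison can also be scripted. The sketch below assumes that `scontrol show node` reports the CPU count in a `CPUTot=` field and uses `nproc` as a stand-in for the CPU count shown by lscpu. The helper names slurm_cpu_total and report_cpu_match are hypothetical and not provided by the role.

```shell
# Extract the CPUTot value from `scontrol show node` output read on stdin.
slurm_cpu_total() {
    sed -n 's/.*CPUTot=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Compare the Slurm CPU count ($1) with the host CPU count ($2).
report_cpu_match() {
    if [ "$1" = "$2" ]; then
        echo "OK: Slurm node advertises $1 CPUs"
    else
        echo "MISMATCH: Slurm=$1 host=$2"
        return 1
    fi
}

# Only run the live check where Slurm is actually installed.
if command -v scontrol >/dev/null 2>&1; then
    report_cpu_match "$(scontrol show node | slurm_cpu_total)" "$(nproc)"
fi
```

If a limit such as slurm_max_cpus has been applied, the two counts may legitimately differ, so a mismatch is a prompt to check the settings rather than proof of an error.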

Enable/Disable the Service

To enable or disable the slurm-resources service:

$ systemctl enable slurm-resources
$ systemctl disable slurm-resources

Update Resource Settings

To change the resource settings directly, you can run:

$ slurm-resources -e restart_on_change=true -e slurm_max_cpus=2

This will set the maximum CPUs for the partition to 2 and restart the services if any changes were made.

Installation

You can install this role with:

ansible-galaxy install marvel-nccr.slurm

Role Variables

Check the file defaults/main.yml for variable information.

Example Playbook

Here’s a simple example of how to use this role in a playbook:

- hosts: servers
  roles:
  - role: marvel-nccr.slurm

Development and Testing

This role uses Molecule and Docker for testing.

After installing Docker, clone the repository into a folder named marvel-nccr.slurm:

git clone https://github.com/marvel-nccr/ansible-role-slurm marvel-nccr.slurm
cd marvel-nccr.slurm

Then, run these commands:

pip install -r requirements.txt  # This installs Molecule
molecule test  # This runs the tests

You can also use Tox (refer to tox.ini):

pip install tox
tox

Code Style

Code style is formatted and checked with pre-commit.

To use it, run:

pip install pre-commit
pre-commit run --all-files

Deployment

Deployment to Ansible Galaxy is automated through GitHub Actions. Simply tag a release as vX.Y.Z to start the CI and release process. Please note, the release will only be finalized if all CI tests pass.
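
Because the release is triggered by the tag name, it can help to check the tag format before pushing. This is a minimal sketch: the helper is_release_tag is hypothetical, and the exact pattern matched by the repository's GitHub Actions workflow is an assumption based only on the vX.Y.Z form stated above.

```shell
# Return success only if the argument has the form vX.Y.Z,
# where X, Y and Z are non-negative integers.
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example usage (commented out so nothing is tagged by accident;
# the remote name "origin" is an assumption):
#   tag=v1.2.3
#   is_release_tag "$tag" && git tag "$tag" && git push origin "$tag"
```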

License

MIT License

Contact

For questions about Quantum Mobile and its associated Ansible roles, please reach out to the AiiDA mailing list.
