stackhpc.openhpc
This Ansible role helps you set up an OpenHPC v2.x Slurm cluster by installing necessary packages and configuring the system.
To use this role, you need to include it in an Ansible playbook. Below is a simple example of how to do this. The role is designed to be flexible and doesn't assume anything about your network or cluster features, aside from some guidelines on hostnames. You can add any required functionality, like file systems, using other Ansible roles.
The expected base image for cluster nodes is a Rocky Linux 8 GenericCloud image.
Role Variables
- `openhpc_extra_repos`: Optional list of extra Yum repository definitions. Required fields are `name` and `file`; others such as `description`, `baseurl` and `metalink` are optional.
- `openhpc_slurm_service_enabled`: Boolean to enable the appropriate Slurm service (either `slurmd` or `slurmctld`).
- `openhpc_slurm_service_started`: Optional boolean to start Slurm services. If set to false, services are stopped. Defaults to the value of `openhpc_slurm_service_enabled`.
- `openhpc_slurm_control_host`: Required string. The Ansible inventory hostname of the controller.
- `openhpc_slurm_control_host_address`: Optional. An alternative IP address or name for the control host.
- `openhpc_packages`: Additional OpenHPC packages to install.
- `openhpc_enable`:
  - `control`: enable the control host
  - `database`: enable slurmdbd
  - `batch`: enable compute nodes
  - `runtime`: enable the OpenHPC runtime
- `openhpc_slurmdbd_host`: Optional. Where slurmdbd should be deployed. Defaults to the control host.
- `openhpc_slurm_configless`: Optional, default false. If true, Slurm's "configless" mode is used.
- `openhpc_munge_key`: Optional. Define a munge key. If not set, one will be generated.
- `openhpc_login_only_nodes`: Optional. Name of an inventory group of login-only nodes.
- `openhpc_module_system_install`: Optional, default true. Whether to install an environment module system (such as lmod).
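As a sketch of how these variables fit together, an extra repository definition might look like the following. The repository name, file and URL shown here are purely illustrative:

```yaml
openhpc_extra_repos:
  # Each entry requires "name" and "file"; other repo fields are optional.
  - name: epel                     # illustrative repository id
    file: epel                     # .repo file to write the definition into
    description: "Extra Packages for Enterprise Linux 8"
    baseurl: "https://download.fedoraproject.org/pub/epel/8/Everything/x86_64/"
```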
slurm.conf
- `openhpc_slurm_partitions`: Optional. A list of Slurm partitions, default empty. Each partition may include settings such as `name`, `ram_mb` and `gres`.
- `openhpc_job_maxtime`: Sets the maximum job time limit, with a default of 60 days.
- `openhpc_cluster_name`: Name of the cluster.
- `openhpc_config`: Optional mapping of additional parameters which override the defaults in `slurm.conf`.
- `openhpc_ram_multiplier`: Optional, default 0.95. Used when calculating the memory available to Slurm partitions.
- `openhpc_state_save_location`: Optional. Absolute path for Slurm controller state (`StateSaveLocation`).
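A minimal partition definition using the fields above might look like this. The partition name and memory value are illustrative, and the exact sub-keys accepted for `gres` depend on the role version, so check the role defaults before relying on them:

```yaml
openhpc_slurm_partitions:
  - name: compute        # partition name as it appears in slurm.conf
    ram_mb: 63000        # optional: memory (MiB) reported to Slurm per node
```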
Accounting
By default, no accounting storage is configured. To set it up, follow these steps:
- Configure a MariaDB or MySQL server.
- Set `openhpc_enable.database` to `true`.
- Use `openhpc_slurm_accounting_storage_type` to specify your storage type.
Configure these variables as needed for your accounting storage.
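The steps above can be sketched as the following variable settings. The storage type shown is Slurm's standard `accounting_storage/slurmdbd` plugin; treat this as an illustrative assumption and adjust for your deployment:

```yaml
openhpc_enable:
  database: true                  # deploy slurmdbd
openhpc_slurm_accounting_storage_type: accounting_storage/slurmdbd
```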
Job Accounting
If you choose to use basic job accounting, set the following:
- `openhpc_slurm_job_comp_type`: Specifies how job completion is logged.
- `openhpc_slurm_job_acct_gather_type`: Defines how accounting data is collected.
- `openhpc_slurm_job_acct_gather_frequency`: Sets the sampling period.
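For instance, a basic job-accounting setup might use Slurm's standard plugins as follows. The plugin names are standard Slurm values and the 30-second sampling period is an illustrative choice:

```yaml
openhpc_slurm_job_comp_type: jobcomp/filetxt          # log completed jobs to a text file
openhpc_slurm_job_acct_gather_type: jobacct_gather/linux
openhpc_slurm_job_acct_gather_frequency: 30           # sampling period in seconds (illustrative)
```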
slurmdbd.conf
If you've enabled the database, configure these options in `slurmdbd.conf`:
- `openhpc_slurmdbd_port`: Port for slurmdbd to listen on, defaults to 6819.
- `openhpc_slurmdbd_mysql_host`: Host where your MariaDB is running.
- `openhpc_slurmdbd_mysql_database`: Database to use for accounting.
- `openhpc_slurmdbd_mysql_password`: Password for the database (required).
- `openhpc_slurmdbd_mysql_username`: Username for the database, defaults to `slurm`.
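Putting these together, a sketch of the slurmdbd variables might look like the following. The database name and placeholder password are illustrative; in practice the password should come from a secret store such as Ansible Vault:

```yaml
openhpc_slurmdbd_port: 6819                                   # default
openhpc_slurmdbd_mysql_host: "{{ openhpc_slurm_control_host }}"  # assumes MariaDB runs on the control host
openhpc_slurmdbd_mysql_database: slurm_acct_db                # illustrative database name
openhpc_slurmdbd_mysql_password: "change-me"                  # required; use Ansible Vault in practice
openhpc_slurmdbd_mysql_username: slurm                        # default
```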
Example Inventory
An example Ansible inventory might look like this:
[openhpc_login]
openhpc-login-0 ansible_host=10.60.253.40 ansible_user=centos
[openhpc_compute]
openhpc-compute-0 ansible_host=10.60.253.31 ansible_user=centos
openhpc-compute-1 ansible_host=10.60.253.32 ansible_user=centos
[cluster_login:children]
openhpc_login
[cluster_control:children]
openhpc_login
[cluster_batch:children]
openhpc_compute
Example Playbooks
To deploy your setup, your playbook might look like this:
---
- hosts:
    - cluster_login
    - cluster_control
    - cluster_batch
  become: yes
  roles:
    - role: stackhpc.openhpc
      openhpc_enable:
        control: "{{ inventory_hostname in groups['cluster_control'] }}"
        batch: "{{ inventory_hostname in groups['cluster_batch'] }}"
        runtime: true
      openhpc_slurm_service_enabled: true
      openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
      openhpc_slurm_partitions:
        - name: "compute"
      openhpc_cluster_name: openhpc
      openhpc_packages: []
...
This role provisions an existing cluster to support OpenHPC control and batch compute functions and installs the necessary runtime packages. It can be installed from Ansible Galaxy with `ansible-galaxy install stackhpc.openhpc`.