mila.infiniband
InfiniBand
This role helps you install and set up InfiniBand interfaces.
Role Variables
Installation
Specify the APT or YUM/DNF repositories, including the path to the GPG key.
# Default APT repository
infiniband_apt_repository: 'deb https://linux.mellanox.com/public/repo/mlnx_ofed/latest/ubuntu18.04/amd64/ ./'
# Default YUM/DNF repository
infiniband_yum_repository: 'https://linux.mellanox.com/public/repo/mlnx_ofed/latest/rhel8.3/$basearch/'
# Mellanox GPG Key
infiniband_gpg_key: 'http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox'
To upgrade the packages to the latest version, set this to True:
infiniband_upgrade: True
To set the priority of the APT repository (only for APT):
infiniband_apt_priority: 490
The name of the kernel headers package may vary across different Linux distributions. Set it like this:
infiniband_kernel_headers_package: 'linux-headers'
Some systems, like the NVIDIA DGX, ship their own software stack. If that stack already manages the Mellanox OFED drivers, you don’t need to configure extra repositories or install drivers. In these cases, override the following parameters (shown here with their default values) and set them to False in the host inventory. The same applies when using the kernel modules provided by the distribution.
infiniband_configure_repos: True
infiniband_install_kernel_modules: True
IPoIB
List the interfaces you want to configure with IPoIB. For each, define the interface name (iface), an offset from the default IPv4 address, and the CIDR prefix. If you don’t provide a list, no IPoIB interface will be configured.
infiniband_ipoib_interfaces:
  - iface: 'ib0'
    offset: -3064461568
    prefix: 17
To calculate the offset, run this command:
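$ python3 <<EOF
import ipaddress
# Network of the hosts' default IPv4 addresses
source_net = '192.168.121.0'
# Network to use for the IPoIB addresses
target_net = '10.0.128.0'
# Signed offset: negative when the target network is numerically lower
offset = int(ipaddress.ip_address(target_net)) - int(ipaddress.ip_address(source_net))
print(f"offset: {offset}")
EOF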
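As a sanity check, the sketch below shows how such an offset could map a host's default IPv4 address to its IPoIB address. It assumes the role simply adds the offset to the integer value of the default address and applies the configured prefix; the helper name `ipoib_address` and the sample host address are hypothetical and not part of the role.

```python
import ipaddress


def ipoib_address(default_ipv4: str, offset: int, prefix: int) -> ipaddress.IPv4Interface:
    """Shift a default IPv4 address by a signed offset and attach a CIDR prefix.

    Assumption: the IPoIB address is the default address plus the offset.
    """
    shifted = ipaddress.IPv4Address(int(ipaddress.IPv4Address(default_ipv4)) + offset)
    return ipaddress.IPv4Interface(f"{shifted}/{prefix}")


# Hypothetical host in 192.168.121.0 with the offset and prefix from the example above
print(ipoib_address("192.168.121.42", -3064461568, 17))  # prints 10.0.128.42/17
```

The host part of the address (here `.42`) is preserved, so each host keeps a predictable address on the IPoIB network.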
Virtualization - SR-IOV
Set up the list of Mellanox HCAs and the parameters to apply. The pci_bus must match the value in /sys/bus/pci/devices/ (for example, /sys/bus/pci/devices/0000:41:00.0/).
infiniband_hca_devices:
  - device: mlx5_0
    pci_bus: '0000:41:00.0'
    sriov_en: True
    num_of_vfs: 8
Currently, only the SRIOV_EN and NUM_OF_VFS parameters are supported.
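Before filling in pci_bus, it can help to check that the value is a full PCI address (domain:bus:device.function) and that it exists on the host. The following is a minimal sketch, not part of the role; the helper names `is_well_formed` and `is_present` are hypothetical.

```python
import re
from pathlib import Path

# Full PCI address: 4 hex digits (domain), 2 (bus), 2 (device), 1 octal digit (function)
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def is_well_formed(pci_bus: str) -> bool:
    """Return True if pci_bus looks like a full PCI address such as 0000:41:00.0."""
    return bool(PCI_ADDRESS.match(pci_bus.lower()))


def is_present(pci_bus: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """Return True if the address is well formed and has an entry under sysfs_root."""
    return is_well_formed(pci_bus) and (Path(sysfs_root) / pci_bus).is_dir()


print(is_well_formed("0000:41:00.0"))  # prints True
print(is_well_formed("41:00.0"))       # prints False: the domain part is missing
```

Run `is_present` on the target host itself, since the sysfs tree only reflects the local hardware.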
Define a prefix for the 64-bit IB GUID of VFs. The GUID will be made up of:
- prefix (40 bits)
- "00" (8 bits)
- device ID (8 bits): the index of the item in infiniband_hca_devices
- VF ID (8 bits): the index of the VF (from 0 to infiniband_hca_devices[*].num_of_vfs)
If you plan to set up more than one host with SR-IOV in your InfiniBand fabric, it is advisable to use a different prefix than the default.
For high availability configurations, ensure that the same infiniband_hca_devices and infiniband_guid_prefix are used for all hosts where a VM might run.
infiniband_guid_prefix: "4d:69:6c:61:00"
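The GUID layout described above can be sketched as follows. This is only an illustration of the stated bit layout (40-bit prefix, a "00" byte, device index, VF index); the function name `vf_guid` is hypothetical, and the role's actual implementation may format the value differently.

```python
def vf_guid(prefix: str, device_index: int, vf_index: int) -> str:
    """Build a 64-bit GUID string: prefix (5 bytes) + 00 + device index + VF index."""
    octets = prefix.lower().split(":")
    if len(octets) != 5 or any(len(o) != 2 for o in octets):
        raise ValueError("prefix must be 5 colon-separated bytes (40 bits)")
    for octet in octets:
        int(octet, 16)  # raises ValueError if an octet is not hexadecimal
    for name, value in (("device_index", device_index), ("vf_index", vf_index)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    return ":".join(octets + ["00", f"{device_index:02x}", f"{vf_index:02x}"])


# First HCA in infiniband_hca_devices (index 0), fourth VF (index 3), default prefix
print(vf_guid("4d:69:6c:61:00", 0, 3))  # prints 4d:69:6c:61:00:00:00:03
```

Because the device and VF indexes are the only varying bytes, two hosts using the same prefix produce the same GUIDs, which is why distinct prefixes matter when several SR-IOV hosts share a fabric.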
By default, the role will reboot the hosts to apply any new settings. To prevent the reboot, set this:
infiniband_allow_reboot: false
To minimize unexpected downtime in high availability clusters, hosts will reboot one at a time. You can increase the delay between reboots with this:
infiniband_throttle_reboot: "{{ ansible_play_hosts | length }}"
Example Playbook
To install and configure InfiniBand, use:
- hosts: computes:&infiniband
  roles:
    - role: mila.infiniband
      tags: 'role::infiniband'
To install the role from Ansible Galaxy, run:
ansible-galaxy install mila.infiniband