mila.infiniband

InfiniBand

This role helps you install and set up InfiniBand interfaces.

Role Variables

Installation

Specify the APT or YUM/DNF repositories, including the path to the GPG key.

# Default APT repository
infiniband_apt_repository: 'deb https://linux.mellanox.com/public/repo/mlnx_ofed/latest/ubuntu18.04/amd64/ ./'

# Default YUM/DNF repository
infiniband_yum_repository: 'https://linux.mellanox.com/public/repo/mlnx_ofed/latest/rhel8.3/$basearch/'

# Mellanox GPG Key
infiniband_gpg_key: 'http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox'

To upgrade the packages to the latest version, set this to True:

infiniband_upgrade: True

To set the priority of the APT repository (only for APT):

infiniband_apt_priority: 490

The name of the kernel headers package may vary across different Linux distributions. Set it like this:

infiniband_kernel_headers_package: 'linux-headers'

Some systems, like the NVIDIA DGX, ship their own software stack. If that stack already manages the Mellanox OFED drivers, you don’t need to configure extra repositories or install drivers. In these cases, override the following parameters (shown here as True) with False in the host inventory. The same applies when using the kernel modules provided by the distribution.

infiniband_configure_repos: True
infiniband_install_kernel_modules: True

IPoIB

List the interfaces you want to configure with IPoIB. For each, define the interface name (iface), an offset from the default IPv4 address, and the CIDR prefix. If you don’t provide a list, no IPoIB interface will be configured.

infiniband_ipoib_interfaces:
  - iface: 'ib0'
    offset: -3064461568
    prefix: 17

To calculate the offset, run this command:

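$ python3 <<EOF
import ipaddress
# Network that holds the hosts' default IPv4 addresses
source_net = '192.168.121.0'
# Network to use for the IPoIB interfaces
target_net = '10.0.128.0'
# Signed offset (target minus source), so the sign is right in either direction
offset = int(ipaddress.ip_address(target_net)) - int(ipaddress.ip_address(source_net))
print(f"offset: {offset}")
EOF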
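
To illustrate how such an offset relates two addresses, the sketch below adds the example offset to a host's default IPv4 address and attaches the prefix. It assumes the role derives the IPoIB address by plain integer addition, which this README does not state explicitly. The function name ipoib_address and the sample host address 192.168.121.10 are hypothetical and not part of the role.

```python
import ipaddress


def ipoib_address(default_ipv4: str, offset: int, prefix: int) -> ipaddress.IPv4Interface:
    """Shift default_ipv4 by offset and attach the CIDR prefix (assumed behaviour)."""
    shifted = ipaddress.IPv4Address(int(ipaddress.IPv4Address(default_ipv4)) + offset)
    return ipaddress.IPv4Interface(f"{shifted}/{prefix}")


# Hypothetical host whose default address is 192.168.121.10
iface = ipoib_address("192.168.121.10", -3064461568, 17)
print(iface)          # 10.0.128.10/17
print(iface.network)  # 10.0.128.0/17
```

Running a check like this for one or two hosts before applying the role can confirm that the offset and prefix land in the intended IPoIB network.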

Virtualization - SR-IOV

Set up the list of Mellanox HCAs and the parameters to apply. The pci_bus must match the value in /sys/bus/pci/devices/ (for example, /sys/bus/pci/devices/0000:41:00.0/).

infiniband_hca_devices:
  - device: mlx5_0
    pci_bus: '0000:41:00.0'
    sriov_en: True
    num_of_vfs: 8

Currently, only the SRIOV_EN and NUM_OF_VFS parameters are supported.
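
Before filling in pci_bus, it can help to check that the value is well formed and present on the host. The following is a minimal sketch, not part of the role. It assumes the common address form with a 4-digit domain, and the helper names is_pci_address and pci_device_exists are hypothetical.

```python
import re
from pathlib import Path

# Assumed form: 4-digit domain, 2-digit bus, 2-digit device, 1-digit function,
# all lowercase hex, as in /sys/bus/pci/devices/0000:41:00.0
PCI_ADDRESS = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def is_pci_address(value: str) -> bool:
    """Return True if value looks like a full PCI address."""
    return PCI_ADDRESS.fullmatch(value) is not None


def pci_device_exists(value: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """Return True if value is well formed and has an entry under sysfs_root."""
    return is_pci_address(value) and (Path(sysfs_root) / value).exists()


print(is_pci_address("0000:41:00.0"))  # True
print(is_pci_address("41:00.0"))       # False (domain part is missing)
```

pci_device_exists only gives a meaningful answer when run on the target host itself, since it reads the local sysfs tree.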

Define a prefix for the 64-bit IB GUID of VFs. The GUID will be made up of:

  • prefix (40 bits)
  • "00" (8 bits)
  • device ID (8 bits): the index of the item in infiniband_hca_devices
  • VF ID (8 bits): the index of the VF (from 0 to infiniband_hca_devices[*].num_of_vfs)

If you plan to set up more than one host with SR-IOV in your InfiniBand fabric, it is advisable to use a different prefix than the default.

For high availability configurations, ensure that the same infiniband_hca_devices and infiniband_guid_prefix are used for all hosts where a VM might run.

infiniband_guid_prefix: "4d:69:6c:61:00"
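
The GUID layout described above can be sketched as follows. This is an illustration of the 40 + 8 + 8 + 8 bit composition only. The function name vf_guid and the colon-separated output format are assumptions, and the role's actual implementation may differ.

```python
def vf_guid(prefix: str, device_index: int, vf_index: int) -> str:
    """Compose a 64-bit GUID: 5-byte prefix, one zero byte, device index, VF index."""
    prefix_bytes = bytes.fromhex(prefix.replace(":", ""))
    if len(prefix_bytes) != 5:
        raise ValueError("prefix must be exactly 40 bits (5 bytes)")
    for name, value in (("device_index", device_index), ("vf_index", vf_index)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    guid = prefix_bytes + bytes([0x00, device_index, vf_index])
    return ":".join(f"{byte:02x}" for byte in guid)


# First HCA in infiniband_hca_devices (index 0), fourth VF (index 3)
print(vf_guid("4d:69:6c:61:00", 0, 3))  # 4d:69:6c:61:00:00:00:03
```

Because the GUID depends only on the prefix and the two indexes, two hosts with the same prefix and device list produce the same GUIDs, which is what the high availability note relies on.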

By default, the role will reboot the hosts to apply any new settings. To prevent the reboot, set this:

infiniband_allow_reboot: false

To minimize unexpected downtime in high availability clusters, hosts will reboot one at a time. You can increase the delay between reboots with this:

infiniband_throttle_reboot: "{{ ansible_play_hosts | length }}"

Example Playbook

To install and configure InfiniBand, use:

- hosts: computes:&infiniband
  roles:
    - role: mila.infiniband
      tags: 'role::infiniband'

Project Information

Install and configure InfiniBand interfaces.

Install
ansible-galaxy install mila.infiniband
License
MIT
Owner
Quebec Artificial Intelligence Institute