grycap.nvidia_driver
ansible-role-nvidia-driver
An Ansible role to install the NVIDIA driver from the CUDA repositories provided by NVIDIA.
Requirements
When installing the NVIDIA driver, this role will reboot the nodes where it runs. Therefore, it’s highly recommended to run ansible-playbook
from a different machine than the GPU nodes where the driver is being installed.
If you try to run Ansible on the same machine as the one where the driver is being installed, this role will either:
- Stop with an error like "Running reboot with local connection would reboot the control node" (if using the local connection)
- Reboot the machine you're on, which will stop the playbook execution! (if using an ssh connection to localhost)
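
If a separate control machine is not available, one possible workaround is to disable the automatic reboot with the nvidia_driver_skip_reboot variable (see Role variables) and restart the machine by hand afterwards. The playbook below is only a sketch: the local connection and the use of become are assumptions, and the new driver will most likely not be loaded until the machine has been rebooted.

```yaml
# Sketch: run the role on the local machine without the automatic reboot.
- hosts: localhost
  connection: local
  become: yes  # assumption: package installation needs root privileges
  roles:
    - role: grycap.nvidia_driver
      vars:
        nvidia_driver_skip_reboot: yes  # reboot manually once the play has finished
```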
Installing
You can install this role using Ansible Galaxy:
$ ansible-galaxy install grycap.nvidia_driver
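
The role can also be listed in a requirements file, which may be convenient when a project depends on several roles. This is a minimal sketch; the file name requirements.yml is a common convention rather than something this role requires.

```yaml
# requirements.yml
# Install with: ansible-galaxy install -r requirements.yml
roles:
  - name: grycap.nvidia_driver
```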
Role variables
Variable | Default value | Description |
---|---|---|
nvidia_driver_package_version | "" | The version of the package to install. Make sure this matches the actual version of the deb or RPM package. |
nvidia_driver_persistence_mode_on | yes | Whether to enable persistence mode (true/false) |
nvidia_driver_skip_reboot | no | Whether to skip rebooting the machine during installation |
nvidia_driver_module_file | "/etc/modprobe.d/nvidia.conf" | The filename used for NVIDIA driver settings |
nvidia_driver_module_params | "" | Parameters to pass to the NVIDIA driver |
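
As an illustration, these variables could be overridden in a group variables file. The file name is hypothetical, and the format of nvidia_driver_module_params is an assumption: this README does not say how the value is written, so the sketch guesses that it ends up verbatim in the module file and therefore uses modprobe.d syntax. Check the role's tasks before relying on it.

```yaml
# group_vars/gpu_nodes.yml (hypothetical file name)
nvidia_driver_persistence_mode_on: no
nvidia_driver_module_file: "/etc/modprobe.d/nvidia.conf"  # the default, shown for clarity
# Assumption: the string is written as-is to the module file above.
nvidia_driver_module_params: "options nvidia NVreg_PreserveVideoMemoryAllocations=1"
```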
Variables for Red Hat
Variable | Default value | Description |
---|---|---|
epel_package | "https://dl.fedoraproject.org/pub/epel/epel-release-latest-{{ ansible_distribution_major_version }}.noarch.rpm" | Package to install for EPEL support |
nvidia_driver_rhel_cuda_repo_baseurl | "https://developer.download.nvidia.com/compute/cuda/repos/{{ _rhel_repo_dir }}/" | Base URL for CUDA repository |
nvidia_driver_rhel_cuda_repo_gpgkey | "https://developer.download.nvidia.com/compute/cuda/repos/{{ _rhel_repo_dir }}/7fa2af80.pub" | GPG key for the CUDA repository |
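
On nodes without direct internet access, the repository variables could point at an internal mirror instead. The host name and directory layout below are hypothetical; replace them with a location that actually serves the CUDA repository and its key.

```yaml
# Hypothetical internal mirror of the CUDA repository for RHEL/CentOS 7.
nvidia_driver_rhel_cuda_repo_baseurl: "https://mirror.example.com/cuda/rhel7/x86_64/"
nvidia_driver_rhel_cuda_repo_gpgkey: "https://mirror.example.com/cuda/rhel7/x86_64/7fa2af80.pub"
```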
Variables for Ubuntu
For Ubuntu installs, you can choose to install from either the Canonical repositories or the NVIDIA CUDA repositories.
By default, the Canonical repositories will be used, and the driver installed will be the headless server driver.
Variable | Default value | Description |
---|---|---|
nvidia_driver_ubuntu_install_from_cuda_repo | no | Flag to indicate whether to use the CUDA repository |
nvidia_driver_ubuntu_branch | 450 | Driver branch to use for installation |
nvidia_driver_ubuntu_packages | ["nvidia-headless-450-server", "nvidia-headless-450-utils"] | Package names to install from Canonical repo |
nvidia_driver_ubuntu_cuda_repo_baseurl | "http://developer.download.nvidia.com/compute/cuda/repos/{{ _ubuntu_repo_dir }}" | Base URL for CUDA repository |
nvidia_driver_ubuntu_cuda_repo_gpgkey_url | "https://developer.download.nvidia.com/compute/cuda/repos/{{ _ubuntu_repo_dir }}/7fa2af80.pub" | GPG key for the CUDA repository |
nvidia_driver_ubuntu_cuda_repo_gpgkey_id | "7fa2af80" | GPG key ID for the CUDA repository |
nvidia_driver_ubuntu_cuda_package | "cuda-drivers" | Package name to install from the CUDA repository |
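
To install from the NVIDIA CUDA repository instead of the Canonical one, the flag can be set when applying the role. This is a sketch that assumes an inventory group named gpu_nodes; the package name shown is simply the documented default.

```yaml
- hosts: gpu_nodes
  roles:
    - role: grycap.nvidia_driver
      vars:
        nvidia_driver_ubuntu_install_from_cuda_repo: yes
        nvidia_driver_ubuntu_cuda_package: "cuda-drivers"  # the default
```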
Example playbook
- hosts: gpu_nodes
  roles:
    - grycap.nvidia_driver
Supported distributions
This role currently supports the following Linux distributions:
- NVIDIA DGX OS 4
- NVIDIA DGX OS 5
- Ubuntu 18.04 LTS
- Ubuntu 20.04 LTS
- CentOS 7
- CentOS 8
- Red Hat Enterprise Linux 7