PolarSPARC
How-to Enable NVidia GPU for Docker
Bhaskar S | UPDATED 12/23/2024
Overview
With all the buzz and spotlight around AI/ML these days, it is inevitable that developers in an Enterprise will start integrating their business applications with AI/ML products. The majority of AI/ML products depend on GPU-enabled platforms to run efficiently, a space currently dominated by NVidia.
Most Enterprise business applications run in Docker containers these days. It follows that for AI/ML-enabled business applications to run efficiently in a container environment, the Docker containers need access to the GPU.
Enter the NVidia Container Toolkit, which enables Enterprise developers to build and run GPU-enabled Docker containers.
The following diagram illustrates the high-level architecture of the Docker and NVidia integration:
The NVidia Container Toolkit includes a container runtime, which enables Docker containers to access the underlying NVidia GPUs. Under the hood, the toolkit leverages the Compute Unified Device Architecture (CUDA) software framework to access the parallel computing power of the NVidia GPUs for faster data processing.
Installation and Setup
The installation and setup will be performed on a Linux desktop with a decent NVidia graphics card installed and running the Ubuntu 24.04 LTS operating system.
Open a Terminal window to perform the various steps.
To perform a system update and install the prerequisite software, execute the following command:
$ sudo apt update && sudo apt install apt-transport-https ca-certificates curl software-properties-common -y
The following would be a typical trimmed output:
...[ SNIP ]...
ca-certificates is already the newest version (20240203).
The following additional packages will be installed:
  python3-software-properties software-properties-gtk
The following NEW packages will be installed:
  apt-transport-https curl
The following packages will be upgraded:
  python3-software-properties software-properties-common software-properties-gtk
3 upgraded, 2 newly installed, 0 to remove and 14 not upgraded.
...[ SNIP ]...
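Before running the install, it can help to see which of the prerequisite packages are actually missing. The following is a minimal sketch, assuming a Debian/Ubuntu system with dpkg-query available; the helper names pkg_status and missing_packages are hypothetical and not part of any package:

```shell
#!/bin/sh
# pkg_status PACKAGE
# Prints dpkg's status string for PACKAGE ("install ok installed" when
# the package is fully installed); prints nothing for unknown packages.
pkg_status() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null
}

# missing_packages PACKAGE...
# Prints every PACKAGE that is not reported as fully installed.
missing_packages() {
  for pkg in "$@"; do
    status="$(pkg_status "$pkg")" || status=""
    case "$status" in
      "install ok installed") ;;   # already present, nothing to report
      *) echo "$pkg" ;;            # missing or only partially installed
    esac
  done
}
```

Calling missing_packages apt-transport-https ca-certificates curl software-properties-common prints only the names that still need to be installed, and prints nothing when all four are present.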
To add the Docker package repository, execute the following commands:
$ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
$ echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable" | sudo tee /etc/apt/sources.list.d/docker.list
The following would be a typical output:
deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable
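The repository line above hardcodes the release codename (noble) and the architecture (amd64). As a hedged alternative, the following sketch derives both from the host, assuming /etc/os-release defines VERSION_CODENAME and that dpkg is available; the function name docker_repo_line is hypothetical:

```shell
#!/bin/sh
# docker_repo_line [OS_RELEASE_FILE] [ARCH]
# Prints the apt source line for the Docker repository.
# Both arguments are optional and exist mainly so the function can be
# tried against a sample file; by default it reads the real host values.
docker_repo_line() {
  os_release_file="${1:-/etc/os-release}"
  arch="${2:-$(dpkg --print-architecture)}"
  # Source the file in a subshell so its variables do not leak out.
  codename="$(. "$os_release_file" && echo "${VERSION_CODENAME}")"
  echo "deb [arch=${arch} signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu ${codename} stable"
}
```

The output of docker_repo_line can be piped to sudo tee /etc/apt/sources.list.d/docker.list in place of the echo command used above.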
To install docker, execute the following command:
$ sudo apt update && sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
The following would be a typical trimmed output:
...[ SNIP ]...
Get:5 https://download.docker.com/linux/ubuntu noble InRelease [48.9 kB]
Get:6 https://download.docker.com/linux/ubuntu noble/stable amd64 Packages [13.6 kB]
...[ SNIP ]...
To add the logged-in user (alice in this example) to the docker group, execute the following command:
$ sudo usermod -aG docker ${USER}
REBOOT the system for the changes to take effect.
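A group change only reaches login sessions started after the change, which is why a fresh session is needed. The following is a minimal sketch for checking where things stand, assuming the standard id utility; the helper names list_has_group and docker_group_status are hypothetical:

```shell
#!/bin/sh
# list_has_group GROUP "space separated group names"
# Succeeds when GROUP appears as a whole name in the list.
list_has_group() {
  printf '%s\n' "$2" | tr ' ' '\n' | grep -qxF -- "$1"
}

# docker_group_status [USER]
# Compares the group database with the groups of the current session.
docker_group_status() {
  user="${1:-$(id -un)}"
  # `id -nG USER` consults the group database, while `id -nG` alone
  # reports the groups the running shell actually holds.
  if ! list_has_group docker "$(id -nG "$user")"; then
    echo "not-a-member"
  elif ! list_has_group docker "$(id -nG)"; then
    echo "member-but-session-stale"
  else
    echo "ok"
  fi
}
```

A result of member-but-session-stale from docker_group_status suggests the usermod command worked but the current session predates it.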
To verify that the docker installation was successful, execute the following command:
$ docker version
The following would be a typical output:
Client:
 Version:           26.1.3
 API version:       1.45
 Go version:        go1.22.2
 Git commit:        26.1.3-0ubuntu1~24.04.1
 Built:             Mon Oct 14 14:29:26 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          26.1.3
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.22.2
  Git commit:       26.1.3-0ubuntu1~24.04.1
  Built:            Mon Oct 14 14:29:26 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.12
  GitCommit:
 runc:
  Version:          1.1.12-0ubuntu3.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:
To verify that the appropriate NVidia drivers are installed on the Linux desktop, execute the following command:
$ nvidia-smi
The following would be a typical output:
Sun Dec 22 12:05:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:04:00.0  On |                  N/A |
|  0%   38C    P8             11W /  165W |     441MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1356      G   /usr/lib/xorg/Xorg                            285MiB |
|    0   N/A  N/A     18902      G   /usr/lib/firefox/firefox                      146MiB |
+-----------------------------------------------------------------------------------------+
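For scripted checks, the table above is awkward to parse. nvidia-smi can also emit CSV through its --query-gpu and --format options. The following is a minimal sketch that condenses such CSV rows into one line per GPU; the function name summarize_gpus is hypothetical, and the three-column layout (name, driver version, total memory) is an assumption tied to the query shown in the note below the code:

```shell
#!/bin/sh
# summarize_gpus
# Reads CSV rows of the form "name, driver_version, memory.total" on
# stdin, prints one summary line per GPU, and returns non-zero when no
# GPU rows were seen.
summarize_gpus() {
  count=0
  while IFS=',' read -r name driver memory; do
    [ -n "$name" ] || continue          # skip blank lines
    count=$((count + 1))
    # Fields after the first carry one leading space; strip it.
    echo "GPU ${count}: ${name} | driver ${driver# } | memory ${memory# }"
  done
  [ "$count" -gt 0 ]
}
```

It is meant to be fed from nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader, with the exit status indicating whether any GPU was reported.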
To test access to the NVidia GPU in docker, we need a docker image. For this test, we will use the docker image nvidia/cuda:12.5.1-base-ubuntu24.04, which was the latest at the time of this article.
To download the above-mentioned docker image, execute the following command:
$ docker pull nvidia/cuda:12.5.1-base-ubuntu24.04
The following would be a typical output:
12.5.1-base-ubuntu24.04: Pulling from nvidia/cuda
9c704ecd0c69: Pull complete
be90e53f8898: Pull complete
f86719cadbb3: Pull complete
f67742d00263: Pull complete
85bb4fbc01b0: Pull complete
Digest: sha256:3c873cb78e31c983287909538738aca572887a919dae789bd04219e1d702c294
Status: Downloaded newer image for nvidia/cuda:12.5.1-base-ubuntu24.04
docker.io/nvidia/cuda:12.5.1-base-ubuntu24.04
To test access to the NVidia GPU from docker, execute the following command:
$ docker run --rm --gpus all nvidia/cuda:12.5.1-base-ubuntu24.04 nvidia-smi
The following would be a typical output:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
From the error output above, it is evident that docker has no access to the underlying NVidia GPU in the system.
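One way to look at this state without launching a container is to ask the docker daemon which runtimes it knows about. The following is a minimal sketch, assuming that the NVidia integration registers a runtime named nvidia with the daemon (treat that name as an assumption about this particular setup); the function name has_nvidia_runtime is hypothetical, and the result is a hint rather than proof that GPU access will work:

```shell
#!/bin/sh
# has_nvidia_runtime
# Succeeds when `docker info` lists a runtime whose name is "nvidia".
has_nvidia_runtime() {
  # .Runtimes is rendered as a JSON object keyed by runtime name.
  docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}

# report_nvidia_runtime
# Prints a one-line status based on has_nvidia_runtime.
report_nvidia_runtime() {
  if has_nvidia_runtime; then
    echo "nvidia runtime registered"
  else
    echo "nvidia runtime not registered"
  fi
}
```

At this point in the walkthrough, report_nvidia_runtime would be expected to report that the runtime is not registered.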
To add the NVidia toolkit repository, execute the following commands:
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
$ echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /" | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
The following would be a typical output:
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /
To perform a system update and to install the NVidia-Docker runtime integration, execute the following command:
$ sudo apt update && sudo apt install -y nvidia-docker2
The following would be a typical trimmed output:
...[ SNIP ]...
Preparing to unpack .../libnvidia-container1_1.17.3-1_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.17.3-1) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../libnvidia-container-tools_1.17.3-1_amd64.deb ...
Unpacking libnvidia-container-tools (1.17.3-1) ...
Selecting previously unselected package nvidia-container-toolkit-base.
Preparing to unpack .../nvidia-container-toolkit-base_1.17.3-1_amd64.deb ...
Unpacking nvidia-container-toolkit-base (1.17.3-1) ...
Selecting previously unselected package nvidia-container-toolkit.
Preparing to unpack .../nvidia-container-toolkit_1.17.3-1_amd64.deb ...
Unpacking nvidia-container-toolkit (1.17.3-1) ...
Selecting previously unselected package nvidia-docker2.
Preparing to unpack .../nvidia-docker2_2.14.0-1_all.deb ...
Unpacking nvidia-docker2 (2.14.0-1) ...
Setting up nvidia-container-toolkit-base (1.17.3-1) ...
Setting up libnvidia-container1:amd64 (1.17.3-1) ...
Setting up libnvidia-container-tools (1.17.3-1) ...
Setting up nvidia-container-toolkit (1.17.3-1) ...
Setting up nvidia-docker2 (2.14.0-1) ...
...[ SNIP ]...
Once again, REBOOT the system for the changes to take effect.
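Finally, to test access to the NVidia GPU from docker, execute the following command. The environment variable NVIDIA_DISABLE_REQUIRE=1 disables the image's CUDA version requirement check, which is needed here because the host driver reports CUDA 12.4 while the image targets CUDA 12.5: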
$ docker run --rm --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvidia/cuda:12.5.1-base-ubuntu24.04 nvidia-smi
The following would be a typical output:
Sun Dec 22 18:19:27 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:04:00.0  On |                  N/A |
|  0%   39C    P8             11W /  165W |     635MiB /  16380MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
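The command above sets NVIDIA_DISABLE_REQUIRE=1. The following is a minimal sketch for deciding ahead of time whether that override is likely to be needed, on the assumption that it comes down to the host driver's CUDA version (12.4 in the host nvidia-smi output earlier) being older than the CUDA version of the image (12.5). The function names are hypothetical, and the comparison relies on sort -V as provided by GNU coreutils:

```shell
#!/bin/sh
# cuda_version_from_smi
# Reads plain `nvidia-smi` output on stdin and prints the first
# "CUDA Version" value found in it (for example 12.4).
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_ge A B
# Succeeds when version A is greater than or equal to version B.
version_ge() {
  # After a version sort, B comes first exactly when B <= A.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# needs_require_override HOST_CUDA IMAGE_CUDA
# Succeeds when the host CUDA version is older than the image's.
needs_require_override() {
  ! version_ge "$1" "$2"
}
```

On the host, the first argument can be obtained with nvidia-smi | cuda_version_from_smi, while the second is read off the image tag by hand.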
VOILA !!! - we have successfully integrated the NVidia GPU runtime with the docker environment.