This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

nvidia-container-cli: container error: cgroup subsystem devices not found: unknown #1660

Closed
dixson3 opened this issue Aug 4, 2022 · 5 comments

Comments

dixson3 commented Aug 4, 2022

I recently installed Docker and the NVIDIA CUDA tools on a PopOS 22.04 (Ubuntu 22.04) system and am trying to enable GPU access in Docker.

❯ docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 bash -c "ldconfig; nvidia-smi"
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

I have already tried a clean install of Docker (following the instructions at https://docs.docker.com/engine/install/ubuntu/) and a fresh install of nvidia-docker2 (following the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide).

Am I missing a step? How can I resolve this?


Here are my system details:

> lsb_release -a
No LSB modules are available.
Distributor ID:	Pop
Description:	Pop!_OS 22.04 LTS
Release:	22.04
Codename:	jammy
> nvidia-smi
Thu Aug  4 12:16:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0  On |                  N/A |
|  0%   35C    P8    11W / 310W |    449MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4239      G   /usr/lib/xorg/Xorg                203MiB |
|    0   N/A  N/A      5148      G   /usr/bin/gnome-shell               64MiB |
|    0   N/A  N/A      6664      G   alacritty                          10MiB |
|    0   N/A  N/A      8064      G   firefox                           168MiB |
+-----------------------------------------------------------------------------+
> apt list | rg installed | rg docker

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

docker-ce-cli/jammy,now 5:20.10.17~3-0~ubuntu-jammy amd64 [installed]
docker-ce-rootless-extras/jammy,now 5:20.10.17~3-0~ubuntu-jammy amd64 [installed,automatic]
docker-ce/jammy,now 5:20.10.17~3-0~ubuntu-jammy amd64 [installed]
docker-compose-plugin/jammy,now 2.6.0~ubuntu-jammy amd64 [installed]
docker-scan-plugin/jammy,now 0.17.0~ubuntu-jammy amd64 [installed,automatic]
nvidia-docker2/jammy,jammy,now 2.9.0-1~1644261147~22.04~c7639fe all [installed]
> apt list | rg installed | rg nvidia

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnvidia-cfg1-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-common-515/jammy,jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed all [installed,automatic]
libnvidia-compute-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-compute-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-container-tools/jammy,now 1.8.0-1~1644255740~22.04~76ed4b4 amd64 [installed,automatic]
libnvidia-container1/jammy,now 1.8.0-1~1644255740~22.04~76ed4b4 amd64 [installed,automatic]
libnvidia-decode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-decode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-egl-wayland1/jammy,now 1:1.1.9-1.1 amd64 [installed,automatic]
libnvidia-encode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-encode-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-extra-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-fbc1-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-fbc1-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
libnvidia-gl-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
libnvidia-gl-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed i386 [installed,automatic]
nvidia-compute-utils-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-container-toolkit/jammy,now 1.8.0-1pop1~1644260705~22.04~60691e5 amd64 [installed,automatic]
nvidia-dkms-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-docker2/jammy,jammy,now 2.9.0-1~1644261147~22.04~c7639fe all [installed]
nvidia-driver-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-kernel-common-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-kernel-source-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
nvidia-settings/jammy,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]
system76-driver-nvidia/jammy,jammy,now 20.04.60~1659452571~22.04~9ef923b all [installed]
xserver-xorg-video-nvidia-515/jammy,now 515.48.07-1pop0~1657640780~22.04~e863eed amd64 [installed,automatic]

klueska (Contributor) commented Aug 4, 2022

See #1643 (comment).

woook commented Oct 9, 2022

I appear to be having the same issue even with a later version of the toolkit:

lib-version: 1.11.0
build date: 2022-09-18T23:16+00:00
build revision: 
build compiler: x86_64-linux-gnu-gcc-11 11.2.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -fPIC -g -O2 -ffile-prefix-map=/build/libnvidia-container-CeXONE/libnvidia-container-1.11.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -I/usr/include/tirpc -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro

@EnkrateiaLucca

I'm having the same issue!

@SebasGarcia08

I'm having the same issue. Any updates?

klueska (Contributor) commented Feb 20, 2023

From what I've gathered while responding to other tickets about this same issue, PopOS appears to compile and distribute its own version of libnvidia-container with WITH_NVCGO=no set at compile time. Unless this is set to yes (the default), the library has no cgroup v2 support, which can result in the error you see here.
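
To check whether a host is actually running the unified cgroup v2 hierarchy (the case a build without cgroup v2 support cannot handle), one can look at the filesystem type mounted at /sys/fs/cgroup. The sketch below is an illustration, not part of the NVIDIA toolkit: the function names `classify_cgroup_fstype` and `cgroup_version` are hypothetical, and it assumes GNU coreutils `stat`, whose `-f` flag reports file-system status and whose `%T` format prints the type name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the NVIDIA toolkit) for checking which
# cgroup hierarchy the host uses.

# Map the filesystem type of the cgroup mount point to a cgroup version:
#   cgroup2fs -> unified cgroup v2
#   tmpfs     -> legacy or hybrid cgroup v1 layout
classify_cgroup_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Print v1/v2/unknown for a mount point (default: /sys/fs/cgroup).
# Returns 1 and prints "unknown" if stat cannot read the path.
cgroup_version() {
  local fstype
  if ! fstype="$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)"; then
    echo "unknown"
    return 1
  fi
  classify_cgroup_fstype "$fstype"
}
```

Calling `cgroup_version` with no argument inspects /sys/fs/cgroup; on Ubuntu 22.04-based systems, which typically default to the unified hierarchy, it would be expected to print `v2`.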

Because PopOS builds this library itself, even recent versions of its packages may exhibit this issue, even when the same version of the official library does not.
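
One way to see which repository the installed packages actually came from is `apt-cache policy`, which marks the installed version with `***` and lists the sources that provide it. The `installed_origins` filter below is a hypothetical helper that extracts those sources. It assumes the usual `apt-cache policy` layout (package headers at column 0, version entries indented five columns, source lines indented further) and is a sketch rather than a robust parser.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `apt-cache policy PKG...` output on stdin and
# print "<package> <source>" for each source of the *installed* version.
installed_origins() {
  awk '
    /^[^ ]/ {                            # package header, e.g. "libnvidia-container1:"
      pkg = $1; sub(/:$/, "", pkg); in_inst = 0; next
    }
    /^ \*\*\* /  { in_inst = 1; next }   # installed version entry (marked ***)
    /^     [^ ]/ { in_inst = 0; next }   # any other version entry
    in_inst && NF >= 2 { print pkg, $2 } # "PRIORITY SOURCE ..." lines
  '
}
```

For example, `apt-cache policy libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit | installed_origins` lists where each installed build came from. A source on a PopOS host (presumably apt.pop-os.org) rather than nvidia.github.io would point to the PopOS build; a `/var/lib/dpkg/status` entry only means the package is installed.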

Please override the PopOS repos and pull these packages from the official NVIDIA repos instead.

A community-provided solution for doing so can be found here:
https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e
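
One general way to make apt prefer NVIDIA's packages is an apt pin (see apt_preferences(5)). The sketch below writes such a pin; it is not necessarily what the gist above does. It assumes NVIDIA's repository has already been added as described in the install guide and is served from the host nvidia.github.io; the function name `write_nvidia_pin` and the file name are hypothetical. A Pin-Priority above 1000 lets apt downgrade an installed package, which matters if the PopOS build carries a higher version string; 1002 is used on the assumption that the distribution may already pin its own repository at 1001.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an apt preferences file that ranks every package
# served from the host nvidia.github.io (assumed to be NVIDIA's repository)
# above the PopOS builds of the same packages.
#   Pin-Priority > 1000 -> apt may downgrade an installed package to this source
#   1002                -> assumed to sit above any 1001 pin the distro uses
write_nvidia_pin() {
  local dir="${1:-/etc/apt/preferences.d}"
  local file="${dir}/nvidia-container-toolkit-pin"   # file name is arbitrary
  cat > "$file" <<'EOF' || return 1
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
EOF
  echo "$file"   # report where the pin was written
}
```

Writing to /etc/apt/preferences.d requires root, so the function would be run from a root shell (for example after `sudo -i`). Afterwards, `apt-get update` followed by `apt-get install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit` and `systemctl restart docker` would be expected to switch to NVIDIA's builds; `apt-cache policy libnvidia-container1` can confirm the new candidate version before installing.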
