
is nvidia-container-toolkit built for 21.10? #6

Closed
RafalSkolasinski opened this issue Dec 30, 2021 · 7 comments

Comments

@RafalSkolasinski

As per pop-os/beta#289, when I was trying the beta of 21.10 the nvidia-container-toolkit package was missing.

As 21.04 will reach its EOL around the end of January, the question arises whether this package will be built and included in the Pop repos.

Because of the cgroups v2 issues you may need to use the latest upstream release, as it should have added support for it: NVIDIA/nvidia-docker#1447
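
For anyone unsure which hierarchy their machine is on, a quick check (generic systemd behaviour, not Pop-specific):

$ stat -fc %T /sys/fs/cgroup/
cgroup2fs    # unified cgroups v2 hierarchy; "tmpfs" would indicate the legacy v1 layout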

@AdrianJohnston

AdrianJohnston commented Jan 20, 2022

I'm having a similar issue: I upgraded to 21.10 due to the EOL of 21.04 and now can't use Docker with NVIDIA.
I've followed the instructions in pop-os/pop#1708 (comment) and have had no luck with that approach.

I have modified /etc/nvidia-container-runtime/config.toml with "no-cgroups = true" as per NVIDIA/nvidia-docker#1447,

which results in:

Failed to initialize NVML: Unknown Error
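
For reference, the config.toml change amounts to flipping a single key under the [nvidia-container-cli] section (a sketch of just the relevant excerpt; the rest of the stock file is omitted here and may differ between toolkit versions):

# /etc/nvidia-container-runtime/config.toml (excerpt)
[nvidia-container-cli]
no-cgroups = true

With no-cgroups = true the runtime no longer sets up device-cgroup access for the container, which is presumably why the /dev/nvidia* nodes then have to be passed in explicitly, as in the workaround below.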

But there is a workaround for me, by following NVIDIA/nvidia-docker#1447 (comment):

docker run --rm --gpus all --device /dev/nvidia0 --device /dev/nvidia-modeset  --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidiactl nvidia/cuda:11.0-base nvidia-smi

@mmstick
Member

mmstick commented Jan 20, 2022

Until NVIDIA releases 1.8.0, you have to run kernelstub -a systemd.unified_cgroup_hierarchy=false
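
Roughly, that looks like the following on Pop!_OS (which manages kernel parameters with kernelstub rather than GRUB); a reboot is required for the change to take effect:

$ sudo kernelstub -a systemd.unified_cgroup_hierarchy=false
$ sudo reboot
# after rebooting, the parameter should show up on the kernel command line
$ grep -o systemd.unified_cgroup_hierarchy=false /proc/cmdline
systemd.unified_cgroup_hierarchy=false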

mmstick closed this as completed Jan 20, 2022
@RafalSkolasinski
Author

Thanks. I just updated my main installation to 21.10 due to the EOL of 21.04. I see that the nvidia-container-toolkit package is now present. The cgroups trick will have to do for now.

Will the package be updated when NVIDIA releases 1.8.0, @mmstick? Do you know of any downsides to sticking with cgroups v1?

@mmstick
Member

mmstick commented Jan 20, 2022

The downside is that you can't benefit from the v2 improvements. I'll update the packages when NVIDIA officially releases 1.8.0.

@TobiasNorlund

As far as I understand, NVIDIA has now released 1.8: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.8.1
Is it possible to update the package then, @mmstick? Thanks!

@mmstick
Member

mmstick commented Mar 16, 2022

@TobiasNorlund Did you mean to ask for 1.8.1 specifically? 1.8 was released a while back.

$ apt-cache policy nvidia-container-toolkit
nvidia-container-toolkit:
  Installed: 1.8.0-1pop1~1644260705~21.10~60691e5
  Candidate: 1.8.0-1pop1~1644260705~21.10~60691e5
  Version table:
 *** 1.8.0-1pop1~1644260705~21.10~60691e5 1002
       1001 http://apt.pop-os.org/release impish/main amd64 Packages

@TobiasNorlund

Oh sorry, didn't realize I was actually running 1.8.0. However, after reboot it still doesn't work for me. Do you have any idea of what might be wrong?

$ nvidia-smi
Wed Mar 16 18:20:10 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   62C    P3    24W /  N/A |   1352MiB /  4096MiB |     45%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2025      G   /usr/lib/xorg/Xorg                790MiB |
|    0   N/A  N/A      2164      G   /usr/bin/gnome-shell              241MiB |
|    0   N/A  N/A      4501      G   ...715404079525871075,131072      207MiB |
|    0   N/A  N/A      4553      G   ...b/thunderbird/thunderbird      110MiB |
+-----------------------------------------------------------------------------+
$ apt list --installed | grep container

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

containerd.io/now 1.4.8-1 amd64 [installed,local]
libnvidia-container-tools/impish,now 1.8.0-1~1644255740~21.10~76ed4b4 amd64 [installed,automatic]
libnvidia-container1/impish,now 1.8.0-1~1644255740~21.10~76ed4b4 amd64 [installed,automatic]
nvidia-container-toolkit/impish,now 1.8.0-1pop1~1644260705~21.10~60691e5 amd64 [installed]
$ docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
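
That "cgroup subsystem devices not found" error is what libnvidia-container typically reports when it goes looking for the cgroup v1 devices controller and cannot find it. A diagnostic sketch (output shown is illustrative) for checking whether the kernelstub fallback from earlier in the thread actually took effect:

$ stat -fc %T /sys/fs/cgroup/
cgroup2fs    # v2 still active; "tmpfs" would mean the v1 fallback is in effect
$ grep -o systemd.unified_cgroup_hierarchy=false /proc/cmdline
             # no output here means the boot parameter was never applied

If the parameter is missing, re-running sudo kernelstub -a systemd.unified_cgroup_hierarchy=false and rebooting is worth trying before anything else.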
