docker can't find libnvidia-ml.so.1 #33

Closed
gregh3285 opened this issue Oct 15, 2023 · 7 comments

@gregh3285

It looks like when I invoke zwift.sh, nvidia-container-cli can't find libnvidia-ml.so.1. Transcript below.

gregh@Iago:~$ zwift.sh
+ ZWIFT_HOME=/home/gregh/.config/zwift/gregh
+ mkdir -p /home/gregh/.config/zwift/gregh
+ IMAGE=docker.io/netbrain/zwift
+ VERSION=latest
+ mkdir -p /home/gregh/.config/zwift/gregh
+ [[ ! -n '' ]]
++ command -v podman
+ [[ -x '' ]]
+ CONTAINER_TOOL=docker
+ [[ ! -n '' ]]
+ docker pull docker.io/netbrain/zwift:latest
latest: Pulling from netbrain/zwift
Digest: sha256:f17fe247e55c70c0d8726a920cec45418a8e1a40190d967c8638b03cdd6b3444
Status: Image is up to date for netbrain/zwift:latest
docker.io/netbrain/zwift:latest
+ [[ -f /proc/driver/nvidia/version ]]
+ VGA_DEVICE_FLAG='--gpus all'
++ docker run -d --rm --privileged -e DISPLAY=:1 -v /tmp/.X11-unix:/tmp/.X11-unix -v /run/user/1000/pulse:/run/user/1000/pulse -v /home/gregh/.config/zwift/gregh:/home/user/Zwift --gpus all docker.io/netbrain/zwift:latest
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
+ CONTAINER=cc6105c56ab22531595581ee6efeb74aee627afb231b85d959fac4f5cf55fee6
+ [[ -z '' ]]
++ docker inspect '--format={{ .Config.Hostname  }}' cc6105c56ab22531595581ee6efeb74aee627afb231b85d959fac4f5cf55fee6
Error: No such object: cc6105c56ab22531595581ee6efeb74aee627afb231b85d959fac4f5cf55fee6
+ xhost +local:
non-network local connections being added to access control list

Ubuntu reports the following NVIDIA driver version:

gregh@Iago:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.113.01  Tue Sep 12 19:41:24 UTC 2023
GCC version: 

Looking at nvidia-container-cli, it seems to know which specific libraries it needs:

gregh@Iago:~$ nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/usr/bin/nvidia-smi
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-cuda-mps-control
/usr/bin/nvidia-cuda-mps-server
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.535.113.01
/usr/lib/x86_64-linux-gnu/libcuda.so.535.113.01
/usr/lib/x86_64-linux-gnu/libcudadebugger.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvcuvid.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvoptix.so.535.113.01
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.535.113.01
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.535.113.01
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.535.113.01
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.535.113.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.535.113.01
/run/nvidia-persistenced/socket
/lib/firmware/nvidia/535.113.01/gsp_ga10x.bin
/lib/firmware/nvidia/535.113.01/gsp_tu10x.bin

I can see that the library exists both as the base .1 symlink and as the driver-specific file it points to:

gregh@Iago:~$ ls -al /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
lrwxrwxrwx 1 root root 26 Sep 25 04:32 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 -> libnvidia-ml.so.535.113.01
gregh@Iago:~$ ls -al /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.113.01
-rw-r--r-- 1 root root 1819968 Sep 25 04:32 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.113.01

I'm not sure if this is an issue with this Docker image or with my configuration of nvidia-docker. Looking to see if anyone else has a clue.
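
One further check that may be worth running: a file being present on disk does not by itself mean the dynamic loader can resolve its soname, because such lookups usually go through the ldconfig cache. The sketch below tests whether a soname appears in `ldconfig -p` output. The helper name `soname_in_cache` is made up for illustration, and whether nvidia-container-cli resolves libnvidia-ml.so.1 this way in the failing setup is an assumption that the thread does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zwift.sh or any NVIDIA tool).
# Reads `ldconfig -p`-style output on stdin and succeeds only if the
# given soname is listed as the first field of some line.
soname_in_cache() {
    local soname="$1"
    # Cache lines look like:
    #   <TAB>libfoo.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libfoo.so.1
    # awk skips the leading tab, so $1 is the soname.
    awk -v s="$soname" '$1 == s { found = 1 } END { exit !found }'
}

# Usage on a real system (ldconfig may live in /sbin):
#   if ldconfig -p | soname_in_cache libnvidia-ml.so.1; then
#       echo "soname is in the linker cache"
#   else
#       echo "soname is missing from the cache; 'sudo ldconfig' rebuilds it"
#   fi
```

If the soname is in the cache and the error persists, the problem is more likely in how the container runtime sees the host than in the driver install itself.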

@gregh3285 changed the title from "docker can't find" to "docker can't find libnvidia-ml.so.1" on Oct 15, 2023
@netbrain
Owner

@gregh3285
Author

gregh3285 commented Oct 16, 2023

So, further debugging clearly shows this has nothing to do with the Zwift Docker image. Something unrelated is hosed on my end. The nvidia-docker link above wasn't helpful, unfortunately. Closing this issue.

@oldnapalm

I had this issue with Docker Desktop. Are you using it? It works with plain Docker, though.

@netbrain
Owner

netbrain commented Oct 17, 2023 via email

@oldnapalm

> Wait, you used Docker desktop? For Darwin/mac?

For Ubuntu Linux (https://docs.docker.com/desktop/install/ubuntu/), and I got the same error as the OP. After reading NVIDIA/nvidia-container-toolkit#219, I tried without Docker Desktop and it worked.
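
For anyone unsure which engine their `docker` command is talking to, here is a minimal sketch. It relies on `docker info` reporting an `OperatingSystem` field, which reads "Docker Desktop" when the daemon runs inside Docker Desktop. The helper name `engine_kind` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the Docker engine from the
# OperatingSystem string reported by `docker info`.
engine_kind() {
    case "$1" in
        *"Docker Desktop"*) echo "desktop" ;;  # daemon runs inside Docker Desktop
        "")                 echo "unknown" ;;  # docker info failed or printed nothing
        *)                  echo "native"  ;;  # daemon reports the host OS, e.g. "Ubuntu 23.04"
    esac
}

# Usage on a real system:
#   engine_kind "$(docker info --format '{{.OperatingSystem}}' 2>/dev/null)"
```

A result of `desktop` would point to the situation described above, where switching to the plain Docker engine made the error go away.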

@gregh3285
Author

I'm running Ubuntu 23.04. I have the following Docker- and container-related packages installed:

gregh@Iago:~$ apt list --installed | grep docker

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

docker.io/lunar-updates,now 24.0.5-0ubuntu1~23.04.1 amd64 [installed,automatic]
docker/lunar,lunar,now 1.5-2 all [installed]
nvidia-docker2/unknown,now 2.13.0-1 all [installed]
wmdocker/lunar,now 1.5-2 amd64 [installed,automatic]
gregh@Iago:~$ apt list --installed | grep container

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

containerd/lunar-updates,now 1.7.2-0ubuntu1~23.04.1 amd64 [installed,automatic]
libnvidia-container-tools/unknown,now 1.14.3-1 amd64 [installed,automatic]
libnvidia-container1/unknown,now 1.14.3-1 amd64 [installed,automatic]
nvidia-container-toolkit-base/unknown,now 1.14.3-1 amd64 [installed,automatic]
nvidia-container-toolkit/unknown,now 1.14.3-1 amd64 [installed]
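
Since `apt` warns that its output is not meant for scripts, the same listing can be sketched with `dpkg-query`, which offers a format string. The helper name `match_packages` and its tab-separated input layout are assumptions made for this example, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "package<TAB>version<TAB>status" lines on stdin
# and prints "package version" for fully installed packages (status "ii")
# whose name matches any of the given substrings.
match_packages() {
    local pattern
    # Join the arguments with "|" to form one alternation regex.
    pattern=$(IFS='|'; echo "$*")
    awk -F'\t' -v re="$pattern" '$3 ~ /^ii/ && $1 ~ re { print $1 " " $2 }'
}

# Usage on a Debian/Ubuntu system:
#   dpkg-query -W -f='${Package}\t${Version}\t${db:Status-Abbrev}\n' \
#       | match_packages docker container nvidia
```

Filtering on the status field leaves out packages that were removed but still have configuration files on disk.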

@jordimassaguerpla

I had a similar issue and fixed it by installing nvidia-computeG05.
