Couldn't find libnvidia-ml.so library in your system #299

Closed
PhilipDeegan opened this issue Jun 5, 2020 · 13 comments
Labels
bug Issue/PR to expose/discuss/fix a bug

Comments

@PhilipDeegan

System: Debian 10 buster-backports

See: NVIDIA/nvidia-docker#854

The following comment solves it:
NVIDIA/nvidia-docker#854 (comment)

@klueska
Contributor

klueska commented Jun 5, 2020

Can you run nvidia-container-cli -k -d /dev/tty info and provide the output?

@PhilipDeegan
Author

nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0605 14:23:04.356552 3849 nvc.c:281] initializing library context (version=1.1.1, build=e5d6156aba457559979597c8e3d22c5d8d0622db)
I0605 14:23:04.356672 3849 nvc.c:255] using root /
I0605 14:23:04.356703 3849 nvc.c:256] using ldcache /etc/ld.so.cache
I0605 14:23:04.356735 3849 nvc.c:257] using unprivileged user 1000:1000
W0605 14:23:04.485360 3850 nvc.c:186] failed to set inheritable capabilities
W0605 14:23:04.485463 3850 nvc.c:187] skipping kernel modules load due to failure
I0605 14:23:04.486029 3851 driver.c:101] starting driver service
I0605 14:23:04.491408 3849 nvc_info.c:541] requesting driver information with ''
I0605 14:23:04.492365 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.82
I0605 14:23:04.492403 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.82
I0605 14:23:04.492457 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.440.82
I0605 14:23:04.492506 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.440.82
I0605 14:23:04.492534 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.82
I0605 14:23:04.492562 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.82
I0605 14:23:04.492590 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.82
I0605 14:23:04.492621 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.82
I0605 14:23:04.492648 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.82
I0605 14:23:04.492716 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.440.82
I0605 14:23:04.492743 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.82
I0605 14:23:04.492908 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.440.82
I0605 14:23:04.493076 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.440.82
I0605 14:23:04.493160 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.440.82
I0605 14:23:04.493239 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.440.82
I0605 14:23:04.493321 3849 nvc_info.c:155] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.440.82
W0605 14:23:04.493366 3849 nvc_info.c:306] missing library libnvidia-opencl.so
W0605 14:23:04.493384 3849 nvc_info.c:306] missing library libnvidia-allocator.so
W0605 14:23:04.493387 3849 nvc_info.c:306] missing library libnvidia-compiler.so
W0605 14:23:04.493390 3849 nvc_info.c:306] missing library libvdpau_nvidia.so
W0605 14:23:04.493412 3849 nvc_info.c:306] missing library libnvidia-encode.so
W0605 14:23:04.493417 3849 nvc_info.c:306] missing library libnvidia-opticalflow.so
W0605 14:23:04.493422 3849 nvc_info.c:306] missing library libnvcuvid.so
W0605 14:23:04.493428 3849 nvc_info.c:306] missing library libnvidia-fbc.so
W0605 14:23:04.493434 3849 nvc_info.c:306] missing library libnvidia-ifr.so
W0605 14:23:04.493440 3849 nvc_info.c:306] missing library libnvoptix.so
W0605 14:23:04.493446 3849 nvc_info.c:310] missing compat32 library libnvidia-ml.so
W0605 14:23:04.493452 3849 nvc_info.c:310] missing compat32 library libnvidia-cfg.so
W0605 14:23:04.493458 3849 nvc_info.c:310] missing compat32 library libcuda.so
W0605 14:23:04.493463 3849 nvc_info.c:310] missing compat32 library libnvidia-opencl.so
W0605 14:23:04.493469 3849 nvc_info.c:310] missing compat32 library libnvidia-ptxjitcompiler.so
W0605 14:23:04.493475 3849 nvc_info.c:310] missing compat32 library libnvidia-fatbinaryloader.so
W0605 14:23:04.493480 3849 nvc_info.c:310] missing compat32 library libnvidia-allocator.so
W0605 14:23:04.493486 3849 nvc_info.c:310] missing compat32 library libnvidia-compiler.so
W0605 14:23:04.493491 3849 nvc_info.c:310] missing compat32 library libvdpau_nvidia.so
W0605 14:23:04.493497 3849 nvc_info.c:310] missing compat32 library libnvidia-encode.so
W0605 14:23:04.493503 3849 nvc_info.c:310] missing compat32 library libnvidia-opticalflow.so
W0605 14:23:04.493508 3849 nvc_info.c:310] missing compat32 library libnvcuvid.so
W0605 14:23:04.493514 3849 nvc_info.c:310] missing compat32 library libnvidia-eglcore.so
W0605 14:23:04.493520 3849 nvc_info.c:310] missing compat32 library libnvidia-glcore.so
W0605 14:23:04.493526 3849 nvc_info.c:310] missing compat32 library libnvidia-tls.so
W0605 14:23:04.493531 3849 nvc_info.c:310] missing compat32 library libnvidia-glsi.so
W0605 14:23:04.493537 3849 nvc_info.c:310] missing compat32 library libnvidia-fbc.so
W0605 14:23:04.493543 3849 nvc_info.c:310] missing compat32 library libnvidia-ifr.so
W0605 14:23:04.493548 3849 nvc_info.c:310] missing compat32 library libnvidia-rtcore.so
W0605 14:23:04.493554 3849 nvc_info.c:310] missing compat32 library libnvoptix.so
W0605 14:23:04.493560 3849 nvc_info.c:310] missing compat32 library libGLX_nvidia.so
W0605 14:23:04.493566 3849 nvc_info.c:310] missing compat32 library libEGL_nvidia.so
W0605 14:23:04.493572 3849 nvc_info.c:310] missing compat32 library libGLESv2_nvidia.so
W0605 14:23:04.493577 3849 nvc_info.c:310] missing compat32 library libGLESv1_CM_nvidia.so
W0605 14:23:04.493583 3849 nvc_info.c:310] missing compat32 library libnvidia-glvkspirv.so
W0605 14:23:04.493589 3849 nvc_info.c:310] missing compat32 library libnvidia-cbl.so
I0605 14:23:04.494053 3849 nvc_info.c:236] selecting /usr/lib/nvidia/current/nvidia-smi
I0605 14:23:04.494088 3849 nvc_info.c:236] selecting /usr/lib/nvidia/current/nvidia-debugdump
I0605 14:23:04.494104 3849 nvc_info.c:236] selecting /usr/bin/nvidia-persistenced
W0605 14:23:04.494180 3849 nvc_info.c:332] missing binary nvidia-cuda-mps-control
W0605 14:23:04.494184 3849 nvc_info.c:332] missing binary nvidia-cuda-mps-server
I0605 14:23:04.494230 3849 nvc_info.c:373] listing device /dev/nvidiactl
I0605 14:23:04.494235 3849 nvc_info.c:373] listing device /dev/nvidia-uvm
I0605 14:23:04.494240 3849 nvc_info.c:373] listing device /dev/nvidia-uvm-tools
I0605 14:23:04.494245 3849 nvc_info.c:373] listing device /dev/nvidia-modeset
I0605 14:23:04.494269 3849 nvc_info.c:277] listing ipc /run/nvidia-persistenced/socket
W0605 14:23:04.494282 3849 nvc_info.c:281] missing ipc /tmp/nvidia-mps
I0605 14:23:04.494287 3849 nvc_info.c:598] requesting device information with ''
I0605 14:23:04.501948 3849 nvc_info.c:637] listing device /dev/nvidia0 (GPU-7f3a0163-e7e5-79f9-edde-fd270af77272 at 00000000:01:00.0)
NVRM version:   440.82
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          GeForce GTX 1080 with Max-Q Design
Brand:          GeForce
GPU UUID:       GPU-7f3a0163-e7e5-79f9-edde-fd270af77272
Bus Location:   00000000:01:00.0
Architecture:   6.1
I0605 14:23:04.502072 3849 nvc.c:318] shutting down library context
I0605 14:23:04.502734 3851 driver.c:156] terminating driver service
I0605 14:23:04.503508 3849 driver.c:196] driver service terminated successfully

@klueska
Contributor

klueska commented Jun 8, 2020

It seems that ldconfig is not being triggered properly by libnvidia-container on your system. I'm not 100% familiar with buster-backports. I know that for Debian 9 and 10 (i.e. without backports) no indirection through ldconfig.real is necessary to get access to the "real" ldconfig (unlike on Debian 8 and prior, as well as all Ubuntu-based systems).

There is a configuration in /etc/nvidia-container-runtime/config.toml that lets nvidia-docker know what your "real" ldconfig is. If for some reason buster-backports has reintroduced an ldconfig.real file, then this configuration file will need to be updated to point to it.

Can you try to tab-complete the name ldconfig and see if an ldconfig.real file shows up? If so, that is your issue, and you need to customize /etc/nvidia-container-runtime/config.toml appropriately.
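
The check described here can also be scripted. The following is a minimal sketch, not part of nvidia-docker: the function name suggest_ldconfig_setting and its optional root argument are hypothetical, and the printed values assume the @-prefixed form that appears in the config.toml files quoted in this thread.

```shell
# Hypothetical helper: print the ldconfig line that config.toml would need,
# depending on whether an ldconfig.real binary exists.
# The optional first argument is a root directory (default "/"); it exists
# only so the function can be tried against a scratch directory.
suggest_ldconfig_setting() {
    root="${1:-/}"
    root="${root%/}"   # "/" becomes "", "/tmp/x/" becomes "/tmp/x"
    if [ -x "$root/sbin/ldconfig.real" ]; then
        printf 'ldconfig = "@/sbin/ldconfig.real"\n'
    else
        printf 'ldconfig = "@/sbin/ldconfig"\n'
    fi
}
```

Calling suggest_ldconfig_setting with no argument inspects the running system; compare its output with the ldconfig line in /etc/nvidia-container-runtime/config.toml.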

@harperreed

Running into the same issue.

System: Debian 10, buster-backports enabled.

I can run nvidia-smi outside of a container with no issue. Inside a container it fails, as in @dekken's example.

When I tab-complete ldconfig, only ldconfig shows up (no ldconfig.real).

As in other reports of this problem, if I use

docker run --gpus=all --rm nvidia/cuda bash -c "ldconfig;nvidia-smi"

then nvidia-smi works. If I use

docker run --gpus=all --rm nvidia/cuda bash -c "nvidia-smi"

then I get the above error.
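
The workaround amounts to rebuilding the linker cache inside the container before the real command runs. A minimal sketch of a wrapper that builds such a command string is shown below; the function name with_ldconfig is hypothetical, and it works around the symptom rather than fixing the host configuration.

```shell
# Hypothetical helper: prefix a container command with ldconfig so that the
# linker cache is rebuilt before the command itself runs.
with_ldconfig() {
    printf 'ldconfig; %s\n' "$*"
}

# Usage on a host with Docker and the NVIDIA runtime installed:
#   docker run --gpus=all --rm nvidia/cuda bash -c "$(with_ldconfig nvidia-smi)"
```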

@regzon

regzon commented Sep 16, 2020

The same issue here.

System: Debian Testing (bullseye)

ldconfig is available at path /sbin/ldconfig

Note: /sbin is a symlink to /usr/sbin

Log of nvidia-container-toolkit:


-- WARNING, the following logs are for debugging purposes only --

I0916 13:21:42.294324 22019 nvc.c:282] initializing library context (version=1.2.0, build=d22237acaea94aa5ad5de70aac903534ed598819)
I0916 13:21:42.294442 22019 nvc.c:256] using root /
I0916 13:21:42.294461 22019 nvc.c:257] using ldcache /etc/ld.so.cache
I0916 13:21:42.294477 22019 nvc.c:258] using unprivileged user 65534:65534
I0916 13:21:42.294514 22019 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0916 13:21:42.294810 22019 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
I0916 13:21:42.298069 22023 nvc.c:192] loading kernel module nvidia
I0916 13:21:42.298488 22023 nvc.c:204] loading kernel module nvidia_uvm
I0916 13:21:42.298656 22023 nvc.c:212] loading kernel module nvidia_modeset
I0916 13:21:42.299161 22024 driver.c:101] starting driver service
I0916 13:21:42.303174 22019 nvc_container.c:364] configuring container with 'compute utility supervised'
I0916 13:21:42.303437 22019 nvc_container.c:212] selecting /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/local/cuda-10.1/compat/libcuda.so.418.152.00
I0916 13:21:42.303510 22019 nvc_container.c:212] selecting /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/local/cuda-10.1/compat/libnvidia-fatbinaryloader.so.418.152.00
I0916 13:21:42.303555 22019 nvc_container.c:212] selecting /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/local/cuda-10.1/compat/libnvidia-ptxjitcompiler.so.418.152.00
I0916 13:21:42.303733 22019 nvc_container.c:384] setting pid to 21995
I0916 13:21:42.303745 22019 nvc_container.c:385] setting rootfs to /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged
I0916 13:21:42.303755 22019 nvc_container.c:386] setting owner to 0:0
I0916 13:21:42.303765 22019 nvc_container.c:387] setting bins directory to /usr/bin
I0916 13:21:42.303775 22019 nvc_container.c:388] setting libs directory to /usr/lib/x86_64-linux-gnu
I0916 13:21:42.303784 22019 nvc_container.c:389] setting libs32 directory to /usr/lib/i386-linux-gnu
I0916 13:21:42.303794 22019 nvc_container.c:390] setting cudart directory to /usr/local/cuda
I0916 13:21:42.303803 22019 nvc_container.c:391] setting ldconfig to @/sbin/ldconfig (host relative)
I0916 13:21:42.303813 22019 nvc_container.c:392] setting mount namespace to /proc/21995/ns/mnt
I0916 13:21:42.303823 22019 nvc_container.c:394] setting devices cgroup to /sys/fs/cgroup/devices/docker/8324917e27bd8d74b848f4d2a73fcc0f580e272562c48686c838786c9f31f6b7
I0916 13:21:42.303838 22019 nvc_info.c:679] requesting driver information with ''
I0916 13:21:42.305835 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.66
I0916 13:21:42.305905 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.66
I0916 13:21:42.306012 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.450.66
I0916 13:21:42.306110 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.450.66
I0916 13:21:42.306166 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.66
I0916 13:21:42.306218 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.66
I0916 13:21:42.306273 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.66
I0916 13:21:42.306325 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.66
I0916 13:21:42.306451 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.450.66
I0916 13:21:42.306506 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.450.66
I0916 13:21:42.306877 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.450.66
I0916 13:21:42.307172 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.450.66
I0916 13:21:42.307282 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.450.66
I0916 13:21:42.307380 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.450.66
I0916 13:21:42.307476 22019 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.450.66
W0916 13:21:42.307538 22019 nvc_info.c:349] missing library libnvidia-opencl.so
W0916 13:21:42.307549 22019 nvc_info.c:349] missing library libnvidia-fatbinaryloader.so
W0916 13:21:42.307559 22019 nvc_info.c:349] missing library libnvidia-allocator.so
W0916 13:21:42.307568 22019 nvc_info.c:349] missing library libnvidia-compiler.so
W0916 13:21:42.307578 22019 nvc_info.c:349] missing library libnvidia-ngx.so
W0916 13:21:42.307587 22019 nvc_info.c:349] missing library libvdpau_nvidia.so
W0916 13:21:42.307597 22019 nvc_info.c:349] missing library libnvidia-encode.so
W0916 13:21:42.307607 22019 nvc_info.c:349] missing library libnvidia-opticalflow.so
W0916 13:21:42.307616 22019 nvc_info.c:349] missing library libnvcuvid.so
W0916 13:21:42.307626 22019 nvc_info.c:349] missing library libnvidia-fbc.so
W0916 13:21:42.307635 22019 nvc_info.c:349] missing library libnvidia-ifr.so
W0916 13:21:42.307645 22019 nvc_info.c:349] missing library libnvoptix.so
W0916 13:21:42.307654 22019 nvc_info.c:353] missing compat32 library libnvidia-ml.so
W0916 13:21:42.307664 22019 nvc_info.c:353] missing compat32 library libnvidia-cfg.so
W0916 13:21:42.307674 22019 nvc_info.c:353] missing compat32 library libcuda.so
W0916 13:21:42.307683 22019 nvc_info.c:353] missing compat32 library libnvidia-opencl.so
W0916 13:21:42.307693 22019 nvc_info.c:353] missing compat32 library libnvidia-ptxjitcompiler.so
W0916 13:21:42.307702 22019 nvc_info.c:353] missing compat32 library libnvidia-fatbinaryloader.so
W0916 13:21:42.307712 22019 nvc_info.c:353] missing compat32 library libnvidia-allocator.so
W0916 13:21:42.307722 22019 nvc_info.c:353] missing compat32 library libnvidia-compiler.so
W0916 13:21:42.307731 22019 nvc_info.c:353] missing compat32 library libnvidia-ngx.so
W0916 13:21:42.307741 22019 nvc_info.c:353] missing compat32 library libvdpau_nvidia.so
W0916 13:21:42.307750 22019 nvc_info.c:353] missing compat32 library libnvidia-encode.so
W0916 13:21:42.307760 22019 nvc_info.c:353] missing compat32 library libnvidia-opticalflow.so
W0916 13:21:42.307769 22019 nvc_info.c:353] missing compat32 library libnvcuvid.so
W0916 13:21:42.307779 22019 nvc_info.c:353] missing compat32 library libnvidia-eglcore.so
W0916 13:21:42.307788 22019 nvc_info.c:353] missing compat32 library libnvidia-glcore.so
W0916 13:21:42.307798 22019 nvc_info.c:353] missing compat32 library libnvidia-tls.so
W0916 13:21:42.307808 22019 nvc_info.c:353] missing compat32 library libnvidia-glsi.so
W0916 13:21:42.307817 22019 nvc_info.c:353] missing compat32 library libnvidia-fbc.so
W0916 13:21:42.307827 22019 nvc_info.c:353] missing compat32 library libnvidia-ifr.so
W0916 13:21:42.307836 22019 nvc_info.c:353] missing compat32 library libnvidia-rtcore.so
W0916 13:21:42.307846 22019 nvc_info.c:353] missing compat32 library libnvoptix.so
W0916 13:21:42.307855 22019 nvc_info.c:353] missing compat32 library libGLX_nvidia.so
W0916 13:21:42.307865 22019 nvc_info.c:353] missing compat32 library libEGL_nvidia.so
W0916 13:21:42.307874 22019 nvc_info.c:353] missing compat32 library libGLESv2_nvidia.so
W0916 13:21:42.307884 22019 nvc_info.c:353] missing compat32 library libGLESv1_CM_nvidia.so
W0916 13:21:42.307893 22019 nvc_info.c:353] missing compat32 library libnvidia-glvkspirv.so
W0916 13:21:42.307903 22019 nvc_info.c:353] missing compat32 library libnvidia-cbl.so
I0916 13:21:42.308377 22019 nvc_info.c:275] selecting /usr/lib/nvidia/current/nvidia-smi
I0916 13:21:42.308445 22019 nvc_info.c:275] selecting /usr/lib/nvidia/current/nvidia-debugdump
I0916 13:21:42.308476 22019 nvc_info.c:275] selecting /usr/bin/nvidia-persistenced
W0916 13:21:42.308822 22019 nvc_info.c:375] missing binary nvidia-cuda-mps-control
W0916 13:21:42.308837 22019 nvc_info.c:375] missing binary nvidia-cuda-mps-server
I0916 13:21:42.308878 22019 nvc_info.c:437] listing device /dev/nvidiactl
I0916 13:21:42.308888 22019 nvc_info.c:437] listing device /dev/nvidia-uvm
I0916 13:21:42.308898 22019 nvc_info.c:437] listing device /dev/nvidia-uvm-tools
I0916 13:21:42.308907 22019 nvc_info.c:437] listing device /dev/nvidia-modeset
I0916 13:21:42.308948 22019 nvc_info.c:316] listing ipc /run/nvidia-persistenced/socket
W0916 13:21:42.308973 22019 nvc_info.c:320] missing ipc /tmp/nvidia-mps
I0916 13:21:42.308984 22019 nvc_info.c:744] requesting device information with ''
I0916 13:21:42.316877 22019 nvc_info.c:627] listing device /dev/nvidia0 (GPU-6064a007-a943-7f11-1ad7-12ac87046652 at 00000000:01:00.0)
I0916 13:21:42.317002 22019 nvc_mount.c:309] mounting tmpfs at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/proc/driver/nvidia
I0916 13:21:42.317443 22019 nvc_mount.c:77] mounting /usr/lib/nvidia/current/nvidia-smi at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/bin/nvidia-smi
I0916 13:21:42.317518 22019 nvc_mount.c:77] mounting /usr/lib/nvidia/current/nvidia-debugdump at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/bin/nvidia-debugdump
I0916 13:21:42.317583 22019 nvc_mount.c:77] mounting /usr/bin/nvidia-persistenced at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/bin/nvidia-persistenced
I0916 13:21:42.317771 22019 nvc_mount.c:77] mounting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.450.66 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.66
I0916 13:21:42.317842 22019 nvc_mount.c:77] mounting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.450.66 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.450.66
I0916 13:21:42.317908 22019 nvc_mount.c:77] mounting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.450.66 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libcuda.so.450.66
I0916 13:21:42.317972 22019 nvc_mount.c:77] mounting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.450.66 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.66
I0916 13:21:42.317999 22019 nvc_mount.c:489] creating symlink /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
I0916 13:21:42.318135 22019 nvc_mount.c:77] mounting /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/local/cuda-10.1/compat/libcuda.so.418.152.00 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libcuda.so.418.152.00
I0916 13:21:42.318203 22019 nvc_mount.c:77] mounting /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/local/cuda-10.1/compat/libnvidia-fatbinaryloader.so.418.152.00 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.152.00
I0916 13:21:42.318269 22019 nvc_mount.c:77] mounting /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/local/cuda-10.1/compat/libnvidia-ptxjitcompiler.so.418.152.00 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.152.00
I0916 13:21:42.318421 22019 nvc_mount.c:204] mounting /run/nvidia-persistenced/socket at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/run/nvidia-persistenced/socket
I0916 13:21:42.318497 22019 nvc_mount.c:173] mounting /dev/nvidiactl at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/dev/nvidiactl
I0916 13:21:42.318530 22019 nvc_mount.c:464] whitelisting device node 195:255
I0916 13:21:42.318606 22019 nvc_mount.c:173] mounting /dev/nvidia-uvm at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/dev/nvidia-uvm
I0916 13:21:42.318632 22019 nvc_mount.c:464] whitelisting device node 241:0
I0916 13:21:42.318690 22019 nvc_mount.c:173] mounting /dev/nvidia-uvm-tools at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/dev/nvidia-uvm-tools
I0916 13:21:42.318715 22019 nvc_mount.c:464] whitelisting device node 241:1
I0916 13:21:42.318787 22019 nvc_mount.c:173] mounting /dev/nvidia0 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/dev/nvidia0
I0916 13:21:42.318895 22019 nvc_mount.c:377] mounting /proc/driver/nvidia/gpus/0000:01:00.0 at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged/proc/driver/nvidia/gpus/0000:01:00.0
I0916 13:21:42.318924 22019 nvc_mount.c:464] whitelisting device node 195:0
I0916 13:21:42.318953 22019 nvc_ldcache.c:359] executing /sbin/ldconfig from host at /var/lib/docker/overlay2/fc0032cdaed5f3807bf66ecbf3ea00d728b44a4d66206ec4bf15b06a10ea49a7/merged
E0916 13:21:42.320363 1 nvc_ldcache.c:390] could not start /sbin/ldconfig: process execution failed: no such file or directory
I0916 13:21:42.320579 22019 nvc.c:337] shutting down library context
I0916 13:21:42.321308 22024 driver.c:156] terminating driver service
I0916 13:21:42.321735 22019 driver.c:196] driver service terminated successfully

The interesting part is: could not start /sbin/ldconfig: process execution failed: no such file or directory

Location of ldconfig is set correctly:

ls -l /sbin/ | grep ldconfig

-rwxr-xr-x 1 root root    950056 Aug  4 18:02 ldconfig

Running ldconfig manually in the container helps. Removing the @ prefix also helps, but not with every image. For example, running nvidia/cuda:11.0-runtime-ubuntu20.04 gives an error:
nvidia-container-cli: ldcache error: process /usr/sbin/ldconfig failed with error code: 127
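
In a debug log of this length the single error line is easy to miss. A minimal sketch for filtering it out, assuming the line format seen in the logs above (a severity letter I, W or E followed by the date as MMDD); the function name nvc_errors is hypothetical:

```shell
# Hypothetical helper: print only the error-level lines of a
# libnvidia-container debug log passed as the first argument.
nvc_errors() {
    grep -E '^E[0-9]{4} ' "$1"
}

# Usage, assuming the log is written to the path below (the actual path
# depends on the debug setting in config.toml):
#   nvc_errors /var/log/nvidia-container-toolkit.log
```

Changing the leading E to W in the pattern would list the warnings instead.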

@klueska
Contributor

klueska commented Sep 16, 2020

As mentioned above in:
#299

Does your /etc/nvidia-container-runtime/config.toml file point to the full absolute path of ldconfig?
If not, try updating it and see if that fixes things.

@regzon

regzon commented Sep 16, 2020

Thank you for your reply. I've tested this already and got the same result:
could not start /usr/sbin/ldconfig: process execution failed: no such file or directory

@regzon

regzon commented Sep 16, 2020

I believe this is the ldconfig path you're asking about.

Some additional info:

> which ldconfig
/usr/sbin/ldconfig

> ls -l /usr/sbin/ | grep ldconfig
-rwxr-xr-x 1 root root    950056 Aug  4 18:02 ldconfig

@klueska
Contributor

klueska commented Sep 16, 2020

Is ldconfig under /usr/sbin or just /sbin? I know you said that one is a symlink to the other, but make sure that the actual location of ldconfig is the one in the config file.
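
One way to answer this without walking the directories by hand is to let readlink resolve every symlink in the path. This is a minimal sketch; the function name real_location is hypothetical, and readlink -f is assumed to be the GNU coreutils version shipped with Debian.

```shell
# Hypothetical helper: resolve all symlinks in a path and print the
# canonical location. Relies on `readlink -f` from GNU coreutils.
real_location() {
    readlink -f "$1"
}

# Usage on the host:
#   real_location "$(command -v ldconfig)"
```

If the printed path differs from the one in config.toml (ignoring the leading @), the config file can be updated to the printed path.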

@regzon

regzon commented Sep 16, 2020

I've spent some time searching for other possible ldconfig binaries and found nothing. The one in /usr/sbin is not a symlink, and neither is any of its parent directories. /usr/sbin/ldconfig is the location used by the system (as which ldconfig reports). There are also no ldconfig.real binaries present (neither in the PATH nor in /usr/sbin).

ls output (proof of no symlinks):

> ls -l / | grep usr
drwxr-xr-x  14 root root  4096 Sep 21  2019 usr

> ls -l /usr/ | grep sbin
drwxr-xr-x   2 root root 20480 Sep 13 15:32 sbin

> ls -l /usr/sbin/ | grep ldconfig
-rwxr-xr-x 1 root root    950056 Aug  4 18:02 ldconfig

The configuration file:

> cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/usr/sbin/ldconfig"

[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"

I hope I'm missing something obvious :)
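
For comparing the configured value with the binary on disk, the active ldconfig entry can be pulled out of a file like the one above. A minimal sketch, assuming the key sits on its own line with a double-quoted value; the function name ldconfig_setting is hypothetical and this is a plain sed match, not a full TOML parser.

```shell
# Hypothetical helper: print the value of the active (uncommented)
# ldconfig key from a config.toml-style file passed as the first argument.
ldconfig_setting() {
    sed -n 's/^[[:space:]]*ldconfig[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$1"
}

# Usage:
#   ldconfig_setting /etc/nvidia-container-runtime/config.toml
```

Commented-out keys such as #ldcache are skipped because the pattern only matches lines that begin with ldconfig.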

@deric

deric commented Jan 31, 2022

It seems to be working for me with the full path pointing to ldconfig.real:

ldconfig = "/sbin/ldconfig.real"

@klueska what is the point of the @ prefix?

ldconfig = "@/sbin/ldconfig"

I'm using nvidia-driver from bullseye-backports/non-free at version 470.94-1~bpo11+1

@klueska
Contributor

klueska commented Jan 31, 2022

The @ prefix means: look for the following path on the host and run ldconfig from there.
Without the @ prefix, it looks for the path inside the container and executes it.
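
The rule can be written down as a small classifier. This is only an illustration of the explanation above, not code from libnvidia-container; the function name ldconfig_origin is hypothetical.

```shell
# Hypothetical helper: report where an ldconfig setting would be looked up.
# A leading "@" means the path is taken from the host; otherwise it is
# looked up inside the container.
ldconfig_origin() {
    case "$1" in
        @*) printf 'host %s\n' "${1#@}" ;;
        *)  printf 'container %s\n' "$1" ;;
    esac
}
```

For the two settings quoted earlier in this thread, ldconfig_origin "@/sbin/ldconfig" prints "host /sbin/ldconfig" and ldconfig_origin "/sbin/ldconfig.real" prints "container /sbin/ldconfig.real".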

@klueska
Contributor

klueska commented Mar 22, 2022

The newest version of nvidia-docker should resolve these issues with ldconfig not properly setting up the library search path on Debian systems before a container gets launched.

Specifically this change in libnvidia-container fixes the issue and is included as part of the latest release:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/141

The latest release packages for the full nvidia-docker stack:

libnvidia-container1-1.9.0
libnvidia-container-tools-1.9.0
nvidia-container-toolkit-1.9.0
nvidia-container-runtime-3.9.0
nvidia-docker-2.10.0
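
To check whether an installed package is at least as new as one of the versions listed above, a version comparison such as the following can be used. This is a minimal sketch: the function name version_at_least is hypothetical, it relies on sort -V from GNU coreutils, and Debian revision suffixes in the installed version string may need extra care.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a Debian-based host:
#   version_at_least "$(dpkg-query -W -f='${Version}' libnvidia-container1)" 1.9.0 \
#       && echo "libnvidia-container1 is 1.9.0 or newer"
```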

@elezar elezar transferred this issue from NVIDIA/nvidia-docker Jan 22, 2024
@klueska klueska added the bug Issue/PR to expose/discuss/fix a bug label Jan 26, 2024
@klueska klueska closed this as completed Jan 26, 2024