Cuda 11 doesn't work, Cuda 10 does #1439

Closed
crobibero opened this issue Dec 28, 2020 · 7 comments

Comments

@crobibero

1. Issue or feature description

Unable to run cuda 11 container.

2. Steps to reproduce the issue

➜ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 127: unknown.
➜ docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
Mon Dec 28 19:09:18 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2200        On   | 00000000:83:00.0 Off |                  N/A |
| 52%   44C    P8     5W /  75W |      1MiB /  5059MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Host

➜ nvidia-smi
Mon Dec 28 14:14:02 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2200        On   | 00000000:83:00.0 Off |                  N/A |
| 52%   45C    P8     5W /  75W |      1MiB /  5059MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
➜ nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I1228 19:10:07.298308 104891 nvc.c:282] initializing library context (version=1.3.1, build=ac02636a318fe7dcc71eaeb3cc55d0c8541c1072)
I1228 19:10:07.298417 104891 nvc.c:256] using root /
I1228 19:10:07.298431 104891 nvc.c:257] using ldcache /etc/ld.so.cache
I1228 19:10:07.298440 104891 nvc.c:258] using unprivileged user 65534:65534
I1228 19:10:07.298482 104891 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1228 19:10:07.298626 104891 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
I1228 19:10:07.301411 104892 nvc.c:192] loading kernel module nvidia
I1228 19:10:07.301713 104892 nvc.c:204] loading kernel module nvidia_uvm
I1228 19:10:07.301844 104892 nvc.c:212] loading kernel module nvidia_modeset
I1228 19:10:07.302193 104893 driver.c:101] starting driver service
I1228 19:10:07.304805 104891 nvc_info.c:680] requesting driver information with ''
I1228 19:10:07.306585 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.80.02
I1228 19:10:07.306657 104891 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.82
I1228 19:10:07.306712 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.80.02
I1228 19:10:07.306765 104891 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.82
I1228 19:10:07.306853 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.450.80.02
I1228 19:10:07.306988 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.450.80.02
I1228 19:10:07.307096 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.80.02
I1228 19:10:07.307144 104891 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.82
I1228 19:10:07.307191 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.80.02
I1228 19:10:07.307238 104891 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.82
I1228 19:10:07.307285 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.80.02
I1228 19:10:07.307332 104891 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.82
I1228 19:10:07.307414 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.450.80.02
I1228 19:10:07.307462 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.80.02
I1228 19:10:07.307510 104891 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.82
I1228 19:10:07.307624 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.450.80.02
I1228 19:10:07.307673 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.450.80.02
I1228 19:10:07.307719 104891 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.82
I1228 19:10:07.307803 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.450.80.02
I1228 19:10:07.308143 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.450.80.02
I1228 19:10:07.308414 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.450.80.02
I1228 19:10:07.308499 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.450.80.02
I1228 19:10:07.308584 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.450.80.02
I1228 19:10:07.308668 104891 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.450.80.02
W1228 19:10:07.308719 104891 nvc_info.c:350] missing library libnvidia-opencl.so
W1228 19:10:07.308730 104891 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W1228 19:10:07.308740 104891 nvc_info.c:350] missing library libnvidia-allocator.so
W1228 19:10:07.308749 104891 nvc_info.c:350] missing library libnvidia-compiler.so
W1228 19:10:07.308757 104891 nvc_info.c:350] missing library libnvidia-ngx.so
W1228 19:10:07.308765 104891 nvc_info.c:350] missing library libvdpau_nvidia.so
W1228 19:10:07.308775 104891 nvc_info.c:350] missing library libnvidia-opticalflow.so
W1228 19:10:07.308785 104891 nvc_info.c:350] missing library libnvidia-fbc.so
W1228 19:10:07.308794 104891 nvc_info.c:350] missing library libnvidia-ifr.so
W1228 19:10:07.308803 104891 nvc_info.c:350] missing library libnvoptix.so
W1228 19:10:07.308812 104891 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W1228 19:10:07.308822 104891 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W1228 19:10:07.308831 104891 nvc_info.c:354] missing compat32 library libcuda.so
W1228 19:10:07.308840 104891 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W1228 19:10:07.308849 104891 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W1228 19:10:07.308857 104891 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W1228 19:10:07.308865 104891 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W1228 19:10:07.308874 104891 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W1228 19:10:07.308883 104891 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W1228 19:10:07.308892 104891 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W1228 19:10:07.308902 104891 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W1228 19:10:07.308911 104891 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W1228 19:10:07.308919 104891 nvc_info.c:354] missing compat32 library libnvcuvid.so
W1228 19:10:07.308928 104891 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W1228 19:10:07.308936 104891 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W1228 19:10:07.308944 104891 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W1228 19:10:07.308952 104891 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W1228 19:10:07.308962 104891 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W1228 19:10:07.308970 104891 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W1228 19:10:07.308979 104891 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W1228 19:10:07.308988 104891 nvc_info.c:354] missing compat32 library libnvoptix.so
W1228 19:10:07.309026 104891 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W1228 19:10:07.309036 104891 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W1228 19:10:07.309044 104891 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W1228 19:10:07.309051 104891 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W1228 19:10:07.309059 104891 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W1228 19:10:07.309069 104891 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I1228 19:10:07.309383 104891 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-smi
I1228 19:10:07.309438 104891 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-debugdump
I1228 19:10:07.309464 104891 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
W1228 19:10:07.310089 104891 nvc_info.c:376] missing binary nvidia-cuda-mps-control
W1228 19:10:07.310100 104891 nvc_info.c:376] missing binary nvidia-cuda-mps-server
I1228 19:10:07.310135 104891 nvc_info.c:438] listing device /dev/nvidiactl
I1228 19:10:07.310143 104891 nvc_info.c:438] listing device /dev/nvidia-uvm
I1228 19:10:07.310152 104891 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I1228 19:10:07.310161 104891 nvc_info.c:438] listing device /dev/nvidia-modeset
I1228 19:10:07.310201 104891 nvc_info.c:317] listing ipc /run/nvidia-persistenced/socket
W1228 19:10:07.310225 104891 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I1228 19:10:07.310236 104891 nvc_info.c:745] requesting device information with ''
I1228 19:10:07.316542 104891 nvc_info.c:628] listing device /dev/nvidia0 (GPU-3afbf355-4429-12ed-2994-20c77602b4c1 at 00000000:83:00.0)
NVRM version:   450.80.02
CUDA version:   11.0

Device Index:   0
Device Minor:   0
Model:          Quadro P2200
Brand:          Quadro
GPU UUID:       GPU-3afbf355-4429-12ed-2994-20c77602b4c1
Bus Location:   00000000:83:00.0
Architecture:   6.1
I1228 19:10:07.316598 104891 nvc.c:337] shutting down library context
I1228 19:10:07.317385 104893 driver.c:156] terminating driver service
I1228 19:10:07.317738 104891 driver.c:196] driver service terminated successfully
  • Kernel version from uname -a
➜ uname -a
Linux gibraltar 5.8.0-0.bpo.2-amd64 #1 SMP Debian 5.8.10-1~bpo10+1 (2020-09-26) x86_64 GNU/Linux
  • Any relevant kernel output lines from dmesg
    N/A
  • Driver information from nvidia-smi -a
➜ nvidia-smi -a

==============NVSMI LOG==============

Timestamp                                 : Mon Dec 28 14:11:30 2020
Driver Version                            : 450.80.02
CUDA Version                              : 11.0

Attached GPUs                             : 1
GPU 00000000:83:00.0
    Product Name                          : Quadro P2200
    Product Brand                         : Quadro
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1324919110121
    GPU UUID                              : GPU-3afbf355-4429-12ed-2994-20c77602b4c1
    Minor Number                          : 0
    VBIOS Version                         : 86.06.77.00.03
    MultiGPU Board                        : No
    Board ID                              : 0x8300
    GPU Part Number                       : 900-5G420-1700-000
    Inforom Version
        Image Version                     : G420.0500.00.02
        OEM Object                        : 1.1
        ECC Object                        : N/A
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x83
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1C3110DE
        Bus Id                            : 00000000:83:00.0
        Sub System Id                     : 0x131B10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : 52 %
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 5059 MiB
        Used                              : 1 MiB
        Free                              : 5058 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 5 MiB
        Free                              : 251 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
        Aggregate
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 44 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 99 C
        GPU Max Operating Temp            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 5.20 W
        Power Limit                       : 75.00 W
        Default Power Limit               : 75.00 W
        Enforced Power Limit              : 75.00 W
        Min Power Limit                   : 75.00 W
        Max Power Limit                   : 75.00 W
    Clocks
        Graphics                          : 139 MHz
        SM                                : 139 MHz
        Memory                            : 405 MHz
        Video                             : 544 MHz
    Applications Clocks
        Graphics                          : 999 MHz
        Memory                            : 5005 MHz
    Default Applications Clocks
        Graphics                          : 999 MHz
        Memory                            : 5005 MHz
    Max Clocks
        Graphics                          : 1746 MHz
        SM                                : 1746 MHz
        Memory                            : 5005 MHz
        Video                             : 1569 MHz
    Max Customer Boost Clocks
        Graphics                          : 1746 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Processes                             : None
  • Docker version from docker version
➜ docker version
Client: Docker Engine - Community
 Version:           20.10.1
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        831ebea
 Built:             Tue Dec 15 04:34:48 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       f001486
  Built:            Tue Dec 15 04:32:45 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
➜ dpkg -l "*nvidia*"
ii  libnvidia-rtcore:amd64                 450.80.02-1~bpo10+1 amd64        NVIDIA binary Vulkan ray tracing (rtcore) library
un  libnvidia-rtcore-450.80.02             <none>              <none>       (no description available)
un  libopengl0-glvnd-nvidia                <none>              <none>       (no description available)
ii  nvidia-alternative                     450.80.02-1~bpo10+1 amd64        allows the selection of NVIDIA as GLX provider
un  nvidia-alternative--kmod-alias         <none>              <none>       (no description available)
un  nvidia-alternative-legacy-173xx        <none>              <none>       (no description available)
un  nvidia-alternative-legacy-71xx         <none>              <none>       (no description available)
un  nvidia-alternative-legacy-96xx         <none>              <none>       (no description available)
ii  nvidia-container-runtime               3.4.0-1             amd64        NVIDIA container runtime
un  nvidia-container-runtime-hook          <none>              <none>       (no description available)
ii  nvidia-container-toolkit               1.4.0-1             amd64        NVIDIA container runtime hook
un  nvidia-cuda-mps                        <none>              <none>       (no description available)
un  nvidia-current                         <none>              <none>       (no description available)
un  nvidia-current-updates                 <none>              <none>       (no description available)
un  nvidia-docker                          <none>              <none>       (no description available)
ii  nvidia-docker2                         2.5.0-1             all          nvidia-docker CLI wrapper
ii  nvidia-driver                          450.80.02-1~bpo10+1 amd64        NVIDIA metapackage
un  nvidia-driver-any                      <none>              <none>       (no description available)
ii  nvidia-driver-bin                      450.80.02-1~bpo10+1 amd64        NVIDIA driver support binaries
un  nvidia-driver-bin-450.80.02            <none>              <none>       (no description available)
un  nvidia-driver-binary                   <none>              <none>       (no description available)
ii  nvidia-driver-libs:amd64               450.80.02-1~bpo10+1 amd64        NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
un  nvidia-driver-libs-any                 <none>              <none>       (no description available)
un  nvidia-driver-libs-nonglvnd            <none>              <none>       (no description available)
ii  nvidia-egl-common                      450.80.02-1~bpo10+1 amd64        NVIDIA binary EGL driver - common files
ii  nvidia-egl-icd:amd64                   450.80.02-1~bpo10+1 amd64        NVIDIA EGL installable client driver (ICD)
un  nvidia-glx-any                         <none>              <none>       (no description available)
ii  nvidia-installer-cleanup               20151021+12~bpo10+1 amd64        cleanup after driver installation with the nvidia-installer
un  nvidia-kernel-450.80.02                <none>              <none>       (no description available)
ii  nvidia-kernel-common                   20151021+12~bpo10+1 amd64        NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms                     450.80.02-1~bpo10+1 amd64        NVIDIA binary kernel module DKMS source
un  nvidia-kernel-source                   <none>              <none>       (no description available)
ii  nvidia-kernel-support                  450.80.02-1~bpo10+1 amd64        NVIDIA binary kernel module support files
un  nvidia-kernel-support--v1              <none>              <none>       (no description available)
un  nvidia-kernel-support-any              <none>              <none>       (no description available)
un  nvidia-legacy-304xx-alternative        <none>              <none>       (no description available)
un  nvidia-legacy-304xx-driver             <none>              <none>       (no description available)
un  nvidia-legacy-304xx-vdpau-driver       <none>              <none>       (no description available)
un  nvidia-legacy-340xx-alternative        <none>              <none>       (no description available)
un  nvidia-legacy-340xx-vdpau-driver       <none>              <none>       (no description available)
un  nvidia-legacy-390xx-vulkan-icd         <none>              <none>       (no description available)
ii  nvidia-legacy-check                    450.80.02-1~bpo10+1 amd64        check for NVIDIA GPUs requiring a legacy driver
un  nvidia-libopencl1-dev                  <none>              <none>       (no description available)
un  nvidia-libvdpau1                       <none>              <none>       (no description available)
ii  nvidia-modprobe                        450.66-1~bpo10+1    amd64        utility to load NVIDIA kernel modules and create device nodes
un  nvidia-nonglvnd-vulkan-common          <none>              <none>       (no description available)
un  nvidia-nonglvnd-vulkan-icd             <none>              <none>       (no description available)
ii  nvidia-persistenced                    418.56-1            amd64        daemon to maintain persistent software state in the NVIDIA driver
ii  nvidia-settings                        450.66-1~bpo10+1    amd64        tool for configuring the NVIDIA graphics driver
un  nvidia-settings-gtk-450.66             <none>              <none>       (no description available)
ii  nvidia-smi                             450.80.02-1~bpo10+1 amd64        NVIDIA System Management Interface
ii  nvidia-support                         20151021+12~bpo10+1 amd64        NVIDIA binary graphics driver support files
un  nvidia-tesla-418-vulkan-icd            <none>              <none>       (no description available)
un  nvidia-tesla-440-vulkan-icd            <none>              <none>       (no description available)
un  nvidia-tesla-450-vulkan-icd            <none>              <none>       (no description available)
un  nvidia-tesla-alternative               <none>              <none>       (no description available)
ii  nvidia-vdpau-driver:amd64              450.80.02-1~bpo10+1 amd64        Video Decode and Presentation API for Unix - NVIDIA driver
ii  nvidia-vulkan-common                   450.80.02-1~bpo10+1 amd64        NVIDIA Vulkan driver - common files
ii  nvidia-vulkan-icd:amd64                450.80.02-1~bpo10+1 amd64        NVIDIA Vulkan installable client driver (ICD)
un  nvidia-vulkan-icd-any                  <none>              <none>       (no description available)
ii  xserver-xorg-video-nvidia              450.80.02-1~bpo10+1 amd64        NVIDIA binary Xorg driver
un  xserver-xorg-video-nvidia-any          <none>              <none>       (no description available)
un  xserver-xorg-video-nvidia-legacy-304xx <none>              <none>       (no description available)
  • NVIDIA container library version from nvidia-container-cli -V
➜ nvidia-container-cli -V
version: 1.3.1
build date: 2020-12-14T14:18+00:00
build revision: ac02636a318fe7dcc71eaeb3cc55d0c8541c1072
build compiler: x86_64-linux-gnu-gcc-8 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • NVIDIA container library logs (see troubleshooting)
    Nothing logged
  • Docker command, image and tag used
    Working: docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
    Not working: docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
@klueska
Contributor

klueska commented Jan 4, 2021

I assume this is a Debian 10 system?

There are several open issues with this already:
#1399
#1424

However, I am still unable to reproduce the issue, so it is hard to debug.
Are there any particular quirks about your specific environment that you can share with me so I can attempt to reproduce this again?

@crobibero
Author

Sorry for the duplicate issue; GitHub search failed me.

It's a standard Debian 10 install, upgraded from Debian 9. I was previously trying to use kernel 5.9, but downgraded after finding out the NVIDIA driver for it isn't out of experimental yet.

I've also tried installing the NVIDIA drivers manually (using the .run installer).

It looks like my issue is the same as #1424, and I have ldconfig = "/sbin/ldconfig" in my config.toml.
Switching this to ldconfig = "@/sbin/ldconfig" causes both nvidia/cuda:10.0-base and nvidia/cuda:11.0-base to fail with

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
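
For reference, the setting lives in the nvidia-container-toolkit config (by default /etc/nvidia-container-runtime/config.toml on this kind of install); the two variants I toggled between look roughly like this:

[nvidia-container-cli]
# "@/sbin/ldconfig" (the shipped default) resolves ldconfig on the host
#ldconfig = "@/sbin/ldconfig"
# "/sbin/ldconfig" runs the ldconfig shipped inside the container image
ldconfig = "/sbin/ldconfig"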

@uvr-jra

uvr-jra commented Jan 19, 2021

Hi,

I'm facing the same issue on my Debian 10 Testing with nvidia/cuda:11.x-base.

Is there any news concerning this issue?

Thank you for your help

@crobibero
Author

I accidentally upgraded to kernel 5.9 again, so now I'm holding out until Debian 11 is released.

@crobibero
Author

After the latest nvidia-docker and GPU driver updates I have my GPU working inside the container. I still had to set ldconfig = "/sbin/ldconfig" to get it working.
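
For anyone hitting the same thing, a minimal sketch of how the change can be applied and verified (assuming the default config path used by nvidia-container-toolkit):

➜ sudoedit /etc/nvidia-container-runtime/config.toml   # change ldconfig = "@/sbin/ldconfig" to ldconfig = "/sbin/ldconfig"
➜ sudo systemctl restart docker                         # likely unnecessary (the hook reads the config per container), but harmless
➜ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi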

@klueska
Contributor

klueska commented Feb 4, 2021

@crobibero Note, removing the @ will run /sbin/ldconfig from the container's OS (as opposed to running /sbin/ldconfig from the host OS). So long as your container has this binary available, it should be fine, but it won't work for all containers.
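
A quick, illustrative way to check whether a given image actually ships that binary (the Ubuntu-based nvidia/cuda images typically provide /sbin/ldconfig as a small wrapper alongside /sbin/ldconfig.real):

➜ docker run --rm nvidia/cuda:11.0-base sh -c 'ls -l /sbin/ldconfig*'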

@crobibero
Author

@crobibero Note, removing the @ will run /sbin/ldconfig from the container's OS (as opposed to running /sbin/ldconfig from the host OS). So long as your container has this binary available, it should be fine, but it won't work for all containers.

I understand, but keeping the @ caused this error for every container that I tried.

➜ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
