This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Failed to run nvidia/cuda:8.0 via nvidia-docker2. nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 127 #587

Closed
opiumfor opened this issue Dec 22, 2017 · 8 comments


@opiumfor

opiumfor commented Dec 22, 2017

I can't run the CUDA test container. The full details are below.

uname -a
Linux atg-dev1 4.9.0-0.bpo.3-amd64 #1 SMP Debian 4.9.25-1~bpo8+1 (2017-05-19) x86_64 GNU/Linux

nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Fri Dec 22 14:13:16 2017
Driver Version : 375.66

Attached GPUs : 1
GPU 0000:01:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0321117031919
GPU UUID : GPU-24ca7387-fadf-936d-bb2c-9464bdbe3c7b
Minor Number : 0
VBIOS Version : 86.02.39.00.01
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : 900-1G611-0050-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 0000:01:00.0
Sub System Id : 0x85E21043
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 23 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Sync Boost : Not Active
Unknown : Not Active
FB Memory Usage
Total : 11170 MiB
Used : 0 MiB
Free : 11170 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0 ms
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 28 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
Power Readings
Power Management : Supported
Power Draw : 17.36 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5505 MHz
Video : 1708 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None

docker version
Client:
Version: 17.09.1-ce
API version: 1.32
Go version: go1.8.3
Git commit: 19e2cf6
Built: Thu Dec 7 22:24:51 2017
OS/Arch: linux/amd64

Server:
Version: 17.09.1-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: 19e2cf6
Built: Thu Dec 7 22:23:29 2017
OS/Arch: linux/amd64
Experimental: false

nvidia-container-cli -V
version: 1.0.0
build date: 2017-11-17T02:30+00:00
build revision: ec15c7233bd2de821ad5127cb0de6b52d9d2083c
build compiler: x86_64-linux-gnu-gcc-6 6.3.0 20170516
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

tail /var/log/nvidia-container-runtime-hook.log
I1222 10:32:59.213696 31403 nvc_mount.c:238] whitelisting device node 242:1
I1222 10:32:59.213797 31403 nvc_mount.c:89] mounting /dev/nvidia0 at /var/lib/docker/overlay2/822c1860eb902bd6bf33185ca50583747e5c7d070498ed1fe48bb6813fe48d8b/merged/dev/nvidia0
I1222 10:32:59.213910 31403 nvc_mount.c:202] mounting /proc/driver/nvidia/gpus/0000:01:00.0 at /var/lib/docker/overlay2/822c1860eb902bd6bf33185ca50583747e5c7d070498ed1fe48bb6813fe48d8b/merged/proc/driver/nvidia/gpus/0000:01:00.0
I1222 10:32:59.213972 31403 nvc_mount.c:238] whitelisting device node 195:0
I1222 10:32:59.214023 31403 nvc_ldcache.c:325] executing /sbin/ldconfig from host at /var/lib/docker/overlay2/822c1860eb902bd6bf33185ca50583747e5c7d070498ed1fe48bb6813fe48d8b/merged
W1222 10:32:59.235143 31403 utils.c:119] sh: 0: getcwd() failed: Operation not permitted
W1222 10:32:59.235198 31403 utils.c:119] /bin/sh: 0: Can't open /proc/self/fd/6
I1222 10:32:59.276980 31403 nvc.c:286] shutting down library context
I1222 10:32:59.277790 31409 driver.c:169] terminating driver service
I1222 10:32:59.279197 31403 driver.c:208] driver service terminated successfully

nvidia-docker run --rm nvidia/cuda:8.0-cudnn6-runtime nvidia-smi
docker run --runtime=nvidia --rm nvidia/cuda:8.0-cudnn6-runtime nvidia-smi
container_linux.go:265: starting container process caused "process_linux.go:368: container init caused "process_linux.go:351: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods --debug=/var/log/nvidia-container-runtime-hook.log configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=8.0 --pid=9136 /var/lib/docker/overlay2/d1c8b3a27912444f7fe7b78424d5fcf700daac6cbc1cdecbf71abec3005d4e1a/merged]\\nnvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 127\\n\"""
docker: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused "process_linux.go:351: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods --debug=/var/log/nvidia-container-runtime-hook.log configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=8.0 --pid=9136 /var/lib/docker/overlay2/d1c8b3a27912444f7fe7b78424d5fcf700daac6cbc1cdecbf71abec3005d4e1a/merged]\\nnvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 127\\n\""".

apt-cache policy nvidia-docker2
nvidia-docker2:
Installed: 2.0.1+docker17.09.1-1
Candidate: 2.0.1+docker17.09.1-1
Version table:
*** 2.0.1+docker17.09.1-1 0
500 https://nvidia.github.io/nvidia-docker/debian9/amd64/ Packages
100 /var/lib/dpkg/status
2.0.1+docker17.09.0-1 0
500 https://nvidia.github.io/nvidia-docker/debian9/amd64/ Packages
2.0.1+docker17.06.2-1 0
500 https://nvidia.github.io/nvidia-docker/debian9/amd64/ Packages
2.0.1+docker17.03.2-1 0
500 https://nvidia.github.io/nvidia-docker/debian9/amd64/ Packages

apt-cache policy docker-ce
docker-ce:
Installed: 17.09.1ce-0debian
Candidate: 17.09.1ce-0debian
Version table:
*** 17.09.1ce-0debian 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages
100 /var/lib/dpkg/status
17.09.0ce-0debian 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages
17.06.2ce-0debian 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages
17.06.1ce-0debian 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages
17.06.0ce-0debian 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages
17.03.2ce-0debian-jessie 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages
17.03.1ce-0debian-jessie 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages
17.03.0ce-0debian-jessie 0
500 https://download.docker.com/linux/debian/ jessie/stable amd64 Packages

(Sorry for the Russian; I hope everything is clear anyway.)

@3XX0
Member

3XX0 commented Dec 22, 2017

We don't support Debian 8, but you probably just need to edit /etc/nvidia-container-runtime/config.toml and change ldconfig = "@/sbin/ldconfig" to ldconfig = "@/sbin/ldconfig.real"
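
For anyone landing here later, the edit described above could be scripted roughly as follows. This is only a sketch, not an official tool: the helper name set_ldconfig and the scratch file are made up for illustration, and it assumes the file has a single unindented line starting with "ldconfig =" and that the new path contains no "|" or "&" characters.

```shell
#!/bin/sh
# set_ldconfig is a hypothetical helper, not part of nvidia-docker.
# It rewrites the `ldconfig = ...` line of a config.toml-style file.
# Usage: set_ldconfig CONFIG_FILE NEW_LDCONFIG_PATH
set_ldconfig() {
    config_file=$1
    new_path=$2
    tmp_file=$(mktemp) || return 1
    # Replace only the line that assigns ldconfig; the leading "@" is kept,
    # as in the value quoted in the comment above.
    sed "s|^ldconfig[[:space:]]*=.*|ldconfig = \"@${new_path}\"|" \
        "$config_file" > "$tmp_file" || return 1
    # Write back through the original file so its owner and mode are preserved.
    cat "$tmp_file" > "$config_file" && rm -f "$tmp_file"
}

# Demonstration on a scratch file rather than the live configuration.
demo_config=$(mktemp)
printf '[nvidia-container-cli]\nload-kmods = true\nldconfig = "@/sbin/ldconfig"\n' > "$demo_config"
set_ldconfig "$demo_config" /sbin/ldconfig.real
cat "$demo_config"
```

On a real host the call would presumably be set_ldconfig /etc/nvidia-container-runtime/config.toml /sbin/ldconfig.real, run as root and after taking a backup of the file.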

@opiumfor
Author

opiumfor commented Dec 22, 2017

It seems that helped. Thank you!

nvidia-docker run --rm nvidia/cuda:8.0 nvidia-smi
docker run --runtime=nvidia --rm nvidia/cuda:8.0 nvidia-smi
Fri Dec 22 11:58:49 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:01:00.0     Off |                  N/A |
| 23%   27C    P8    17W / 250W |      0MiB / 11170MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@3XX0 3XX0 closed this as completed Dec 22, 2017
@coreyjewett

coreyjewett commented Jan 11, 2018

I encountered this same issue on Ubuntu 14.04 LTS. Making the config change @3XX0 suggested seems to have fixed the problem. Is this version (14.04) also considered unsupported?

@flx42
Member

flx42 commented Jan 12, 2018

@coreyjewett trusty should need @/sbin/ldconfig.real, and that's what our xenial package sets.
Maybe you still have the configuration file from your Debian install?
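
A quick way to see which value a given host likely wants is to look for /sbin/ldconfig.real. The sketch below assumes that the presence of that file means /sbin/ldconfig is only a wrapper; the function name pick_ldconfig and its optional root-prefix argument are invented here for illustration and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# pick_ldconfig is a hypothetical helper: it prints the ldconfig value that
# looks appropriate for this host, in the "@/path" form used in config.toml.
# Assumption: if /sbin/ldconfig.real exists and is executable, it is the
# binary to use; otherwise fall back to /sbin/ldconfig.
pick_ldconfig() {
    root=${1:-}   # optional prefix, only useful for trying this on a fake tree
    if [ -x "$root/sbin/ldconfig.real" ]; then
        echo "@/sbin/ldconfig.real"
    else
        echo "@/sbin/ldconfig"
    fi
}

pick_ldconfig
```

The printed value can then be compared with the ldconfig line currently in /etc/nvidia-container-runtime/config.toml.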

@coreyjewett

coreyjewett commented Jan 12, 2018 via email

@Axel13fr

Hi Gents,
FYI, I had the issue on Ubuntu 16.04 with CUDA 10.0 and driver 410. This fixed it, but I don't quite understand why it still happened.

@gsss124

gsss124 commented Sep 21, 2020

This did not solve my problem. I already have ldconfig.real set in /etc/nvidia-container-runtime/config.toml, and I still get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1\\\\n\\\"\"": unknown.

@klueska
Contributor

klueska commented Sep 21, 2020

This bug is over 2 years old and closed. Can you please file a new bug if you need help debugging your problem?
