Failed to Create Shim Task: OCI Runtime Create Failed #3

Open
k-e-i-z-a-i opened this issue Jun 8, 2022 · 5 comments

@k-e-i-z-a-i

I followed the NVIDIA Container Toolkit installation guide to install the toolkit on Pop OS 21.10. After completing the guide, running sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi produces the following error message:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

My understanding is that, because I'm running Pop OS, it is this project's version of the Container Toolkit that was installed on my machine.

How can this issue be fixed?

For additional background, here's the first line of what I get when I run nvidia-smi on my machine:

NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4
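
One way to narrow this down: the error text suggests that nvidia-container-cli looked for the cgroup v1 devices controller and did not find it, which is what one would expect on a host that runs only the unified cgroup v2 hierarchy. That reading is an assumption and is not confirmed anywhere in this thread. The sketch below classifies the contents of /proc/self/cgroup, where each line has the form hierarchy-ID:controller-list:path and a cgroup v2 entry has hierarchy ID 0 with an empty controller list. The function name cgroup_layout and the returned strings are made up for illustration.

```python
from pathlib import Path


def cgroup_layout(proc_cgroup_text: str) -> str:
    """Classify the text of /proc/self/cgroup.

    Illustrative helper only; it is not part of any NVIDIA or Pop OS tooling.
    """
    has_v1_devices = False
    has_v2 = False
    for line in proc_cgroup_text.splitlines():
        # Split into at most three fields; the path itself may contain ':'.
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, _path = parts
        if hierarchy_id == "0" and controllers == "":
            # The unified (v2) hierarchy is listed as "0::<path>".
            has_v2 = True
        elif "devices" in controllers.split(","):
            # A v1 hierarchy with the devices controller attached.
            has_v1_devices = True
    if has_v1_devices:
        return "v1 devices controller present"
    if has_v2:
        return "cgroup v2 only (no v1 devices controller)"
    return "unknown"


if __name__ == "__main__":
    # Reads the cgroup membership of the current process (Linux only).
    print(cgroup_layout(Path("/proc/self/cgroup").read_text()))
```

If the script reports a v2-only layout, that would be consistent with the error above, but it does not by itself show which package build is at fault.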

@bassemkaroui

I have the same issue. Unfortunately, the nvidia-docker2 package from the Pop OS repo does not seem to work. I installed nvidia-docker2 from the official NVIDIA repo and it worked fine.
If you don't know how to install it, follow these instructions: pop-os/pop#1708 (comment)

@marksumm

Broken on 22.04 too. Installing the latest *nvidia-container* packages from the official repo fixes it.

@berkgercek

There's an issue on the nvidia-docker repo referencing this exact problem. @mmstick @elezar you appear to be the recent maintainers of this repo; would it be possible to implement a fix? The last response on that issue seems to identify the cause as a compile-time option in libnvidia-container, for which I have also opened an issue.

@elezar
Contributor

elezar commented Mar 21, 2023

@berkgercek which issue do you mean in the libnvidia-container repo? Note that I am a maintainer of the upstream (NVIDIA) repo and not the Pop fork.

@mmstick
Member

mmstick commented Mar 21, 2023

@berkgercek @elezar NVCGO is disabled because it fails to compile when enabled.

libnvidia-container/src/cgroup.c:31:16: error: variable ‘res’ has initializer but incomplete type
   31 |         struct nvcgo_get_device_cgroup_version_res res = {0};
libnvidia-container/src/cgroup.c:31:52: error: storage size of ‘res’ isn’t known
   31 |         struct nvcgo_get_device_cgroup_version_res res = {0};
libnvidia-container/src/cgroup.c:37:46: warning: implicit declaration of function ‘nvcgo_get_device_cgroup_version_1’; did you mean ‘get_device_cgroup_version’? [-Wimplicit-function-declaration]
   37 |         if (call_rpc(err, &nvcgo->rpc, &res, nvcgo_get_device_cgroup_version_1, (char*)proc_root, cnt->cfg.pid) < 0)
libnvidia-container/src/error.h:28:9: error: static assertion failed: "incompatible alignment"
   28 |         static_assert(alignof(*err) == alignof(*xdr), "incompatible alignment");  \
libnvidia-container/src/cgroup.c:182:91: error: unknown type name ‘nvcgo_setup_device_cgroup_res’
  182 | nvcgo_setup_device_cgroup_1_svc(ptr_t ctxptr, int dev_cg_version, char *dev_cg, dev_t id, nvcgo_setup_device_cgroup_res *res, maybe_unused struct svc_req *req)


6 participants