issue with docker image #707
Comments
Make sure that the NVIDIA container install went through without errors.
This is likely an issue with the NVIDIA Container Toolkit. Could you please try to reinstall it, and make sure to restart dockerd afterwards? This method seemed to work for lots of people here: NVIDIA/nvidia-docker#1034 (comment)
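One quick way to see whether a reinstall took effect is to check that the toolkit's binaries are visible on `PATH` before restarting dockerd. The following is a minimal Python sketch, not an official check: the binary names `nvidia-ctk` and `nvidia-container-runtime-hook` are assumptions that may differ between toolkit versions, and `missing_tools` is a hypothetical helper introduced only for illustration.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed binary names; adjust them for the toolkit version you installed.
EXPECTED_TOOLS = ("nvidia-ctk", "nvidia-container-runtime-hook")


def missing_tools(
    names: Iterable[str] = EXPECTED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Toolkit binaries found; restart the Docker daemon if you have not yet.")
```

If anything is reported missing, the install probably did not complete, and restarting dockerd alone is unlikely to help.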
I had followed the instructions to install the NVIDIA container as mentioned above. I may have missed an error when I did that initially, so I re-ran them to make sure. Now I get a new set of errors: docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' Any ideas?
@jrinck can you please make sure you have both the Docker server and client at the same version? I see comments here with a similar issue, and the resolution suggests either reinstalling Docker or making sure the versions are consistent: NVIDIA/nvidia-container-toolkit#250 (see two comments up)
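The client/server comparison can be scripted. Below is a rough Python sketch that assumes `docker version --format '{{json .}}'` emits a JSON object with `Client.Version` and `Server.Version` fields; `parse_versions` and `versions_match` are hypothetical helper names, not part of Docker.

```python
import json
import subprocess
from typing import Optional, Tuple


def parse_versions(raw_json: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (client, server) version strings from `docker version` JSON."""
    info = json.loads(raw_json)
    # "Server" may be null when the daemon is unreachable.
    client = (info.get("Client") or {}).get("Version")
    server = (info.get("Server") or {}).get("Version")
    return client, server


def versions_match(raw_json: str) -> bool:
    """True only when both versions are present and identical."""
    client, server = parse_versions(raw_json)
    return client is not None and client == server


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["docker", "version", "--format", "{{json .}}"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        print("docker CLI not found on PATH")
    else:
        if result.stdout.strip():
            client, server = parse_versions(result.stdout)
            same = client is not None and client == server
            print(f"client={client} server={server} match={same}")
        else:
            print(result.stderr.strip())
```

A mismatch here does not prove that the versions cause the hook error; it only flags the situation described in the linked thread.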
@achraf-mer why would one need cuDNN installed outside the Docker image? I don't think that should be required.
It may not be required, but if installed it should be the latest version. I can't find the link I read earlier, but it did mention cuDNN as a possible culprit (at least things worked after upgrading).
I just don't think that must be true, and we shouldn't be having people update cuDNN on bare metal when it's not required. One may need to update the NVIDIA drivers, but that's it.
OK, removed my suggestion from the original comment.
I did notice that the instructions here use a different distribution than these instructions: NVIDIA/nvidia-docker#1034 (comment). Basically https://nvidia.github.io/libnvidia-container vs https://nvidia.github.io/nvidia-docker. I have tried both and still end up with the same error: docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' I have a POC tomorrow and would love to get this solved. I have h2oGPT running, but only using CPUs, and it is slow.
@jrinck can you please post a … Also, did you attempt to run the container with sudo? Perhaps you initially installed Docker and the NVIDIA toolkit with sudo. Example to run:
Also, about the instructions: the definitive install steps are https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#setting-up-nvidia-container-toolkit, which is the source we used for README_DOCKER.md.
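Once the toolkit is installed per that guide, a common sanity check is to run `nvidia-smi` inside a throwaway container with GPU access. The sketch below only builds and prints such a command. The `ubuntu` image and the `gpu_smoke_test_cmd` helper are assumptions for illustration, not the exact command from this thread or from README_DOCKER.md.

```python
import shlex
from typing import List


def gpu_smoke_test_cmd(use_sudo: bool = False, image: str = "ubuntu") -> List[str]:
    """Build a `docker run` command that calls nvidia-smi with all GPUs exposed."""
    cmd = ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]
    # Prefix with sudo when Docker was installed or configured for root only.
    return ["sudo", *cmd] if use_sudo else cmd


if __name__ == "__main__":
    # Print both variants so they can be copied into a shell.
    print(shlex.join(gpu_smoke_test_cmd()))
    print(shlex.join(gpu_smoke_test_cmd(use_sudo=True)))
```

If the plain variant fails with the `could not select device driver` error but the sudo variant works, that points at a permissions or per-user install difference rather than a missing toolkit.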
Sorry for the delayed response. My demo/POC was the other day. I had to use a VM that only had CPUs. It was pretty slow. I deleted the other VM that had GPUs, so I do not have the Docker version. I am going to rebuild next week. Let's close this and we can pick it up after I rebuild.
I built the Docker image using the instructions here: https://github.com/h2oai/h2ogpt/blob/main/docs/README_DOCKER.md
I use the command to run the image and I am getting this error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Anyone have any ideas on how to solve this?