nvidia-container-cli not detecting mig devices #86
Comments
I'm not sure why it's happening, but the error stems from:
Once it fails to create the first nvcap device, it does not attempt to create any more (and these nvcap devices are needed to enumerate and access MIG devices).
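To check which nvcap device nodes are missing, something like the following sketch might help. It assumes the driver lists MIG capability minors in `/proc/driver/nvidia-caps/mig-minors` as `<capability> <minor>` lines and that the matching nodes live at `/dev/nvidia-caps/nvidia-cap<minor>`. Those paths come from the usual driver layout, not from this thread, and the helper names `parse_mig_minors` and `missing_cap_nodes` are made up for illustration.

```python
import os


def parse_mig_minors(text):
    """Parse '<capability> <minor>' lines into a {capability: minor} dict."""
    minors = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and parts[1].isdigit():
            minors[parts[0]] = int(parts[1])
    return minors


def missing_cap_nodes(minors, dev_dir="/dev/nvidia-caps"):
    """Return (capability, expected_path) pairs whose device node does not exist."""
    missing = []
    for name, minor in sorted(minors.items(), key=lambda kv: kv[1]):
        node = os.path.join(dev_dir, f"nvidia-cap{minor}")
        if not os.path.exists(node):
            missing.append((name, node))
    return missing


if __name__ == "__main__":
    # Assumed location of the minors table; adjust if your driver differs.
    minors_file = "/proc/driver/nvidia-caps/mig-minors"
    try:
        with open(minors_file) as f:
            table = parse_mig_minors(f.read())
    except OSError:
        print(f"{minors_file} not readable (driver not loaded or no MIG support?)")
    else:
        for cap, path in missing_cap_nodes(table):
            print(f"missing node for {cap}: {path}")
```

If this reports missing nodes for the `gpuN/giM/access` entries of an existing MIG instance, that would be consistent with the nvcap creation failure described above, though it does not explain why creation failed.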
Here is the point in the code where this error occurs:
What does the following command show for you:
Also this:
This:
And This:
If I am not mistaken, these are the only differences:
Do you have any ideas?
When I reinstalled the driver, the problem was solved.
I found my issue. Following another guide somewhere on the internet, I had added the following udev rule:
After removing it and restarting, it worked! So the takeaway is: do not fiddle with the nvidia devices and don't run nvidia-modprobe yourself.
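For anyone wanting to check whether a leftover udev rule like this is present, here is a rough sketch that lists non-comment lines mentioning "nvidia" in the usual rules directories. The directory list and the helper name `find_nvidia_udev_rules` are assumptions for illustration. Driver packages may ship their own legitimate rules, so a hit is only something to inspect, not proof of a problem.

```python
import os

# Common udev rule locations; assumed, adjust for your distribution.
DEFAULT_RULE_DIRS = ["/etc/udev/rules.d", "/run/udev/rules.d", "/lib/udev/rules.d"]


def find_nvidia_udev_rules(rule_dirs=DEFAULT_RULE_DIRS):
    """Return (path, line_number, line) for rule lines that mention nvidia."""
    hits = []
    for directory in rule_dirs:
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if not name.endswith(".rules"):
                continue
            path = os.path.join(directory, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, 1):
                        stripped = line.strip()
                        # Ignore blank lines and comments.
                        if not stripped or stripped.startswith("#"):
                            continue
                        if "nvidia" in stripped.lower():
                            hits.append((path, lineno, stripped))
            except OSError:
                # Unreadable file: skip it rather than abort the scan.
                continue
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_nvidia_udev_rules():
        print(f"{path}:{lineno}: {line}")
```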
So the issue is probably quite clear from the title. MIG devices are set up and work perfectly; however, nvidia-container-cli (and everything that uses it) does not find those devices.
The problem most likely comes from an installation error at some point, but I have not been able to find it yet, even after many reinstalls of all NVIDIA drivers. There are also people with a very similar setup who have gotten it to work without this issue.
Although I have already posted on several other forums, I will list all of my installed versions, logs and everything else I could find here, in the hope that somebody can spot why this does not work.
Running nvidia-smi on bare metal:
Running nvidia-smi in docker with all gpus:
Trying to run with single mig device:
Logs of nvidia-container-cli when doing this:
Running nvidia-container-cli list:
Actual nvidia devices:
Installed versions:
System Information:
References: