Dependency issue between nvidia-container-runtime and nvidia-docker2 #1708

Closed
Frikster opened this issue May 21, 2021 · 39 comments

@Frikster

Distribution (run cat /etc/os-release):

 NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE_NAME):

There is a dependency conflict between nvidia-container-runtime from the Pop!_OS PPA and nvidia-docker2, so I've included the apt policy output for both below:

apt policy nvidia-container-runtime
nvidia-container-runtime:
  Installed: (none)
  Candidate: 3.4.0-1pop1~1601325114~20.04~2880fc6
  Version table:
     3.5.0-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.4.2-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.4.1-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.4.0-1pop1~1601325114~20.04~2880fc6 1001
       1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 Packages
     3.4.0-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.3.0-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.2.0-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.4-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.3-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.2-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.1-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.7-3 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.6-3 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.5-3 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.5-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.4-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.3-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.2-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.1-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.0-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.3-3 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.2-2 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.2-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.1-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.0-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.03.1-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker17.12.1-1 500
        500 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
apt policy nvidia-docker2
nvidia-docker2:
  Installed: (none)
  Candidate: 2.6.0-1
  Version table:
     2.6.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.5.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.4.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.3.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.2.2-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.2.1-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.2.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.1.1-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.1.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.7-3 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.6-3 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.5-3 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.5-2 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.4-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.3-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.2-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.1-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.3-3 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.2-2 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.2-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.1-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.0-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.03.1-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker17.12.1-1 500
        500 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages

Issue/Bug Description:
I think the Pop!_OS PPA ships an outdated nvidia-container-runtime: 3.4.0 instead of 3.5.0. Because nvidia-docker2 requires nvidia-container-runtime >= 3.5.0 and the PPA's 3.4.0 package is pinned at priority 1001, apt cannot satisfy the dependency. Should the PPA be updated?
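The conflict comes down to Debian version ordering: as far as I can tell, apt ranks 3.4.0-1pop1~1601325114~20.04~2880fc6 below 3.5.0, so the pinned PPA candidate can never satisfy nvidia-docker2's >= 3.5.0 dependency. A minimal sketch using dpkg --compare-versions (the helper name satisfies_min_version is hypothetical; dpkg must be installed):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is >= minimum version $2,
# using dpkg's Debian version-ordering rules (the same rules apt applies).
satisfies_min_version() {
  dpkg --compare-versions "$1" ge "$2"
}

pop_candidate="3.4.0-1pop1~1601325114~20.04~2880fc6"
nvidia_candidate="3.5.0-1"

for v in "$pop_candidate" "$nvidia_candidate"; do
  if satisfies_min_version "$v" "3.5.0"; then
    echo "$v satisfies nvidia-docker2's >= 3.5.0"
  else
    echo "$v does NOT satisfy nvidia-docker2's >= 3.5.0"
  fi
done
```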

Steps to reproduce (if you know):
Follow the install instructions in the NVIDIA Container Toolkit guide. Since no distribution named pop20.04 (what $ID$VERSION_ID expands to on Pop!_OS) exists, I substitute ubuntu20.04 for $ID$VERSION_ID, so the GPG step is:

distribution=$(. /etc/os-release;echo ubuntu20.04) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
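Instead of hardcoding the string, the substitution can be derived from /etc/os-release. This is a sketch only: the function name nvidia_distribution is hypothetical, and it assumes Pop!_OS's VERSION_ID matches the Ubuntu release it is based on (true for 20.04 in the output above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the "distribution" string the NVIDIA guide
# derives from $ID$VERSION_ID, mapping Pop!_OS (ID=pop) to Ubuntu.
nvidia_distribution() {
  local os_release="${1:-/etc/os-release}"
  (
    # Source in a subshell so ID/VERSION_ID don't leak into the caller.
    . "$os_release"
    if [ "$ID" = "pop" ]; then
      echo "ubuntu${VERSION_ID}"
    else
      echo "${ID}${VERSION_ID}"
    fi
  )
}

distribution=$(nvidia_distribution)
echo "$distribution"
```

On non-LTS releases such as 21.04, later comments in this thread still hardcode ubuntu20.04, so the derived value may need to be overridden there.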

When I then run sudo apt-get install -y nvidia-docker2, I get:

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help resolve the situation:

The following packages have unmet dependencies:
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.5.0) but 3.4.0-1pop1~1601325114~20.04~2880fc6 is to be installed
E: Unable to correct problems, you have held broken packages.

Expected behavior:
My coworkers followed these steps a few months ago without issue and have the Triton Inference Server running. So sudo apt-get install -y nvidia-docker2 should work without a workaround like the one described below.

Other Notes:
I can force my way past this using

sudo apt install aptitude
sudo aptitude install nvidia-docker2

I say no to the first option and then yes to the second:

The following NEW packages will be installed:
  nvidia-docker2{b} 
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/5,960 B of archives. After unpacking 27.6 kB will be used.
The following packages have unmet dependencies:
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.5.0) but it is not going to be installed
The following actions will resolve these dependencies:

     Keep the following packages at their current version:
1)     nvidia-docker2 [Not Installed]                     



Accept this solution? [Y/n/q/?] n

The following actions will resolve these dependencies:

     Install the following packages:                   
1)     libnvidia-container-tools [1.4.0-1 (bionic)]    
2)     libnvidia-container1 [1.4.0-1 (bionic)]         
3)     nvidia-container-runtime [3.5.0-1 (bionic)]     
4)     nvidia-container-toolkit [1.5.0-1 (bionic, now)]



Accept this solution? [Y/n/q/?] y

This fixes my issue: nvidia-docker2 is installed and works. But then I get this horror (screenshot not preserved), and now Pop!_OS will no longer update.

OK, so let's try again, this time saying no to the first two options and yes to the third:

sudo aptitude install nvidia-docker2

<say no to first 2>

The following actions will resolve these dependencies:

     Install the following packages:                                           
1)     libnvidia-container-tools [1.3.0-1pop1~1601490873~20.04~cb62a8d (focal)]
2)     libnvidia-container1 [1.4.0-1 (bionic)]                                 
3)     nvidia-container-runtime [3.4.0-1pop1~1601325114~20.04~2880fc6 (focal)] 
4)     nvidia-container-toolkit [1.3.0-1pop1~1601490793~20.04~270505a (focal)] 
5)     nvidia-docker2 [2.5.0-1 (bionic)]

Accept this solution? [Y/n/q/?] y

This not only installs nvidia-docker2, but I can also go to the Pop!_Shop and install OS updates without issue. Still, this workaround shouldn't be needed; a smooth experience where sudo apt-get install nvidia-docker2 just works would be preferable. It used to work, after all. And I'd like to see Pop!_OS take over the world, hence the bug report.

@mmstick
Member

mmstick commented May 21, 2021

If you're using an NVIDIA PPA, you have to give it a higher Pin-Priority. You can't mix and match these dependencies.

@bassemkaroui

I added the following to /etc/apt/preferences.d/pop-default-settings:

Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002

That gave NVIDIA's repository a higher pin priority than the Pop!_OS PPA, which is pinned at 1001.
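A variant of the same pin, sketched so it lives in its own file rather than inside Pop!_OS's pop-default-settings. The file name nvidia-github-io.pref and the function name are assumptions; apt only reads preferences.d files with no extension or a .pref extension.

```shell
#!/usr/bin/env bash
# Prints an apt preferences stanza that ranks every package served from the
# nvidia.github.io host ("Pin: origin" matches the repository hostname)
# above the Pop!_OS PPA pin of 1001. Priorities above 1000 also allow downgrades.
print_nvidia_pin() {
  cat <<'EOF'
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
EOF
}

# Usage (not run here; requires root):
#   print_nvidia_pin | sudo tee /etc/apt/preferences.d/nvidia-github-io.pref
#   sudo apt update
#   apt policy nvidia-container-runtime   # NVIDIA's versions should now show 1002
```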

@mmstick
Member

mmstick commented Jun 6, 2021

Any reason you can't use the packages from our PPA?

@bassemkaroui

bassemkaroui commented Jun 6, 2021

> Any reason you can't use the packages from our PPA?

nvidia-docker2 requires nvidia-container-runtime >= 3.5.0, but the newest nvidia-container-runtime in the Pop!_OS PPA is 3.4.0-1pop1~1601325114~20.10~2880fc6, so I couldn't install nvidia-docker2.

@mmstick
Member

mmstick commented Jun 6, 2021

If you need me to update the repository then I will. Better than trying to work around it.

@mmstick
Member

mmstick commented Jun 6, 2021

But we include the docker runtime files in our PPA too, so there can't be any conflicts.

@bassemkaroui

I don't think you have nvidia-docker2 in your PPA.
Here is what apt policy nvidia-docker2 gave me:

nvidia-docker2:
  Installed: 2.6.0-1
  Candidate: 2.6.0-1
  Version table:
 *** 2.6.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
        100 /var/lib/dpkg/status
     2.5.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.4.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.3.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.2.2-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.2.1-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.2.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.1.1-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.1.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.7-3 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.6-3 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.5-3 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.5-2 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.4-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.3-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.2-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.1-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.09.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.3-3 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.2-2 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.2-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.1-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.06.0-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker18.03.1-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
     2.0.3+docker17.12.1-1 1002
       1002 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages

@mmstick
Member

mmstick commented Jun 6, 2021

We ship tensorman, which is used for running TensorFlow Docker containers with an NVIDIA GPU. It optionally depends on our nvidia-container-runtime package, which installs the files needed to use the NVIDIA GPU with Docker. We do have a v3.5.0 branch, though.

@bassemkaroui

I used to use TensorFlow but have switched to PyTorch, so it would be interesting to know whether tensorman works with PyTorch. For this particular issue, though, I wasn't trying to launch a PyTorch container but rather a RAPIDS (rapidsai) container. RAPIDS is a data science framework with a collection of libraries for running end-to-end data science pipelines entirely on GPUs.

To run it, I pulled the image (docker pull rapidsai/rapidsai:0.19-cuda11.2-runtime-ubuntu20.04-py3.8) and then ran the container with the following command:

docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:0.19-cuda11.2-runtime-ubuntu20.04-py3.8

It would be interesting to know if tensorman can do the same or not.

Thanks for your insights :)

@mmstick
Member

mmstick commented Jun 6, 2021

That command will work with our nvidia-container-runtime package installed.

@bassemkaroui

> That command will work with our nvidia-container-runtime package installed.

Indeed it works.

nvidia-docker is merely a convenience wrapper around docker for when you select GPUs with the NVIDIA_VISIBLE_DEVICES environment variable instead of the --gpus flag, since that approach requires passing --runtime=nvidia every time. Even then, we can simply make nvidia the default runtime by adding "default-runtime": "nvidia" to /etc/docker/daemon.json.

> We do have a v3.5.0 branch though.

It would be nice to have v3.5.0 in the PPA, which would let us install nvidia-docker2 just for the convenience of having /etc/docker/daemon.json created automatically, especially since you mentioned you already have a branch with nvidia-container-runtime v3.5.0.

Thanks :)
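For reference, a sketch of a daemon.json that registers the nvidia runtime and makes it the default. The runtimes block is my understanding of what nvidia-docker2 writes, not verified against a specific release; the function name is hypothetical, and it assumes nvidia-container-runtime is on PATH. Writing this file replaces any existing daemon.json, so merge by hand if you already have one.

```shell
#!/usr/bin/env bash
# Prints a daemon.json that registers an "nvidia" runtime backed by the
# nvidia-container-runtime binary and makes it Docker's default runtime.
print_daemon_json() {
  cat <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Usage (requires root; overwrites the existing file):
#   print_daemon_json | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```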

@gsc2001

gsc2001 commented Jun 17, 2021

But some images, like osrf/ros:melodic-desktop-full, need NVIDIA_VISIBLE_DEVICES and --runtime=nvidia. That requires nvidia-docker2 if you want hardware acceleration, and that's where @bassemkaroui's steps help.
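A sketch of the runtime-based flags such images expect, as an alternative to --gpus all. The helper name is hypothetical, and NVIDIA_DRIVER_CAPABILITIES=all is an assumption for graphics/OpenGL workloads like the ROS desktop image; narrow it (for example to compute,utility) if only CUDA is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one docker flag per line for the legacy
# runtime-based GPU setup. $1 selects devices (default: all).
nvidia_runtime_flags() {
  local devices="${1:-all}"
  printf '%s\n' --runtime=nvidia \
    -e "NVIDIA_VISIBLE_DEVICES=${devices}" \
    -e "NVIDIA_DRIVER_CAPABILITIES=all"
}

# Usage: read the flags into an array, then pass them to docker run.
#   mapfile -t flags < <(nvidia_runtime_flags)
#   docker run --rm "${flags[@]}" nvidia/cuda:11.0-base nvidia-smi
```

This only works once the nvidia runtime is registered with the Docker daemon, which is what nvidia-docker2 takes care of.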

@pinduzera

I found that if you don't want to mess with Pop!_OS libraries, you can try following this repository:
https://github.com/pop-os/nvidia-container-runtime
You can use the nvidia-container-runtime from the Pop!_OS repository.

@groenator

Are we going to be able to use nvidia-container-runtime in PopOS 21.04 with docker? I tried every possible option available but I always get broken packages.

@bassemkaroui

> Are we going to be able to use nvidia-container-runtime in PopOS 21.04 with docker? I tried every possible option available but I always get broken packages.

What do you mean? It works fine for me (I have Pop!_OS 21.04 too). Here's the output of apt policy nvidia-container-runtime:

nvidia-container-runtime:
  Installed: 3.4.0-1pop1~1601325114~20.10~2880fc6
  Candidate: 3.4.0-1pop1~1601325114~20.10~2880fc6
  Version table:
 *** 3.4.0-1pop1~1601325114~20.10~2880fc6 100
        100 /var/lib/dpkg/status

@groenator

groenator commented Jul 11, 2021

For example, I have this issue:

→ apt policy nvidia-container-runtime 
nvidia-container-runtime:
  Installed: (none)
  Candidate: (none)
  Version table:


→ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi     
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.


→ sudo apt install nvidia-container-runtime              
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package nvidia-container-runtime is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  nvidia-container-toolkit

E: Package 'nvidia-container-runtime' has no installation candidate

I tried installing nvidia-container-runtime, and installing nvidia-docker2 by following the NVIDIA guide, but nothing works.

What guide did you use to configure nvidia-container-runtime?
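Docker reports "Unknown runtime specified nvidia" when no runtime named nvidia is registered with the daemon, which nvidia-docker2 normally does through /etc/docker/daemon.json. A rough diagnostic sketch (the function name is hypothetical, and it does a text match rather than a real JSON parse):

```shell
#!/usr/bin/env bash
# Checks (1) whether the runtime binary is on PATH and (2) whether
# daemon.json mentions an "nvidia" runtime. Returns 0 only if (2) holds.
check_nvidia_runtime() {
  local daemon_json="${1:-/etc/docker/daemon.json}"
  if ! command -v nvidia-container-runtime >/dev/null 2>&1; then
    echo "nvidia-container-runtime binary not found on PATH"
  fi
  if [ -f "$daemon_json" ] && grep -q '"nvidia"' "$daemon_json"; then
    echo "nvidia runtime registered in $daemon_json"
    return 0
  fi
  echo "nvidia runtime NOT registered in $daemon_json"
  return 1
}

# Usage: check_nvidia_runtime
# After a "sudo systemctl restart docker", "docker info | grep -i runtimes"
# should also list nvidia.
```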

@bassemkaroui

@groenator That's strange; the Pop!_OS PPAs have nvidia-container-runtime. Try sudo apt update to refresh the repositories, then run apt policy nvidia-container-runtime again.

@groenator

Thanks, here is the output:

→ sudo apt update
Hit:1 http://ppa.launchpad.net/system76/pop/ubuntu hirsute InRelease
Hit:2 https://download.opensuse.org/repositories/home:/luke_nukem:/asus/xUbuntu_21.04  InRelease                           
Hit:3 http://us.archive.ubuntu.com/ubuntu hirsute InRelease         
Hit:4 http://us.archive.ubuntu.com/ubuntu hirsute-security InRelease
Hit:5 http://apt.pop-os.org/proprietary hirsute InRelease
Get:6 http://us.archive.ubuntu.com/ubuntu hirsute-updates InRelease [109 kB]
Get:7 http://us.archive.ubuntu.com/ubuntu hirsute-backports InRelease [101 kB]
Get:8 http://us.archive.ubuntu.com/ubuntu hirsute-updates/main amd64 Packages [262 kB]
Get:9 http://us.archive.ubuntu.com/ubuntu hirsute-backports/universe amd64 DEP-11 Metadata [9,144 B]
Fetched 481 kB in 1s (468 kB/s)             
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up-to-date.


→ apt policy nvidia-container-runtime
nvidia-container-runtime:
  Installed: (none)
  Candidate: (none)
  Version table:

@bassemkaroui

What's the output of cat /etc/apt/preferences.d/pop-default-settings for you?

@groenator

Hi There,

This is the output:

cat /etc/apt/preferences.d/pop-default-settings
Package: *
Pin: release o=LP-PPA-system76-pop
Pin-Priority: 1001

Package: *
Pin: release o=LP-PPA-system76-proposed
Pin-Priority: 1001

@bassemkaroui

I removed the packages from my computer and tried to reinstall them, but I couldn't: I had the same issue as you. I'm trying to figure out the problem. I'll get back to you; just give me a minute.

@groenator

Sure! Thank you for your help.

@bassemkaroui

bassemkaroui commented Jul 11, 2021

It seems a problem was introduced with Pop!_OS 21.04. To resolve it, follow these steps:

  • distribution=ubuntu22.04 && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  • sudo nano /etc/apt/preferences.d/pop-default-settings and then add
    Package: *
    Pin: origin nvidia.github.io
    Pin-Priority: 1002
    
    in order to give NVIDIA's repositories a higher pin priority
  • sudo apt update
  • sudo apt install nvidia-docker2 → this will install all the dependencies
  • sudo systemctl restart docker

Now sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi should work.
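The sed step in the first bullet rewrites each deb line so apt checks it against the dedicated keyring instead of the global apt-key store. The same filter as a reusable function, sketched under the assumption that the keyring path contains no # character (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Reads an apt .list file on stdin and inserts [signed-by=KEYRING] after
# every "deb" keyword that precedes an https:// URL.
add_signed_by() {
  local keyring="$1"
  sed "s#deb https://#deb [signed-by=${keyring}] https://#g"
}

# Usage (mirrors the first bullet):
#   curl -s -L "https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list" \
#     | add_signed_by /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
#     | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```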

@groenator

groenator commented Jul 11, 2021

Hi,

Your solution helped. Before, I had tried using the ubuntu20.04 distribution, but I was never able to install it because I hadn't pinned the nvidia.github.io repo.

What does it mean in terms of drivers now? Would I get the Nvidia drivers from the PopOS repo?

apt policy nvidia-container-runtime            
nvidia-container-runtime:
  Installed: 3.5.0-1
  Candidate: 3.5.0-1
  Version table:
 *** 3.5.0-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
        100 /var/lib/dpkg/status
     3.4.2-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.4.1-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.4.0-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.3.0-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.2.0-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.4-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.3-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.2-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     3.1.1-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.7-3 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.6-3 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.5-3 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.5-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.4-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.3-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.2-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.1-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.09.0-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.3-3 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.2-2 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.2-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.1-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.06.0-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker18.03.1-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
     2.0.0+docker17.12.1-1 1002
       1002 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages

Thank you for your help.

@bassemkaroui

bassemkaroui commented Jul 11, 2021

In the GitHub repo for pop-os/nvidia-container-runtime, there is a problem with the last commit. When I inspected it, the problem was in pop-os/staging/hirsute/binary-amd64, so it was introduced in Pop!_OS 21.04 (Hirsute).

@mmstick Can you check this problem please ?

@EdRW

EdRW commented Sep 29, 2021

I just wanted to chime in to say that on Pop!_OS 21.04 it seems that this issue may be resolved.

Just now I had no problems installing nvidia-docker2 without making any Pin-Priority changes.

Aside from having to hardcode ubuntu20.04 in the first script, the install instructions below from the NVIDIA Container Toolkit guide worked for me.

distribution=ubuntu20.04 \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

@cboettig

hmm, this no longer works for me on popOS! 21.10 / impish 😢

@afiaka87

On 21.10 this now results in:

$ nvidia-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi 
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

@cboettig

@afiaka87 Thanks for confirming; yes, that's the same error I am seeing. I also tried apt pinning as in pop-os/nvidia-container-toolkit#1, with no luck, and also tried the NVIDIA installation from the default Ubuntu repos, always with the same error. Does anyone know whether this affects vanilla Ubuntu 21.10 as well as Pop!_OS 21.10? (I know NVIDIA officially supports only Ubuntu LTS releases, but following those directions has worked well until now.)
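As far as I understand, "cgroup subsystem devices not found" is what older libnvidia-container releases report on hosts booted with the unified cgroup v2 hierarchy, which Ubuntu 21.10 (and presumably Pop!_OS 21.10) uses by default; cgroup v2 support arrived around libnvidia-container 1.8.0. A small sketch to check which hierarchy a host uses (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Reports the cgroup hierarchy mounted at /sys/fs/cgroup (or $1).
# The unified v2 hierarchy has filesystem type "cgroup2fs"; a v1 or hybrid
# layout mounts a tmpfs there instead. Anything else prints "unknown".
cgroup_version() {
  local mount="${1:-/sys/fs/cgroup}"
  case "$(stat -fc %T "$mount" 2>/dev/null)" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}

cgroup_version
```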

@hlacikd

hlacikd commented Dec 31, 2021

> On 21.10 this now results in:
>
> $ nvidia-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi 
> docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

guys, just follow the official guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

It will install the following repos:

➜ ~ cat /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /

Then just uncomment the experimental ones; 1.8.0~rc.1-1 will be installed, which works great on Pop!_OS 21.10.
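In the stock nvidia-docker.list, the experimental entries are, to my understanding, commented out with a leading #. A sketch of uncommenting only those lines with GNU sed (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Uncomments the "experimental" repository lines in an apt list file in
# place (GNU sed -i), leaving every other line as it was.
enable_experimental() {
  local list="${1:-/etc/apt/sources.list.d/nvidia-docker.list}"
  sed -i -E 's|^#[[:space:]]*(deb .*/experimental/.*)$|\1|' "$list"
}

# Usage (requires root): the same substitution applied directly.
#   sudo sed -i -E 's|^#[[:space:]]*(deb .*/experimental/.*)$|\1|' \
#     /etc/apt/sources.list.d/nvidia-docker.list
#   sudo apt update
#   apt policy libnvidia-container1   # check which version is now the candidate
```

On Pop!_OS, the nvidia.github.io pin discussed earlier may still be needed for apt to pick the experimental packages over the PPA's.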

@cboettig

cboettig commented Jan 3, 2022

Thanks @hlacik , that worked like a charm! Much appreciated!

(I did have to set my apt pin preferences to prioritize NVIDIA over Pop!_OS as in pop-os/nvidia-container-toolkit#1; then it was happy to upgrade to 1.8.0~rc1-1, and everything was working perfectly again.)

@afiaka87

afiaka87 commented Jan 6, 2022

@hlacik Thanks, I'll give this a shot when I have time. This has been making it challenging to work on ML projects.

@mmstick
Member

mmstick commented Jan 14, 2022

Forgot to close this; it was resolved 2 weeks ago.

@AdrianJohnston

> On 21.10 this now results in:
>
> $ nvidia-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi 
> docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
>
> guys, just follow official guide https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
>
> it will install following repos :
>
> ➜ ~ cat /etc/apt/sources.list.d/nvidia-docker.list
> deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
> deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
> deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
> deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
> deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
>
> and just uncomment those experimental ones , 1.8.0~rc.1-1 will be installed which works great on PopOS 21.10

I personally have had no luck with this approach, so for anyone else experiencing this, I am using the following workaround.

I have modified /etc/nvidia-container-runtime/config.toml with "no-cgroups = true" as per NVIDIA/nvidia-docker#1447

which results in

Failed to initialize NVML: Unknown Error

By following NVIDIA/nvidia-docker#1447 (comment) and also adding in the devices I have managed to get it working for now.

docker run --rm --gpus all --device /dev/nvidia0 --device /dev/nvidia-modeset --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidiactl nvidia/cuda:11.0-base nvidia-smi
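The workaround above, sketched as two hypothetical helpers: one sets no-cgroups = true in config.toml whether or not the key is currently commented out, and the other emits --device flags only for NVIDIA device nodes that actually exist on the host (assumption: the node names match those in the command above).

```shell
#!/usr/bin/env bash
# 1) Set "no-cgroups = true" in nvidia-container-runtime's config.toml
#    (GNU sed -i), replacing a commented or uncommented no-cgroups line.
set_no_cgroups() {
  local config="${1:-/etc/nvidia-container-runtime/config.toml}"
  sed -i -E 's|^#?[[:space:]]*no-cgroups[[:space:]]*=.*|no-cgroups = true|' "$config"
}

# 2) With cgroups disabled, device nodes must be passed explicitly.
#    Print "--device PATH" (one token per line) for each existing char device.
nvidia_device_flags() {
  local dev
  for dev in "$@"; do
    if [ -c "$dev" ]; then
      printf '%s\n' --device "$dev"
    fi
  done
}

# Usage (the config edit requires root):
#   sudo sed -i -E 's|^#?[[:space:]]*no-cgroups[[:space:]]*=.*|no-cgroups = true|' \
#     /etc/nvidia-container-runtime/config.toml
#   mapfile -t flags < <(nvidia_device_flags /dev/nvidia0 /dev/nvidia-modeset \
#     /dev/nvidia-uvm /dev/nvidia-uvm-tools /dev/nvidiactl)
#   docker run --rm --gpus all "${flags[@]}" nvidia/cuda:11.0-base nvidia-smi
```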

@csimpi

csimpi commented Feb 15, 2022

@mmstick
I just installed Pop!_OS 20.04 (I don't want to use a non-LTS release), and the solutions above don't work.
What should I do, and why is this closed when there are only shady workarounds and no clear solution?

more details:

NVIDIA/nvidia-container-toolkit#190

@bassemkaroui

@csimpi did you try this solution #1708 (comment) ?

@keskinonur

On Pop!_OS 20.04, I managed to solve it by editing the apt preferences. Just edit the file as shown below, update the repos, and install nvidia-docker2.

sudo nano /etc/apt/preferences.d/pop-default-settings

Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002

sudo apt update
sudo apt install nvidia-docker2
sudo systemctl restart docker

@csimpi

csimpi commented Mar 19, 2022

@bassemkaroui This is not a solution; it's an ugly workaround that nobody should use. Overriding API priorities without knowing what the Pop!_OS devs want to do in the future is the worst idea ever.

@jacobgkau
Member

> @bassemkaroui This is not a solution, this is an ugly workaround that nobody should use. Overriding API priorities without knowing that PopOS devs want to do in the future is the worst idea ever

Just to be clear, apt priorities (not "API priorities") are a part of apt-based distributions, and a Pop!_OS developer (@mmstick) specifically suggested it higher up in the thread. If you need to use the NVIDIA PPA instead of packages provided by Pop!_OS, then doing this is the correct solution.

Of course, it's recommended to just use the packages provided by Pop!_OS instead of the NVIDIA PPA. For Pop!_OS 21.10 and above, only the nvidia-docker2 package should be needed (nvidia-container-runtime no longer exists), according to https://support.system76.com/articles/tensorman.
