This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

apt-get update failing on cuda images #704

Closed
6 tasks
damien022 opened this issue Apr 16, 2018 · 9 comments


@damien022

1. Description

apt-get update fails while fetching from archive.ubuntu.com. I have seen this error on both nvidia/cuda:latest and nvidia/cudagl:9.0-devel-ubuntu16.04.
Sorry for the data dump below...
I also tried the same commands from another network and on another machine, and it failed at the same place.
I had no problems pulling and updating these images until today.

2. Steps to reproduce the issue

$ docker pull nvidia/cuda:latest
$ docker run -it nvidia/cuda:latest /bin/bash
$ apt-get update

The output stalls at:

Get:22 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [5153 B]
Get:23 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [7734 B]
0% [Working]

When running nvidia/cudagl:9.0-devel-ubuntu16.04, apt-get update stalls at a different point:

Get:34 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [5153 B]
Get:35 http://archive.ubuntu.com/ubuntu xenial-backports/main i386 Packages [5141 B]
Get:36 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [7734 B]
Get:37 http://archive.ubuntu.com/ubuntu xenial-backports/universe i386 Packages [7719 B]
0% [Working]
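
To tell a real stall from a slow mirror, the update can be run under a deadline. This is only a minimal sketch and not part of the original report: the helper names run_with_deadline and update_with_deadline and the 120-second and 30-second limits are assumptions chosen for illustration. timeout is the coreutils command, and Acquire::http::Timeout is a standard apt configuration option.

```shell
# Hypothetical helper: run a command under a wall-clock deadline.
# Exit status 124 means `timeout` had to stop the command, which
# suggests a stall rather than an ordinary failure.
run_with_deadline() {
    seconds="$1"
    shift
    timeout "$seconds" "$@"
}

# Hypothetical wrapper: apt-get update with a per-connection timeout
# (30 s, an assumed value) and an overall deadline (default 120 s, assumed).
update_with_deadline() {
    run_with_deadline "${1:-120}" apt-get -o Acquire::http::Timeout=30 update
}
```

Inside the container, calling update_with_deadline 120 and then inspecting $? separates "killed after 120 seconds" (124) from an error reported by apt itself.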

3. Information

  • Kernel version from uname -a
    Linux ubuntu-ti 4.13.0-38-generic #43~16.04.1-Ubuntu SMP Wed Mar 14 17:48:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Any relevant kernel output lines from dmesg
    [12795.417391] IPv6: ADDRCONF(NETDEV_UP): vethdaefd77: link is not ready
    [12795.551939] eth0: renamed from veth74e4d3f
    [12795.579106] IPv6: ADDRCONF(NETDEV_CHANGE): vethdaefd77: link becomes ready
    [12795.579265] docker0: port 1(vethdaefd77) entered blocking state
    [12795.579272] docker0: port 1(vethdaefd77) entered forwarding state
    [13062.153460] docker0: port 2(veth97fa078) entered blocking state
    [13062.153466] docker0: port 2(veth97fa078) entered disabled state
    [13062.153577] device veth97fa078 entered promiscuous mode
    [13062.153777] IPv6: ADDRCONF(NETDEV_UP): veth97fa078: link is not ready
    [13062.153783] docker0: port 2(veth97fa078) entered blocking state
    [13062.153788] docker0: port 2(veth97fa078) entered forwarding state
    [13062.156077] docker0: port 2(veth97fa078) entered disabled state
    [13062.291392] eth0: renamed from veth4e0b109
    [13062.327544] IPv6: ADDRCONF(NETDEV_CHANGE): veth97fa078: link becomes ready
    [13062.327615] docker0: port 2(veth97fa078) entered blocking state
    [13062.327617] docker0: port 2(veth97fa078) entered forwarding state

  • Driver information from nvidia-smi -a
    nvidia-smi 384.111

  • Docker version from docker version
    Client:
    Version: 18.03.0-ce
    API version: 1.37
    Go version: go1.9.4
    Git commit: 0520e24
    Built: Wed Mar 21 23:10:01 2018
    OS/Arch: linux/amd64
    Experimental: false
    Orchestrator: swarm
    Server:
    Engine:
    Version: 18.03.0-ce
    API version: 1.37 (minimum version 1.12)
    Go version: go1.9.4
    Git commit: 0520e24
    Built: Wed Mar 21 23:08:31 2018
    OS/Arch: linux/amd64
    Experimental: false

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
    ||/ Name Version Architecture Description
    +++-===================-==============-==============-===========================================
    ii libnvidia-container 1.0.0beta.1-1 amd64 NVIDIA container runtime library (command-l
    ii libnvidia-container 1.0.0beta.1-1 amd64 NVIDIA container runtime library
    ii nvidia-384 384.111-0ubunt amd64 NVIDIA binary driver - version 384.111
    un nvidia-common (no description available)
    ii nvidia-container-ru 2.0.0+docker18 amd64 NVIDIA container runtime
    ii nvidia-container-ru 1.3.0-1 amd64 NVIDIA container runtime hook
    un nvidia-cuda-toolkit (no description available)
    un nvidia-docker (no description available)
    ii nvidia-docker2 2.0.3+docker18 all nvidia-docker CLI wrapper
    un nvidia-driver-binar (no description available)
    un nvidia-legacy-340xx (no description available)
    un nvidia-libopencl1-3 (no description available)
    un nvidia-libopencl1-d (no description available)
    un nvidia-opencl-icd (no description available)
    ii nvidia-opencl-icd-3 384.111-0ubunt amd64 NVIDIA OpenCL ICD
    un nvidia-persistenced (no description available)
    ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
    ii nvidia-settings 361.42-0ubuntu amd64 Tool for configuring the NVIDIA graphics dr
    un nvidia-settings-bin (no description available)
    un nvidia-smi (no description available)
    un nvidia-vdpau-driver (no description available)

  • NVIDIA container library version from nvidia-container-cli -V
    version: 1.0.0
    build date: 2018-03-06T01:53+00:00
    build revision: be797da00b156493e80f1ae6f38d69f23c932554
    build compiler: gcc-5 5.4.0 20160609
    build platform: x86_64

@flx42
Member

flx42 commented Apr 16, 2018

Do you have the same with the ubuntu:16.04 image?

@damien022
Author

hello!
I had tested it at the time and the ubuntu base image updated correctly.
I just tried again and the update worked on the cudagl image, so it might have been an issue with the ubuntu archive. Sorry for the hassle there!

@flx42
Member

flx42 commented Apr 16, 2018

Yeah, it happens with all repos sometimes. Glad it's fixed!

@flx42 flx42 closed this as completed Apr 16, 2018
@jo-tham

jo-tham commented May 3, 2018

Hi. I'm having the same issue as reported here.

Do you have any suggestions?

[Screenshot: screenshot_2018-05-03_15-09-12]

@flx42
Member

flx42 commented May 4, 2018

@jo-tham the problem is that those repos now redirect to https, and apt by default doesn't know how to handle that. I'm discussing internally what we can do.
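
On apt releases that ship the HTTPS method in a separate package (apt-transport-https, as on Ubuntu 16.04), one way to see whether a container can follow such a redirect is to check for the method binary. This is a minimal sketch under that assumption: the function name has_https_transport is hypothetical, and /usr/lib/apt/methods is the usual Debian/Ubuntu location rather than something stated in this thread.

```shell
# Hypothetical helper: succeed if apt's https method is installed.
# The optional argument overrides the methods directory; the default is
# the usual Debian/Ubuntu path (an assumption, not taken from the thread).
has_https_transport() {
    methods_dir="${1:-/usr/lib/apt/methods}"
    [ -x "$methods_dir/https" ]
}
```

For example, has_https_transport || echo "https method missing" run inside the image reports whether the method is absent. If it is, installing apt-transport-https with the failing source temporarily disabled is one possible route; the thread does not confirm that this resolves the redirect.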

@jo-tham

jo-tham commented May 4, 2018 via email

@flx42
Member

flx42 commented May 4, 2018

@jo-tham it's now fixed, the change has been reverted: #725

@jo-tham

jo-tham commented May 4, 2018

Fantastic; thanks for the heads up!

@fengyuentau

fengyuentau commented Oct 14, 2019

Is there a guide for solving this problem? I am working with an old image built in 2017, in which apt update gets stuck at 0% [Working] when it comes to updating the NVIDIA repos.

EDIT:
A comment on #725 gives a solution:

rm /etc/apt/sources.list.d/nvidia-ml.list && apt clean && apt update

NOTE: apt clean is added to clear the apt cache.
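
A slightly more cautious variant of the same workaround renames the list file instead of deleting it, so the source can be restored later. This is a sketch only: the function name disable_source is hypothetical, and it relies on apt reading only files ending in .list from sources.list.d.

```shell
# Hypothetical helper: rename a source list so apt skips it, keeping
# the original content as <name>.disabled for later restoration.
# Returns 1 if the file does not exist.
disable_source() {
    list_file="$1"
    [ -f "$list_file" ] || return 1
    mv -- "$list_file" "$list_file.disabled"
}
```

Used in place of rm, this becomes: disable_source /etc/apt/sources.list.d/nvidia-ml.list && apt clean && apt update. Renaming the file back to nvidia-ml.list re-enables the source.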
