docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]] #57

Closed
izkula opened this issue Dec 15, 2020 · 10 comments

izkula commented Dec 15, 2020

Hi there,

I am unable to get either docker or pip installation to run with GUI on a remote server (Ubuntu 18.04.5 LTS).
nvidia-smi shows
NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0
With a GeForce RTX 2080 SUPER

After installing docker according to these directions: https://docs.docker.com/engine/install/ubuntu/
sudo docker run hello-world runs successfully
I cloned the repository

git clone git@github.com:StanfordVL/iGibson.git
cd iGibson
./docker/pull-images.sh

docker images shows that I have these repositories downloaded:
igibson/igibson-gui latest f1609b44544a 6 days ago 8.11GB
igibson/igibson latest e2d4fafb189b 6 days ago 7.48GB

But sudo ./docker/headless-gui/run.sh elicits this error:
Starting VNC server on port 5900 with password 112358
please run "python simulator_example.py" once you see the docker command prompt:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

sudo ./docker/base/run.sh also elicits:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

One guess is that something is wrong with OpenGL, but I don't know how to fix it.
If I run glxinfo -B, I get
name of display: localhost:12.0
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
display: localhost:12 screen: 0
direct rendering: No (If you want to find out why, try setting LIBGL_DEBUG=verbose)
OpenGL vendor string: Intel Inc.
OpenGL renderer string: Intel(R) Iris(TM) Plus Graphics 655
OpenGL version string: 1.4 (2.1 INTEL-14.7.8)

Note: I can successfully run xeyes on the server and have it show up on my local machine.
And glxgears shows the gears image but the gears are not rotating.
(and returns this error:
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
)

I also tried the steps from the troubleshooting page:
ldconfig -p | grep EGL yields
libEGL_nvidia.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0
libEGL_nvidia.so.0 (libc6) => /usr/lib/i386-linux-gnu/libEGL_nvidia.so.0
libEGL_mesa.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so
And I checked that
/usr/lib/x86_64-linux-gnu/libEGL.so -> libEGL.so.1.0.0

I also do not appear to have any directories such as
/usr/lib/nvidia-vvv
(I only have /usr/lib/nvidia, /usr/lib/nvidia-cuda-toolkit, and /usr/lib/nvidia-visual-profiler)

Any help would be very much appreciated! Thank you so much.

Collaborator

fxia22 commented Dec 15, 2020

It looks like OpenGL is using the integrated graphics instead of the dedicated graphics card. Ideally, the vendor string should show NVIDIA.

OpenGL vendor string: Intel Inc.
OpenGL renderer string: Intel(R) Iris(TM) Plus Graphics 655
OpenGL version string: 1.4 (2.1 INTEL-14.7.8)
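
This check can be scripted. The following is only a minimal sketch: the helper names `gl_vendor` and `uses_nvidia_gl` are hypothetical, and the parsing assumes the `OpenGL vendor string:` line format shown in the `glxinfo -B` output above.

```shell
# Hypothetical helper: print the OpenGL vendor found in `glxinfo -B`
# output read from stdin (first matching line only).
gl_vendor() {
    sed -n 's/^OpenGL vendor string: *//p' | head -n 1
}

# Hypothetical helper: succeed only if the vendor read from stdin
# mentions NVIDIA (case-insensitive).
uses_nvidia_gl() {
    gl_vendor | grep -qi 'nvidia'
}
```

One possible use is `glxinfo -B 2>/dev/null | uses_nvidia_gl && echo "NVIDIA OpenGL" || echo "not NVIDIA"`, run in the same session (local, ssh -X, or VNC) in which the simulator will be started, since the result depends on the display being used.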

Collaborator

fxia22 commented Dec 15, 2020

You could try reinstalling the NVIDIA driver and selecting the option to install the OpenGL driver during installation. The Docker problem may originate from the same issue, which would prevent you from using docker --gpus all.

Collaborator

fxia22 commented Dec 15, 2020

You can try installing nvidia-container-toolkit to solve the docker issue, as described here

Author

izkula commented Dec 15, 2020

Thanks so much for helping with this.
Any idea how to 'install OpenGL driver' while installing? That seems like a promising direction.
I tried reinstalling/upgrading the nvidia driver with
sudo apt install nvidia-driver-455
followed by
sudo reboot
Now, nvidia-smi shows that it was updated:
NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.1

However, I still get the same result with glxinfo -B
$ glxinfo -B
name of display: localhost:11.0
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
display: localhost:11 screen: 0
direct rendering: No (If you want to find out why, try setting LIBGL_DEBUG=verbose)
OpenGL vendor string: Intel Inc.
OpenGL renderer string: Intel(R) Iris(TM) Plus Graphics 655
OpenGL version string: 1.4 (2.1 INTEL-14.7.8)

There was never any option to include OpenGL driver etc. Is there something specific you were thinking of, or another method to install the driver?

Thanks again

Collaborator

fxia22 commented Dec 16, 2020

Thanks for getting back @izkula, I think I have more information about your particular issue.

sudo apt install nvidia-driver-455 should already have installed the OpenGL driver, so you should have it. However, glxinfo still shows the integrated GPU (iGPU) being used instead of the NVIDIA GPU. In this state, you will not be able to run any graphics-heavy applications.

This might be because you have a display connected to the iGPU, as reported here; here is a full thread of people having similar issues.

The monitor is connected to and the Xserver is running on the integrated matrox graphics. Please connect the monitor to the output on the nvidia card and disable the matrox in bios, if possible.

If you have access to this workstation, you can try connecting the monitor to the NVIDIA GPU. Otherwise, you can edit xorg.conf to configure the X server to use the GPU, or follow the guide in NVIDIA's forum.

Author

izkula commented Dec 17, 2020

Hi,

After going down an unsuccessful rabbit hole with ssh -X and vnc, I decided to try switching computers. I am getting the same error with docker, though (and I was unable to install using the two other methods).

sudo ./docker/headless-gui/run.sh
Starting VNC server on port 5900 with password 112358
please run "python simulator_example.py" once you see the docker command prompt:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

This is through a remote desktop, and all the other things look fine...
nvidia-smi
Wed Dec 16 16:13:18 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.36 Driver Version: 440.36 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K5200 Off | 00000000:03:00.0 On | Off |
| 30% 47C P0 44W / 150W | 769MiB / 8110MiB | 4% Default |
+-------------------------------+----------------------+----------------------+

glxinfo -B
name of display: :1
display: :1 screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 8192 MB
Total available memory: 8192 MB
Currently available dedicated video memory: 7335 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Quadro K5200/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 440.36
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.6.0 NVIDIA 440.36
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 440.36
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

and glxgears runs fine
and sudo docker run hello-world runs successfully

I'm at a loss. Do I need to try running this on AWS or google cloud?

Collaborator

fxia22 commented Dec 17, 2020

Hi @izkula, this computer does look promising since the OpenGL driver is found.

re: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
This may be because your Docker version is too old. Are you using Docker >= 19.03? For older versions of Docker, you might need to install nvidia-docker and use --runtime=nvidia instead of --gpus all. But the easiest solution is to upgrade Docker to >= 19.03.
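
The `--gpus` flag was introduced in Docker 19.03, so the version check can be scripted. This is a rough sketch only: `docker_version` and `supports_gpus_flag` are hypothetical helper names, and the parsing assumes the usual `Docker version X.Y.Z, build ...` output of `docker --version`.

```shell
# Hypothetical helper: extract "20.10.1" from a line such as
# "Docker version 20.10.1, build 831ebea" read from stdin.
docker_version() {
    sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $1 (MAJOR.MINOR[.PATCH])
# is at least 19.03, the first release with the --gpus flag.
supports_gpus_flag() {
    [ -n "$1" ] || return 1
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    # Strip leading zeros so "03" compares as the number 3.
    minor=$(printf '%s' "$minor" | sed 's/^0*//')
    minor=${minor:-0}
    [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }
}
```

For example, `supports_gpus_flag "$(docker --version | docker_version)" && echo "--gpus is available"`. A recent enough Docker is necessary but not sufficient: the NVIDIA container components must also be installed on the host.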

re: unable to install using the two other methods
I am wondering what went wrong with the other two installation methods? Can you share the error?

Author

izkula commented Dec 17, 2020

Hmmm,

docker --version
Docker version 20.10.1, build 831ebea

Other errors are here:
for pip install -e . approach, I get:
Installing collected packages: gibson2
Running setup.py develop for gibson2
ERROR: Command errored out with exit status 1:
command: /home/izkula/anaconda/envs/ig/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/izkula/src/iGibson/setup.py'"'"'; __file__='"'"'/home/izkula/src/iGibson/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
cwd: /home/izkula/src/iGibson/
Complete output (78 lines):
running develop
running egg_info
writing gibson2.egg-info/PKG-INFO
writing dependency_links to gibson2.egg-info/dependency_links.txt
writing requirements to gibson2.egg-info/requires.txt
writing top-level names to gibson2.egg-info/top_level.txt
reading manifest file 'gibson2.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'gibson2.egg-info/SOURCES.txt'
running build_ext
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning (dev) at /home/izkula/anaconda/envs/ig/lib/python3.7/site-packages/cmake/data/share/cmake-3.18/Modules/FindOpenGL.cmake:305 (message):
Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
available. Run "cmake --help-policy CMP0072" for policy details. Use the
cmake_policy command to set the policy and suppress this warning.

  FindOpenGL found both a legacy GL library:

    OPENGL_gl_LIBRARY: /usr/lib/x86_64-linux-gnu/libGL.so

  and GLVND libraries for OpenGL and GLX:

    OPENGL_opengl_LIBRARY: /usr/lib/x86_64-linux-gnu/libOpenGL.so
    OPENGL_glx_LIBRARY: /usr/lib/x86_64-linux-gnu/libGLX.so

  OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
  compatibility with CMake 3.10 and below the legacy GL library will be used.
Call Stack (most recent call first):
  CMakeLists.txt:25 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- pybind11 v2.6.0 dev
CMake Error at /home/izkula/anaconda/envs/ig/lib/python3.7/site-packages/cmake/data/share/cmake-3.18/Modules/FindCUDA.cmake:716 (message):
  Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
  CMakeLists.txt:45 (find_package)


-- Configuring incomplete, errors occurred!
See also "/home/izkula/src/iGibson/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/izkula/src/iGibson/setup.py", line 151, in <module>
    include_package_data=True,
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/site-packages/setuptools/__init__.py", line 163, in setup
    return distutils.core.setup(**attrs)
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/site-packages/setuptools/command/develop.py", line 38, in run
    self.install_for_development()
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/site-packages/setuptools/command/develop.py", line 140, in install_for_development
    self.run_command('build_ext')
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/izkula/src/iGibson/setup.py", line 57, in run
    self.build_extension(ext)
  File "/home/izkula/src/iGibson/setup.py", line 96, in build_extension
    subprocess.check_call(['cmake', ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env)
  File "/home/izkula/anaconda/envs/ig/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/izkula/src/iGibson/gibson2/render', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/izkula/src/iGibson/gibson2/render/mesh_renderer', '-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=/home/izkula/src/iGibson/gibson2/render/mesh_renderer/build', '-DPYTHON_EXECUTABLE=/home/izkula/anaconda/envs/ig/bin/python3.7', '-DMAC_PLATFORM=FALSE', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
----------------------------------------

ERROR: Command errored out with exit status 1: /home/izkula/anaconda/envs/ig/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/izkula/src/iGibson/setup.py'"'"'; __file__='"'"'/home/izkula/src/iGibson/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

for pip install gibson2 I get a super long error that ends with:
subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-22nsf75n/gibson2_e5ae14d1705941a995a017df0bd5f9b9/gibson2/render', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/pip-install-22nsf75n/gibson2_e5ae14d1705941a995a017df0bd5f9b9/build/lib.linux-x86_64-3.7/gibson2/render/mesh_renderer', '-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=/tmp/pip-install-22nsf75n/gibson2_e5ae14d1705941a995a017df0bd5f9b9/build/lib.linux-x86_64-3.7/gibson2/render/mesh_renderer/build', '-DPYTHON_EXECUTABLE=/home/izkula/anaconda/envs/ig/bin/python3.7', '-DMAC_PLATFORM=FALSE', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
----------------------------------------
ERROR: Command errored out with exit status 1: /home/izkula/anaconda/envs/ig/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-22nsf75n/gibson2_e5ae14d1705941a995a017df0bd5f9b9/setup.py'"'"'; __file__='"'"'/tmp/pip-install-22nsf75n/gibson2_e5ae14d1705941a995a017df0bd5f9b9/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-_l0pmb8k/install-record.txt --single-version-externally-managed --compile --install-headers /home/izkula/anaconda/envs/ig/include/python3.7m/gibson2 Check the logs for full command output.

cmake --version yields
cmake version 3.18.4

Author

izkula commented Dec 17, 2020

Okay, I solved the docker issue by installing the nvidia container toolkit (nvidia-docker2)
Following this: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
----> It works (for this example)
Now to try iGibson
sudo ./docker/headless-gui/run.sh
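
For anyone hitting the same error, here is a small sketch of a pre-check. It rests on an assumption: to my understanding, the Docker daemon only enables `--gpus` when it can find the `nvidia-container-runtime-hook` executable (shipped with the NVIDIA container toolkit) on its PATH at startup. The helper names `have_cmd` and `check_nvidia_hook` are hypothetical.

```shell
# Hypothetical helper: succeed if the executable named $1 is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print "found" or "missing" depending on whether
# the NVIDIA hook (assumed to be what Docker looks up for --gpus)
# is installed and visible on PATH.
check_nvidia_hook() {
    if have_cmd nvidia-container-runtime-hook; then
        echo "found"
    else
        echo "missing"
    fi
}
```

If `check_nvidia_hook` prints `missing`, installing the toolkit and restarting the Docker daemon, as in the steps above, is the likely fix. Note that the check reflects the current shell's PATH, which may differ from the daemon's.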

It would still be great to figure out why the other install methods don't work here, though

Collaborator

fxia22 commented Dec 17, 2020

For the build error, it looks like CUDA_TOOLKIT_ROOT_DIR is not set. Can you set it to where your CUDA installation is located?

Something like export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-<version>

You can also configure the build not to use CUDA (rendering to tensor is then unavailable, but you can still use most iGibson features) by changing this line to FALSE:

https://github.com/StanfordVL/iGibson/blob/master/gibson2/render/CMakeLists.txt#L11
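
If it is unclear where CUDA is installed, a small helper can look for `nvcc` under a few candidate directories. This is only a sketch: `find_cuda_root` is a hypothetical name, and the candidate paths in the usage note are assumptions about common install locations, not something taken from the iGibson build.

```shell
# Hypothetical helper: print the first directory among the arguments
# that contains an executable bin/nvcc; return non-zero if none does.
find_cuda_root() {
    for dir in "$@"; do
        if [ -x "$dir/bin/nvcc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

It could be used as `export CUDA_TOOLKIT_ROOT_DIR="$(find_cuda_root /usr/local/cuda /usr/local/cuda-*)"` before re-running `pip install -e .`. If the command prints nothing, the CUDA toolkit (as opposed to the driver alone) is probably not installed in those locations.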
