--gpu option questions / nvidia #127

Closed
sjordahl opened this issue Feb 14, 2019 · 32 comments

Comments

@sjordahl

Per https://github.com/mviereck/x11docker/wiki/X-server-and-Wayland-Options, there are a number of X server options that can take the --gpu parameter, yet when I add it to any of them they all fall back to hostdisplay. What's the real deal? Between x11docker-gui and the page linked above, the X servers I should be able to use --gpu with are the following (excluding Wayland options for now):

  • xpra-xwayland
  • weston-xwayland
  • xwayland

Please advise.

@mviereck
Owner

mviereck commented Feb 14, 2019

For the options xpra-xwayland, weston-xwayland and xwayland you need the dependencies xpra, weston, Xwayland and xdotool. (Some of them need fewer; more details are on the wiki page.)

If the dependencies are not fulfilled, x11docker falls back to --hostdisplay.
The other way around: if the dependencies are fulfilled and you only provide --gpu (and maybe --desktop, too) without specifying an X server option, x11docker will automatically choose one of them.
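
For illustration (the image name here is just a placeholder):

# Explicitly request an Xwayland-based X server option; if xpra/weston/Xwayland/xdotool
# are missing on the host, x11docker falls back to --hostdisplay:
x11docker --xpra-xwayland --gpu some/image glxgears

# Provide only --gpu (plus --desktop for desktop images) and let x11docker
# choose a suitable X server option automatically:
x11docker --gpu --desktop some/desktop-image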

Or do you have an NVIDIA card with the proprietary driver? In that case x11docker will use --hostdisplay because NVIDIA does not support Wayland. However, x11docker should show a warning in that case.

@sjordahl
Author

Thanks for the quick reply @mviereck.

I'm running OpenSUSE Leap 15, which uses Wayland and Xwayland natively. I have xpra, weston, Xwayland and xdotool all installed, and yes, I am using the NVIDIA proprietary driver. Based on your comment, I'm curious how my host system is working with the NVIDIA proprietary driver if it doesn't support Wayland. Or is it operating at the Xwayland layer?

But, it appears the answer to my question is that x11docker is coded to fall back to hostdisplay if the NVIDIA proprietary drivers are in play, correct?

I've just got nvidia-docker working and was hoping to be able to make it more secure. If it's not possible, that's fine too. Just another tool in the toolbox.

Thanks for any insight you can provide. And thanks for developing x11docker. It's a nice way of securing Docker and making sense of the huge number of options available!

@mviereck
Owner

mviereck commented Feb 14, 2019

But, it appears the answer to my question is that x11docker is coded to fall back to hostdisplay if the NVIDIA proprietary drivers are in play, correct?

Yes. I am a bit surprised that you can use Wayland with a proprietary NVIDIA driver. I've heard they were working on it, but did not realize that there had been some progress.

I've uploaded a change to the master branch that does not fall back to --hostdisplay with proprietary NVIDIA drivers but only shows a warning. Please try it out. I cannot check it myself because I don't have NVIDIA hardware at all.

Or is it operating at the Xwayland layer?

If Wayland does not work, Gnome falls back to Xorg. Xwayland only runs within Wayland.
You can check WAYLAND_DISPLAY on host. If it has content like wayland-0 your desktop runs on Wayland. You should find the Wayland socket in XDG_RUNTIME_DIR.
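
A quick check on the host could look like this:

# Non-empty on a Wayland session (e.g. "wayland-0"), empty on a plain Xorg session:
echo "$WAYLAND_DISPLAY"
# On a Wayland session the Wayland socket should show up here:
ls "$XDG_RUNTIME_DIR"/wayland-*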

I've just got nvidia-docker working and was hoping to be able to make it more secure.

I am not sure if it will work out of the box with x11docker, but we can figure it out. Option --cap-default might help in the first test runs. Other than that, x11docker has its own way of providing NVIDIA drivers in containers; you'll see instructions in the terminal output, or you can look at https://github.com/mviereck/x11docker/wiki/NVIDIA-driver-support-for-docker-container.

Thanks for any insight you can provide. And thanks for developing x11docker. It's a nice way of securing Docker and making sense of the huge number of options available!

:) Thank you.
If you have any further questions, feel free to ask.

@sjordahl
Author

Thank you for making the change. No joy though; mostly I get the following:

X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  147 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  26
  Current serial number in output stream:  27

Came across this docker-vmd. Haven't tried it yet. Know anything about it?

@mviereck
Owner

mviereck commented Feb 15, 2019

Came across this docker-vmd. Haven't tried it yet. Know anything about it?

Sorry, I don't know anything special about it. Looking at the Dockerfile, it has a hard-coded NVIDIA driver version. x11docker can be more flexible.

No joy though. I get the following mostly:

Did you understand and apply the instructions in the terminal output and the wiki? It is a bit special. You can either provide a matching driver installer on the host or install a matching driver yourself in the image.
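
For example, something along these lines provides a matching installer on the host (download URL pattern as used by NVIDIA's public mirror; adjust it if you get the driver from elsewhere):

# Driver version currently loaded on the host:
Nvidiaversion="$(head -n1 /proc/driver/nvidia/version | awk '{ print $8 }')"
# Place the matching installer where x11docker looks for it:
mkdir -p ~/.local/share/x11docker
wget -O ~/.local/share/x11docker/NVIDIA-Linux-x86_64-$Nvidiaversion.run \
  "https://http.download.nvidia.com/XFree86/Linux-x86_64/$Nvidiaversion/NVIDIA-Linux-x86_64-$Nvidiaversion.run"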

If you tried with an nvidia-docker image, then the errors might be related to the quite special setup within the image. I once had a closer look at it and its wiki but found it quite confusing, so I did not investigate further. x11docker doesn't need this special setup and can provide/run NVIDIA drivers in simple, non-special images.

Please show me ~/.cache/x11docker/x11docker.log on www.pastebin.com after terminating the container.

Edit:
If you want to try with nvidia-docker image, something like this might work:

x11docker --gpu -- --runtime=nvidia -- nvidia/cuda:9.0-base nvidia-smi
x11docker --gpu --cap-default -- --runtime=nvidia -- nvidia/cuda:9.0-base nvidia-smi
x11docker --gpu --cap-default --user=root -- --runtime=nvidia -- nvidia/cuda:9.0-base nvidia-smi

@mviereck changed the title from "--gpu option questions" to "--gpu option questions / nvidia" on Feb 15, 2019
@sjordahl
Author

I understood the instructions. It was my original understanding that one of the goals of the nvidia-docker project was to not have to install as much in the container. But upon further investigation it's clear that I don't have a good understanding of what is actually provided inside the container by the runtime.
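
As a side note, if libnvidia-container is installed, its CLI can show what the runtime injects (whether this tool is present depends on the local setup):

# Lists the driver components (libraries, binaries, ...) that the nvidia runtime provides to containers:
nvidia-container-cli list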

I've pushed some interesting results to pastebin. First, I'm running CUDA v10, so I needed to modify the command. Here's what I ran:
x11docker --gpu -- --runtime=nvidia -- nvidia/cuda nvidia-smi
and it failed. Here is the x11docker.log.

Interestingly, I removed the --gpu parameter, and the smi command succeeded. Here's what I ran:
x11docker -- --runtime=nvidia -- nvidia/cuda nvidia-smi
Here's the output:

Fri Feb 15 12:29:30 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.93       Driver Version: 410.93       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M620         On   | 00000000:01:00.0  On |                  N/A |
| N/A   49C    P0    N/A /  N/A |   1781MiB /  1968MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
And here's the [x11docker.log](https://pastebin.com/aqqVitzm).

With regard to your comments about Wayland, I now believe I misinterpreted some things I came across previously. OpenSUSE DOES support Wayland, but to your point, not in my configuration. There are Wayland configs for both Gnome and KDE, but not with NVIDIA. Thank you for the correction.

I'm going to try a couple other things with respect to the proper NVIDIA driver, and I'll let you know how it goes.

@mviereck
Owner

mviereck commented Feb 15, 2019

OpenSUSE DOES support Wayland, but to your point, not in my configuration. There are Wayland configs for both Gnome and KDE, but not with NVIDIA.

Thanks for checking that out. I've changed the master branch accordingly; x11docker will fall back to --hostdisplay again. Looking at the check led to some improvements in the code, which was a good side effect.

Here's what I ran:
x11docker --gpu -- --runtime=nvidia -- nvidia/cuda nvidia-smi
and it failed.

It failed because weston, which is used in the background for the automatically chosen --xpra-xwayland, failed. Now x11docker falls back to --hostdisplay again and the command should work.


Interestingly, I removed the --gpu parameter, and the smi command succeeded. Here's what I ran:
x11docker -- --runtime=nvidia -- nvidia/cuda nvidia-smi

That is surprising. x11docker uses --xpra, which runs the invisible X server Xvfb in the background. Xvfb does not support GPU acceleration. I am curious whether regular OpenGL applications like glxgears can use hardware acceleration in this setup, although it should not be supported. Could you try that, please?

I'm going to try a couple other things with respect to the proper NVIDIA driver, and I'll let you know how it goes.

That is quite appreciated! It is good to have some feedback about the NVIDIA support of x11docker as I cannot test it myself.

@sjordahl
Author

sjordahl commented Feb 16, 2019

I can confirm that both of the following work properly now:

  • x11docker --gpu -- --runtime=nvidia -- nvidia/cuda nvidia-smi
  • x11docker -- --runtime=nvidia -- nvidia/cuda nvidia-smi

Now, on to glxgears. If you haven't read this, it explains pretty nicely. Using the following Dockerfile I created a cuda:glxgears image:

FROM nvidia/cudagl:10.0-runtime

RUN apt-get update && apt-get install -y --no-install-recommends \
        mesa-utils && \
    rm -rf /var/lib/apt/lists/*
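
I built it with the usual command (assuming the Dockerfile is in the current directory):

docker build -t cuda:glxgears .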

This worked properly using the following:

  • x11docker --gpu -- --runtime=nvidia -- cuda:glxgears glxgears
  • x11docker -- --runtime=nvidia -- cuda:glxgears glxgears

For --gpu it used hostdisplay, while without --gpu it used xpra. Here's the x11docker.log for --gpu, and here's x11docker.log without.

For other tests I was doing, I put my NVIDIA driver in ~/.local/share/x11docker, and for the --gpu test (hostdisplay) the gears window opened at about the same time the terminal output said it was installing the driver. As an aside, on a previous test I left x11docker to install the driver (for about an hour) and it never finished. I'm thinking I was telling it to run a graphical app that it couldn't run, so the container stopped while x11docker was trying to do the driver install -- maybe??

I also did a bit of investigation into the difference between the NVIDIA images. For my first glxgears test I tried using nvidia/cuda, but that didn't work. What worked was switching to nvidia/cudagl. Per the blog I linked above, the images use different environment variables:

nvidia/cuda:

root@d99f6744313d:/# env | grep NVIDIA
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NVIDIA_REQUIRE_CUDA=cuda>=10.0 brand=tesla,driver>=384,driver<385

nvidia/cudagl:

root@6e91b0a41e86:/# env | grep NVIDIA
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics,compat32,utility
NVIDIA_REQUIRE_CUDA=cuda>=10.0 brand=tesla,driver>=384,driver<385

So the "graphics" element in the NVIDIA_DRIVER_CAPABILITIES variable makes the difference.

I thought I'd try using an NVIDIA-provided Dockerfile to build an image with the driver. I used this as a starting point. How does it compare to how you install the driver?
When I run it with docker I get this:
/usr/local/bin/nvidia-driver: line 273: /run/nvidia/nvidia-driver.pid: No such file or directory
and when I run it with x11docker I get this:
ERROR (catatonit:90): failed to exec pid1: Permission denied
I only have a /run/nvidia-persistenced/nvidia-persistenced.pid, not an nvidia-driver.pid, so that's why it's erroring out. Not sure how to give it the process ID it's wanting. It appears the nvidia-driver script does account for persistenced. Didn't really go much further at that point. For the x11docker instance, the ENTRYPOINT is nvidia-driver, and since I didn't give it permissions for non-root users, I believe that to be the problem.

@sjordahl
Author

This is interesting: Driver-containers-(Beta)

I think the Dockerfile I used in my previous post was an incorrect usage. This is for running the driver system-wide inside a container. But there may still be value in its process for x11docker??

@mviereck
Owner

mviereck commented Feb 16, 2019

the gears window opened at about the same time the terminal output said it was installing the driver.

There was a bug. I did some fixes for the automated driver install; it should work now.
The installation should take less than a minute. You can run with --verbose to see some progress.

Still ToDo: x11docker should not install the driver if the same version is already installed in the image.
Edit: x11docker now does not install the nvidia driver in the container if it finds a matching version in the image.


Your glxgears test without --gpu indicates that it uses software rendering instead of the GPU.
Please try the command sh -c 'glxinfo | grep -i opengl'.


I thought I'd try using an NVIDIA-provided Dockerfile to build an image with the driver. I used this as a starting point. How does it compare to how you install the driver?

There the driver is installed into the image. x11docker installs it in the running container.
The command looks similar:

 sh $Cshare/NVIDIA-$Nvidiaversion.run  \
    --accept-license \
    --no-runlevel-check  \
    --no-questions \
    --ui=none \
    --no-kernel-module \
    --no-kernel-module-source \
    --no-backup

The installer also needs modprobe (kmod) and xz.


I think the Dockerfile I used in my previous post was an incorrect usage. This is for running the driver system-wide inside a container. But there may still be value in its process for x11docker??

I'll look into this later.

@sjordahl
Author

Output of x11docker -- --runtime=nvidia -- cuda:glxgears sh -c 'glxinfo | grep -i opengl':

OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 6.0, 256 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.0.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 18.0.5
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

@sjordahl
Author

Output of x11docker --gpu -- --runtime=nvidia -- cuda:glxgears sh -c 'glxinfo | grep -i opengl':

OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Quadro M620/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 410.93
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6.0 NVIDIA 410.93
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 410.93
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

@mviereck
Owner

Thank you for the test!

Without --gpu the OpenGL renderer string shows llvmpipe. That shows that software rendering is at work.

Your test with x11docker --gpu -- --runtime=nvidia -- cuda:glxgears sh -c 'glxinfo | grep -i opengl' obviously shows the NVIDIA card.
Success!

@mviereck
Owner

mviereck commented Feb 17, 2019

I thought of the possibility to have the NVIDIA driver in one image and provide it to containers from there. There might be possibilities for a setup with LD_PRELOAD, but it might lead to unforeseen issues and unreliability. I won't set up or recommend this.

Basically there are two ways that make sense.

  • End users can create a base image with the NVIDIA driver; other images are then based on that (see the sketch after this list). That is probably the best choice in performance and disk space usage for end users who want to set up several containerized applications for their own use. Drawback: every update of the NVIDIA driver needs a rebuild of all images.
  • Developers who want to deploy images with OpenGL applications cannot provide images for all NVIDIA driver versions. But they can document the possibility to provide a matching NVIDIA driver in ~/.local/share/x11docker. This slows down every container startup a bit, but is flexible and portable and does not need a specific image setup (besides the dependencies on xz and modprobe in the image).
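
A sketch of the first approach, assuming a locally built driver base image named x11docker/nvidia-base (name and package choice are just examples):

cat >Dockerfile <<'EOF'
FROM x11docker/nvidia-base
RUN apt-get update && \
    apt-get install -y --no-install-recommends mesa-utils && \
    rm -rf /var/lib/apt/lists/*
CMD ["glxgears"]
EOF
docker build -t glxgears-nvidia .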

On automated install x11docker checks nvidia-settings -v. It is an assumption that nvidia-settings is present for every nvidia driver installation. I am not sure about that.
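
For reference, that check relies on output along these lines (the exact wording may vary between driver releases):

# Prints something like "nvidia-settings:  version 410.93" if the driver tools are installed:
nvidia-settings -v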

I've updated the wiki for x11docker with NVIDIA with some findings of this thread, especially about --runtime=nvidia.
I've also extended the wiki table of X servers about supported X server options with proprietary NVIDIA driver.

Are any questions left? Does everything work as expected now? Release 5.4.2 contains fixes for the issues in this ticket. If everything is ok, we can close this ticket.

@mviereck
Owner

mviereck commented Feb 18, 2019

A script to create an NVIDIA base image matching the driver on host:

#! /bin/bash

# Script to build image x11docker/nvidia-base
# containing NVIDIA driver version matching the one on host.

Imagename="x11docker/nvidia-base"

Nvidiaversion="$(head -n1 </proc/driver/nvidia/version | awk '{ print $8 }')"
[ "$Nvidiaversion" ] || {
  echo "Error: No NVIDIA driver detected on host" >&2
  exit 1
}
echo "Detected NVIDIA driver version: $Nvidiaversion"

Driverurl="https://http.download.nvidia.com/XFree86/Linux-x86_64/$Nvidiaversion/NVIDIA-Linux-x86_64-$Nvidiaversion.run"
echo "Driver download URL: $Driverurl"

Tmpdir="/tmp/x11docker-nvidia-base"
mkdir -p "$Tmpdir"

echo "# Dockerfile to create NVIDIA driver base image $Imagename
FROM debian:stable
RUN apt-get update && \
    apt-get install --no-install-recommends -y kmod xz-utils wget ca-certificates && \
    wget $Driverurl -O /tmp/NVIDIA-installer.run && \
    sh /tmp/NVIDIA-installer.run \
        --accept-license --no-runlevel-check --no-questions --no-backup --ui=none \
        --no-kernel-module --no-kernel-module-source --no-nouveau-check --no-nvidia-modprobe && \
    rm /tmp/NVIDIA-installer.run && \
    apt-get remove -y kmod xz-utils wget ca-certificates && \
    apt-get autoremove -y
" >"$Tmpdir/Dockerfile"

echo "Creating docker image $Imagename"
docker build -t $Imagename $Tmpdir || {
  echo "Error: Failed to build image $Imagename.
  Make sure that you have permission to start docker.
  Make sure docker daemon is running." >&2
  rm -R "$Tmpdir"
  exit 1
}

echo "Successfully created $Imagename with driver version $Nvidiaversion"
rm -R "$Tmpdir"
exit 0

@sjordahl
Author

I thought of the possibility to have the NVIDIA driver in one image and provide it to containers from there. There might be possibilities for a setup with LD_PRELOAD, but it might lead to unforeseen issues and unreliability. I won't set up or recommend this.

It appears that NVIDIA is trying to get this concept past EXPERIMENTAL. We can let them do the leg work for this and validate it first.

Basically there are two ways that make sense.

  • End users can create a base image with the NVIDIA driver; other images are then based on that. That is probably the best choice in performance and disk space usage for end users who want to set up several containerized applications for their own use. Drawback: every update of the NVIDIA driver needs a rebuild of all images.
  • Developers who want to deploy images with OpenGL applications cannot provide images for all NVIDIA driver versions. But they can document the possibility to provide a matching NVIDIA driver in ~/.local/share/x11docker. This slows down every container startup a bit, but is flexible and portable and does not need a specific image setup (besides the dependencies on xz and modprobe in the image).

With x11docker by itself, you're correct. But with --gpu and --runtime=nvidia the driver install isn't necessary at all if using the nvidia/cudagl base (or the equivalent packages; Dockerfiles found here). They seem to have a way of enabling OpenGL without a version matched driver.

Would you mind disabling the fallback to hostdisplay again so I can test one more time? I think when I tested before I was using the wrong base image (nvidia/cuda instead of nvidia/cudagl).

On automated install x11docker checks nvidia-settings -v. It is an assumption that nvidia-settings is present for every nvidia driver installation. I am not sure about that.

I believe you are correct that the NVIDIA driver install also installs nvidia-settings.

I've updated the wiki for x11docker with NVIDIA with some findings of this thread, especially about --runtime=nvidia.
I've also extended the wiki table of X servers about supported X server options with proprietary NVIDIA driver.

I'd like to validate these, and hence the request to disable hostdisplay fallback.

On a separate but related topic, I'm writing a three-part blog series on graphical desktop Docker containers, one of them very focused on x11docker. Would you be interested in proofing them before I release them?

@eine
Contributor

eine commented Feb 20, 2019

On a separate but related topic, I'm writing a three-part blog series on graphical desktop Docker containers, one of them very focused on x11docker. Would you be interested in proofing them before I release them?

I'd be very interested in those blog posts, either before or after you publish them. I've been helping test x11docker on Win10 (see mviereck/x11docker#commenter:1138-4EB). Recently, I commented to @mviereck that some friendly introduction for new users would be great. You can find some notes here.

@mviereck
Owner

mviereck commented Feb 20, 2019

With x11docker by itself, you're correct. But with --gpu and --runtime=nvidia the driver install isn't necessary at all if using the nvidia/cudagl base (or the equivalent packages; Dockerfiles found here). They seem to have a way of enabling OpenGL without a version matched driver.

IIRC nvidia-docker somehow uses driver files from host and provides them to the container. However, the nvidia-docker images work only (iirc) with nvidia cards and fail on computers with e.g. an AMD or Intel GPU. I could re-check that.

Would you mind disabling the fallback to hostdisplay again so I can test one more time? I think when I tested before I was using the wrong base image (nvidia/cuda instead of nvidia/cudagl).

I don't want to change it in the master branch again because some users use it regularly or for the first time.
But I've uploaded a version with disabled check in experimental branch: https://raw.githubusercontent.com/mviereck/x11docker/experimental/x11docker
However, you can disable the check yourself at https://github.com/mviereck/x11docker/blob/master/x11docker#L1913 , just comment out Return=1:

  [ "$Hostnvidia" = "yes" ] && case ${1:-} in
    --xpra-xwayland|--weston-xwayland|--xwayland|--weston|--kwin|--kwin-xwayland|--xdummy-xwayland|--hostwayland)
      $Message "${1:-}: Closed source NVIDIA driver does not support Wayland."
#      Return=1
    ;;
  esac

A quite simple check: Run weston in a terminal. It probably fails.

On a separate but related topic, I'm writing a three-part blog series on graphical desktop Docker containers, one of them very focused on x11docker. Would you be interested in proofing them before I release them?

Yes, I'd like to look at it. It's an honour for me to see x11docker in a blog post.

This slows down every container startup a bit, but is flexible and portable and does not need a specific image setup (besides the dependencies on xz and modprobe in the image).

Meanwhile I could circumvent the dependencies on xz and kmod for automated driver installation.

I currently run some tests with old driver versions, maybe some things still have to be fixed.
Can you assess which driver versions are still of interest recently, e.g. for old hardware?

Edit: Does x11docker skip installation of nvidia driver if you run a cuda image with --runtime=nvidia? I.e. does the version check with nvidia-settings -v in container work in that case, too?

mviereck added a commit that referenced this issue Feb 21, 2019
@sjordahl
Author

With x11docker by itself, you're correct. But with --gpu and --runtime=nvidia the driver install isn't necessary at all if using the nvidia/cudagl base (or the equivalent packages; Dockerfiles found here). They seem to have a way of enabling OpenGL without a version matched driver.

IIRC nvidia-docker somehow uses driver files from host and provides them to the container. However, the nvidia-docker images work only (iirc) with nvidia cards and fail on computers with e.g. an AMD or Intel GPU. I could re-check that.

You're correct that nvidia-docker only works with nvidia cards. The way I look at it, if someone is using the --runtime=nvidia Docker option, they should have an idea of what they're doing -- and there's a very good chance they're using CUDA. What do you think of issuing a warning and NOT installing the nvidia driver (assuming it exists in the .local dir) when --runtime=nvidia is provided? In my case, where I have both runtimes, when I use --runtime=nvidia I'm accounting for the components to make it work in the image and don't want the driver installed.

Would you mind disabling the fallback to hostdisplay again so I can test one more time? I think when I tested before I was using the wrong base image (nvidia/cuda instead of nvidia/cudagl).

I don't want to change it in the master branch again because some users use it regularly or for the first time.
But I've uploaded a version with disabled check in experimental branch: https://raw.githubusercontent.com/mviereck/x11docker/experimental/x11docker
However, you can disable the check yourself at https://github.com/mviereck/x11docker/blob/master/x11docker#L1913 , just comment out Return=1:

  [ "$Hostnvidia" = "yes" ] && case ${1:-} in
    --xpra-xwayland|--weston-xwayland|--xwayland|--weston|--kwin|--kwin-xwayland|--xdummy-xwayland|--hostwayland)
      $Message "${1:-}: Closed source NVIDIA driver does not support Wayland."
#      Return=1
    ;;
  esac

Perfect. I commented out the Return=1 to play with it. So far, I'm still only able to use OpenGL via the nvidia runtime with --hostdisplay and --xorg. I'm curious why it doesn't work in --xpra-xwayland. I'll preface the following by saying I know very little about these technologies as I haven't been able to spend much time on either. But in looking at x11docker.log it seems to me that hardware accelerated OpenGL IS available and could be used, but for some reason it's falling back to software. Thoughts?

As part of this process I began comparing the Docker inspect output for the following two configurations:

  • docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix cuda:glxgears xterm
  • x11docker --hostdisplay --gpu -- --runtime=nvidia -- cuda:glxgears xterm

Both produce hardware accelerated containers in which I'm able to use glxgears, and I didn't let the driver get installed in either. Would you have any interest in comparing? Just curious.

A quite simple check: Run weston in a terminal. It probably fails.

On a separate but related topic, I'm writing a three-part blog series on graphical desktop Docker containers, one of them very focused on x11docker. Would you be interested in proofing them before I release them?

Yes, I'd like to look at it. It's an honour for me to see x11docker in a blog post.

I'm getting this set up so I can share. Not sure how best to private message the info to you when available.

This slows down every container startup a bit, but is flexible and portable and does not need a specific image setup (besides the dependencies on xz and modprobe in the image).

Meanwhile I could circumvent the dependencies on xz and kmod for automated driver installation.

I currently run some tests with old driver versions, maybe some things still have to be fixed.
Can you assess which driver versions are still of interest recently, e.g. for old hardware?

Edit: Does x11docker skip installation of nvidia driver if you run a cuda image with --runtime=nvidia? I.e. does the version check with nvidia-settings -v in container work in that case, too?

No, x11docker does not skip installation. Based on the test that I just ran to answer your edit, nvidia-settings does not exist natively in the nvidia/cudagl image, and does get installed if I let x11docker install the driver.

mviereck added a commit that referenced this issue Feb 22, 2019
@mviereck
Owner

mviereck commented Feb 22, 2019

I've included a check for --runtime=nvidia. Driver installation is skipped in that case.

So far, I'm still only able to use OpenGL via the nvidia runtime with --hostdisplay and --xorg. I'm curious why it doesn't work in --xpra-xwayland.

Basically it is just the missing Wayland support by NVIDIA. You can check with weston, which is used in the background by --xpra-xwayland. As long as weston fails, some X server options are just not possible.
Looking at the x11docker.log you provided, I am surprised that Weston did not crash as it did previously / about a year ago with the nvidia driver. Maybe there has been some progress in the meantime. However, Xwayland still seems to have an issue:

Disabling glamor and dri3, EGL setup failed
Failed to initialize glamor, falling back to sw

As part of this process I began comparing the Docker inspect output for the following two configurations:
docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix cuda:glxgears xterm
x11docker --hostdisplay --gpu -- --runtime=nvidia -- cuda:glxgears xterm
Both produce hardware accelerated containers in which I'm able to use glxgears, and I didn't let the driver get installed in either. Would you have any interest in comparing? Just curious.

Your first example cannot have hardware acceleration from what I can see. It does not even have access to the GPU. Better check with sh -c 'glxinfo | grep OpenGL'.

I'm getting this set up so I can share. Not sure how best to private message the info to you when available.

Tell me when you are ready, I'll somehow give you an email address.

@sjordahl
Author

As part of this process I began comparing the Docker inspect output for the following two configurations:
docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix cuda:glxgears xterm
x11docker --hostdisplay --gpu -- --runtime=nvidia -- cuda:glxgears xterm
Both produce hardware accelerated containers in which I'm able to use glxgears, and I didn't let the driver get installed in either. Would you have any interest in comparing? Just curious.

Your first example cannot have hardware acceleration from what I can see. It does not even have access to the GPU. Better check with sh -c 'glxinfo | grep OpenGL'.

Sorry, meant:

  • docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --runtime=nvidia cuda:glxgears xterm

@mviereck
Owner

mviereck commented Feb 22, 2019

docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --runtime=nvidia cuda:glxgears xterm

Ok, that probably works. --runtime=nvidia somehow allows GPU access like x11docker does with --gpu.
However, if you run e.g. x11docker --xpra -- --runtime=nvidia -- cuda:glxgears xterm, hardware acceleration fails because Xvfb behind xpra does not support it.

@sjordahl
Author

I found the difference in what is being passed through via Docker to do the same job to be interesting.

@mviereck
Owner

mviereck commented Feb 22, 2019

I found the difference in what is being passed through via Docker to do the same job to be interesting.

That is interesting, but I don't see a possible use case, so I would not spend time investigating it myself.
x11docker's --gpu just shares /dev/dri, /dev/nvidia* and /dev/vga_arbiter.
The nvidia-docker setup does a lot of back-and-forth to serve the closed-source policy of the NVIDIA corporation and to work around the mismatch with the MESA standards.
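
In plain docker terms the device sharing corresponds roughly to something like this (a sketch only; the available nvidia device nodes vary by system, and --gpu does more than just sharing devices):

docker run --rm \
  --device=/dev/dri \
  --device=/dev/nvidia0 --device=/dev/nvidiactl \
  --device=/dev/vga_arbiter \
  some/image glxinfo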

The free MESA drivers are just compatible between host and container in all versions and across all systems. If they were as complicated as NVIDIA's, I would not provide GPU support in x11docker at all.

I've included a check for --runtime=nvidia. Driver installation is skipped in that case.

It should work now on your system. You can provide the installer file in ~/.local/share/x11docker. The driver will only be installed if you set --gpu but not --runtime=nvidia.

@sjordahl
Author

Is there a possibility to find out the driver version within a cuda container?

Sure. Let me know if you would like anything else.

x11docker --verbose --hostdisplay --gpu -- --runtime=nvidia -- cuda:glxgears xterm

jordahl@7796bd01c3cd:~$ glxinfo | grep -i opengl
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Quadro M620/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 410.93
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6.0 NVIDIA 410.93
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 410.93
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

410.93 is the driver version on my host.

@mviereck
Owner

mviereck commented Feb 24, 2019

x11docker cannot check the output of glxinfo because it is not present in nvidia-docker images, so that is no general solution.

Probably x11docker should just stay with the current check for --runtime=nvidia. It seems to be the easiest solution so far.

@mviereck
Owner

The latest release 5.4.4 contains adjustments and fixes related to this thread. The wiki entry about the NVIDIA driver with docker got some updates. I think we can close this ticket now.
Thanks for testing things out!

@sjordahl
Author

sjordahl commented Feb 28, 2019 via email

@mviereck
Owner

mviereck commented Mar 1, 2019

Please have a look at your previous post.

@sjordahl
Author

sjordahl commented Mar 2, 2019 via email

@sjordahl
Author

sjordahl commented Mar 7, 2019 via email

@sjordahl
Author

sjordahl commented Mar 16, 2019 via email
