
[Discussion] Status CUDA Support for tensorflow image #516

Closed
pascalwhoop opened this issue Dec 13, 2017 · 9 comments
Labels
type:Question A question about the use of the docker stack images

Comments

@pascalwhoop

Hi all,
As far as I understood from this post, whether NVIDIA CUDA can legally be redistributed with the images is an open question.

  1. Is that true?
  2. Could we extend the docs with a link to an external post (to be written?) that shows how to adapt this Docker image into one that provides CUDA support? I am currently working on this, but I am not sure yet whether I will succeed. Have others used the two together? What were your experiences?
@parente parente added the type:Question A question about the use of the docker stack images label Dec 17, 2017
@parente
Member

parente commented Dec 17, 2017

@pascalwhoop I'm not familiar with the licensing and distribution requirements for CUDA. Maybe @jakirkham can chime in.

could we extend the docs with a link to an external post (to be written?) that shows how to port this Dockerimage to one that supplies CUDA support?

The recipes wiki page (https://github.com/jupyter/docker-stacks/wiki/Docker-recipes) sounds like the right place to put a link.

@jamestwebber

Adding a 👍 here to register interest in this feature, and I'm happy to help debug if I can. A GPU-enabled TF image would be super useful.

@pascalwhoop
Author

pascalwhoop commented Jan 16, 2018

Okay, I made some progress today. I built a Dockerfile based on the nvidia/cuda image and then added everything else myself. I had issues with the tensorflow image hosted on gcr.io because my Python 3 kernels actually used Python 2 underneath (which is a big no-no, obviously).
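That kind of kernel mismatch is easy to spot from inside a notebook cell, e.g.:

```python
# Sanity check inside a notebook cell: confirm the kernel is really Python 3,
# not a Python 2 interpreter hiding behind a "Python 3" kernel name.
import sys

print(sys.version_info)
assert sys.version_info.major == 3, "kernel is actually running Python 2!"
```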

The image contains:

  • jupyter of course
  • python 2+3, all packages are in 3 though
  • TensorFlow
  • OpenAI Gym
  • Roboschool
  • bullet (to avoid MuJoCo's proprietary stuff)
  • ffmpeg
  • X server + VNC ability

It's big (5 GB), but I guess it has a lot packed into it.

The Python libs are the same as in the datascience notebook; I left all the R stuff out, though. I think that can be a separate notebook.

The repo with my Dockerfile is this one, and I would love some feedback (@jamestwebber? :-) ). If it pleases the community, we can think about the best place to put it. I am unsure whether it is a "Jupyter", a "TensorFlow", or even an "AI research" image... so under which organization to place it remains to be seen.

I am currently building the image on a Google Cloud VM, and I will push it to Docker Hub so people can check it out and see if it works for them. Building it yourself requires you to download cudnn.so, which you can only get with an NVIDIA account (yuck), but we should just swallow the proprietary pill for now...
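For readers who want the general shape of such an image without digging through the repo, a minimal sketch looks something like the following. This is not the actual Dockerfile from the repo above; the base-image tag, package list, and pip package choices are all assumptions:

```dockerfile
# Minimal sketch only -- base tag and package choices are assumptions,
# not the actual Dockerfile from the linked repo.
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# Python 3 plus system deps, then the notebook stack and a
# CUDA-enabled TensorFlow build
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip ffmpeg && \
    rm -rf /var/lib/apt/lists/*

RUN pip3 install --no-cache-dir jupyter tensorflow-gpu gym

EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser"]
```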

@parente
Member

parente commented Feb 7, 2018

Cross-posting from the PR so it's retained here on the original issue:

As a matter of principle, we (the Project Jupyter maintainers) do not wish to deal with the distribution of non-open-source software. This sentiment covers both the binary Docker images and the toolchain we use to build them.

We will be happy to revisit this issue if the CUDA license changes. Until then, we're going to close this issue. Users who wish to include CUDA in their Docker images will need to accept the license agreement and make their own builds.

@parente parente closed this as completed Feb 7, 2018
@pascalwhoop
Author

Good mentality, although I don't believe NVIDIA will ever open-source the CUDA libs, for the sake of the nice revenue streams they generate. Too bad.

@parente parente mentioned this issue Apr 9, 2018
Closed
@iyanmv iyanmv mentioned this issue Aug 26, 2018
@david-waterworth

david-waterworth commented Oct 8, 2018

If you just want tensorflow-gpu to work, this Dockerfile works for me:

FROM jupyter/scipy-notebook

USER root

RUN conda install --quiet --yes \
    'tensorflow-gpu' \
    'keras' && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

# switch back to the unprivileged notebook user (NB_UID is set by the base image)
USER $NB_UID

Then I start it using docker-compose:

version: '2.3'

services:
  tensorflow-gpu:
    image: tensorflow-gpu:latest
    restart: always
    runtime: nvidia
    ports:
      - 8888:8888
    volumes: 
      - "~/machine-learning/notebooks:/home/jovyan/notebooks"

This requires nvidia-docker2 (the nvidia runtime referred to in the docker-compose file above). It has to be installed on the host machine, along with CUDA 10.0, and you need to take care to install a supported NVIDIA driver; I'm using 410.48, which I downloaded and installed as a .deb file from NVIDIA.

@aboettcher

For me, the approach by @david-waterworth did not work out: when I tried to import tensorflow, there were errors about missing libraries. Any help getting that minimal setup working is appreciated. However, I did get tensorflow-gpu running with significantly more work, as described below.

Installing the missing libraries basically boils down to copying what the nvidia containers do (see the cuda10 section and the linked Dockerfiles on Docker Hub). I added the package installs from the nvidia Docker images to a Dockerfile that inherits from jupyter/scipy-notebook and then installed tensorflow. Since the default packaged tensorflow is built against cuda9, I had to use a custom-compiled version found here, but I ended up building it in the container because I need support for different graphics cards. I also had to update the numpy version, since the tensorflow build was compiled against a newer numpy than the one available in jupyter/scipy-notebook.

The result (plus some other things) is available on GitHub, as well as a short description of how to set up a machine to run the image.

@david-waterworth

david-waterworth commented Oct 31, 2018

@aboettcher there are three pain points I've encountered trying to get nvidia-docker to work:

  1. Install the latest nvidia driver directly from nvidia (410.48) instead of using the distro's packaged version and test by running nvidia-smi from a bash prompt (it should display the device capabilities).

  2. Install cuda 10.0 by downloading the runfile installer. Select the optional samples; I generally either change the samples' install location to my home folder or copy them after installation so I can build without root, but this isn't required. Build and run at least the deviceQuery sample to check that cuda is installed and working correctly on the host. I do the following to ensure that the lib64 and bin folders can be found:

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig

# /etc/profile.d only sources *.sh files; single-quote the outer command so
# $PATH is written literally instead of being expanded now
sudo bash -c 'cat > /etc/profile.d/cuda.sh <<"EOL"
PATH=$PATH:/usr/local/cuda-10.0/bin
export PATH
EOL'

If you don't use the latest CUDA and driver, you cannot run arbitrary versions of cuda in the containers; some versions work, some don't.

  3. If using docker-compose, you have to install a version that supports the nvidia runtime. You also have to remember to set runtime: nvidia for the container.

Not sure if any of this helps; the fact that you got something working implies you have something installed and working on the host machine, but in my experience the above process is the most general. In particular, test different versions of the nvidia-docker images, i.e.

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
docker run --runtime=nvidia --rm nvidia/cuda:9.2-base nvidia-smi
docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi

This will show whether you have the correct dependencies on the host.

@y1zhou

y1zhou commented Nov 21, 2018

@david-waterworth May I ask how you got it to work? I was following your Dockerfile and got my Jupyter Notebook up and running, but

from keras import backend as K
K.tensorflow_backend._get_available_gpus()

returned nothing, which I assume means the GPUs aren't detected? docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi does return our list of GPUs. I went through the official Dockerfile from tensorflow, and it seems like they are using CUDA 9.0, so could that be the problem here? Thanks!
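One way to narrow this down from inside the running container (a hypothetical debugging step, not something from this thread) is to ask the dynamic linker whether it can even locate the CUDA libraries that tensorflow-gpu loads at import time; TensorFlow tends to fall back to CPU silently when they are missing:

```python
# Hypothetical debugging snippet: check whether the dynamic linker can
# locate the CUDA libraries tensorflow-gpu loads at import time.
# "not found" for any of these usually explains a missing-GPU symptom.
import ctypes.util

for lib in ("cudart", "cublas", "cudnn"):
    found = ctypes.util.find_library(lib)
    print(lib, "->", found if found else "not found")
```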
