
[Discussion] Status CUDA Support for tensorflow image #516

Closed
pascalwhoop opened this issue Dec 13, 2017 · 9 comments
Labels
type:Question A question about the use of the docker stack images

Comments

@pascalwhoop

Hi all,
As far as I understood from this post, whether NVIDIA CUDA can legally be redistributed with the images is an open question.

  1. Is that true?
  2. Could we extend the docs with a link to an external post (to be written?) that shows how to adapt this Docker image into one that provides CUDA support? I am currently working on this, but I am not sure yet whether I will succeed. Have others used the two together? What were your experiences?
@parente parente added the type:Question A question about the use of the docker stack images label Dec 17, 2017
@parente
Member

parente commented Dec 17, 2017

@pascalwhoop I'm not familiar with the licensing and distribution requirements for CUDA. Maybe @jakirkham can chime in.

could we extend the docs with a link to an external post (to be written?) that shows how to port this Dockerimage to one that supplies CUDA support?

The recipes wiki page (https://github.com/jupyter/docker-stacks/wiki/Docker-recipes) sounds like the right place to put a link.

@jamestwebber

Adding a 👍 here to register interest in this feature, and I'm happy to help debug if I can. A GPU-enabled TF image would be super useful.

@pascalwhoop
Author

pascalwhoop commented Jan 16, 2018

Okay, I made some progress today. I built a Dockerfile based on the nvidia/cuda image and then added everything else myself. I had issues with the tensorflow image hosted on gcr.io because my Python 3 kernels actually used Python 2 underneath (which is a big no-no, obviously).
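That kind of kernel mismatch is easy to spot from inside a notebook cell, e.g.:

```python
# Sanity check inside a notebook cell: confirm the kernel is really Python 3,
# not a Python 2 interpreter hiding behind a "Python 3" kernel name.
import sys

print(sys.version_info)
assert sys.version_info.major == 3, "kernel is actually running Python 2!"
```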

The image contains:

  • jupyter of course
  • python 2+3, all packages are in 3 though
  • TensorFlow
  • OpenAI Gym
  • Roboschool
  • bullet (to avoid MuJoCo's proprietary stuff)
  • ffmpeg
  • X server + VNC ability

It's big (5 GB), but I guess it has a lot packed into it.

The Python libs are the same as in the datascience notebook; I left all the R stuff out, though. I think that can be a separate notebook.

The repo with my Dockerfile is this one, and I would love some feedback (@jamestwebber? :-) ). If it pleases the community, we can think about the best place to put it. I am unsure whether it is a "Jupyter", a "TensorFlow", or even an "AI research" image... so under which organization to place it remains to be seen.

I am currently building the image on a Google Cloud VM, and I will push it to Docker Hub so people can check it out and see if it works for them. Building it yourself requires you to download cudnn.so, which you can only get with an NVIDIA account (yuck), but we should just swallow the proprietary pill for now...
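For readers who want the general shape of such an image without digging through the repo, a minimal sketch looks something like the following. This is not the actual Dockerfile from the repo above; the base-image tag, package list, and pip package choices are all assumptions:

```dockerfile
# Minimal sketch only -- base tag and package choices are assumptions,
# not the actual Dockerfile from the linked repo.
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# Python 3 plus system deps, then the notebook stack and a
# CUDA-enabled TensorFlow build
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip ffmpeg && \
    rm -rf /var/lib/apt/lists/*

RUN pip3 install --no-cache-dir jupyter tensorflow-gpu gym

EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser"]
```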

@parente
Member

parente commented Feb 7, 2018

Cross-posting from the PR so it's retained here on the original issue:

As a matter of principle, we (the Project Jupyter maintainers) do not wish to deal with the distribution of non-open-source software. This sentiment covers both the binary Docker images and the toolchain we use to build them.

We will be happy to revisit this issue if the CUDA license changes. Until then, we're going to close this issue. Users who wish to include CUDA in their Docker images will need to accept the license agreement and make their own builds.

@parente parente closed this as completed Feb 7, 2018
@pascalwhoop
Author

Good mentality, although I don't believe NVIDIA will ever open-source the CUDA libs, for the sake of the nice revenue streams they generate. Too bad.

@parente parente mentioned this issue Apr 9, 2018
Closed
@iyanmv iyanmv mentioned this issue Aug 26, 2018
@david-waterworth

david-waterworth commented Oct 8, 2018

If you just want tensorflow-gpu to work, this Dockerfile works for me:

FROM jupyter/scipy-notebook

USER root

RUN conda install --quiet --yes \
    'tensorflow-gpu' \
    'keras' && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

# switch back to the unprivileged notebook user (NB_UID is set by the base image)
USER $NB_UID

Then I start it using docker-compose:

version: '2.3'

services:
  tensorflow-gpu:
    image: tensorflow-gpu:latest
    restart: always
    runtime: nvidia
    ports:
      - 8888:8888
    volumes: 
      - "~/machine-learning/notebooks:/home/jovyan/notebooks"

This requires nvidia-docker2 (the nvidia runtime referred to in the docker-compose file above). It has to be installed on the host machine, along with CUDA 10.0, and you need to take care to install a supported NVIDIA driver; I'm using 410.48, which I downloaded and installed as a .deb file from NVIDIA.

@aboettcher

For me, the approach by @david-waterworth did not work out: when I tried to import tensorflow, there were errors about missing libraries. Any help getting that minimal setup working is appreciated. However, I did get tensorflow-gpu running with significantly more work, as described below.

Installing the missing libraries basically boils down to copying what the nvidia containers do (see the cuda10 section and the linked Dockerfiles on Docker Hub). I added the package installs from the nvidia Docker images to a Dockerfile that inherits from jupyter/scipy-notebook and then installed tensorflow. Since the default packaged tensorflow is built against cuda9, I had to use a custom-compiled version found here, but I ended up building it in the container because I need support for different graphics cards. I also had to update the numpy version, since the tensorflow build was compiled against a newer numpy than the one available in jupyter/scipy-notebook.

The result (plus some other things) is available on GitHub, as well as a short description of how to set up a machine to run the image.

@david-waterworth

david-waterworth commented Oct 31, 2018

@aboettcher there are three pain points I've encountered trying to get nvidia-docker to work:

  1. Install the latest nvidia driver directly from nvidia (410.48) instead of using the distro's packaged version and test by running nvidia-smi from a bash prompt (it should display the device capabilities).

  2. Install cuda 10.0 by downloading the runfile installer. Select the optional samples; I generally either change the samples' install location to my home folder or copy them after installation so I can build without root, but this isn't required. Build and run at least the deviceQuery sample to check that cuda is installed and working correctly on the host. I do the following to ensure that the lib64 and bin folders can be found:

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig

# /etc/profile.d only sources *.sh files; single-quote the outer command so
# $PATH is written literally instead of being expanded now
sudo bash -c 'cat > /etc/profile.d/cuda.sh <<"EOL"
PATH=$PATH:/usr/local/cuda-10.0/bin
export PATH
EOL'

If you don't use the latest CUDA and driver, you cannot run arbitrary versions of cuda in the containers; some versions work, some don't.

  3. If using docker-compose, you have to install a version that supports the nvidia runtime. You also have to remember to set runtime: nvidia for the container.

Not sure if any of this helps; the fact that you got something working implies you have something installed and working on the host machine, but in my experience the above process is the most general. In particular, test different versions of the nvidia-docker images, i.e.

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
docker run --runtime=nvidia --rm nvidia/cuda:9.2-base nvidia-smi
docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi

This will show whether you have the correct dependencies on the host.

@y1zhou

y1zhou commented Nov 21, 2018

@david-waterworth May I ask how you got it to work? I was following your Dockerfile and got my Jupyter Notebook up and running, but

from keras import backend as K
K.tensorflow_backend._get_available_gpus()

returned nothing, which I assume means the GPUs aren't detected? docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi does return our list of GPUs. I went through the official Dockerfile from tensorflow, and it seems like they are using CUDA 9.0, so could that be the problem here? Thanks!
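One way to narrow this down from inside the running container (a hypothetical debugging step, not something from this thread) is to ask the dynamic linker whether it can even locate the CUDA libraries that tensorflow-gpu loads at import time; TensorFlow tends to fall back to CPU silently when they are missing:

```python
# Hypothetical debugging snippet: check whether the dynamic linker can
# locate the CUDA libraries tensorflow-gpu loads at import time.
# "not found" for any of these usually explains a missing-GPU symptom.
import ctypes.util

for lib in ("cudart", "cublas", "cudnn"):
    found = ctypes.util.find_library(lib)
    print(lib, "->", found if found else "not found")
```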
