Add nvidia based notebooks to stack #1196
Changes from 6 commits
@@ -0,0 +1,2 @@
# Documentation
README.md
@@ -0,0 +1,81 @@
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
ARG BASE_CONTAINER=jupyter/scipy-notebook
FROM $BASE_CONTAINER

LABEL maintainer="Jupyter Project <jupyter@googlegroups.com>"

ARG CUDA=11.1
ARG CUDNN=8.0.4.30-1
ARG CUDNN_MAJOR_VERSION=8
ARG LIB_DIR_PREFIX=x86_64
ARG LIBNVINFER=7.2.1-1
ARG LIBNVINFER_MAJOR_VERSION=7

# Fix DL4006
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
    gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - && \
> **Comment:** You could use …
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
    echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list && \
> **Comment:** We have used 20.04 for quite a while. And, as I see here, the ubuntu2004 repo is available.
>
> **Reply:** The 18.04 repo is used because not all the packages seem to be present in the 20.04 repo. I am using the TensorFlow Dockerfiles (which use 18.04) as an example for this Dockerfile, and if I remember correctly the following package cannot be found on that repository.
>
> **Comment:** At least https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcudnn8_8.0.5.39-1+cuda11.1_amd64.deb exists. If something is missing, then I guess it's ok to use the old repos.
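The thread above settles the repo question by probing a `.deb` URL directly. A small sketch of that check, assuming the repo layout shown in the comment (the `deb_url` helper name is mine, not part of the PR):

```shell
# Hypothetical helper (not in the PR): build the URL of a .deb file in an
# NVIDIA apt repo so its existence can be probed, e.g. with `curl -fsI URL`.
deb_url() {
  local release=$1 pkg=$2
  echo "https://developer.download.nvidia.com/compute/cuda/repos/${release}/x86_64/${pkg}"
}

deb_url ubuntu2004 "libcudnn8_8.0.5.39-1+cuda11.1_amd64.deb"
```

`curl -fsI "$(deb_url ubuntu2004 <pkg>)"` exits non-zero when the file is missing, which is one way to decide whether the newer repo carries everything the Dockerfile pins.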
    apt-get purge --autoremove -y curl \
> **Comment:** This is nice!
>
> **Reply:** #1199
>
> **Comment:** But you can delete it there as well 👍
    && rm -rf /var/lib/apt/lists/*

ENV CUDA_VERSION 11.1.1

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
    cuda-cudart-11-1=11.1.74-1 \
    cuda-compat-11-1 \
    && ln -s cuda-11.1 /usr/local/cuda && \
> **Comment on lines +32 to +34:** Let's create an environment variable for the versions here or use the existing ones.
>
> **Reply:** Would it then not also be a good idea to use environment variables for the TensorFlow version?
>
> **Comment:** I think not, because it is not duplicated many times.
    rm -rf /var/lib/apt/lists/*

# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.1 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 brand=tesla,driver>=450,driver<451"

# Install all OS dependencies for notebook server that starts but lacks all
# features (e.g., download as all possible file formats)
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update \
    && apt-get install -yq --no-install-recommends \
    cuda-command-line-tools-${CUDA/./-} \
    libcublas-${CUDA/./-} \
    cuda-nvrtc-${CUDA/./-} \
    libcufft-${CUDA/./-} \
    libcurand-${CUDA/./-} \
    libcusolver-${CUDA/./-} \
    libcusparse-${CUDA/./-} \
    curl \
    libcudnn8=${CUDNN}+cuda${CUDA} \
    libfreetype6-dev \
    libhdf5-serial-dev \
    libzmq3-dev \
    pkg-config \
    software-properties-common \
    cm-super \
    libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
    libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
    && apt-get clean && rm -rf /var/lib/apt/lists/*
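The package pins above rely on a bash substitution: `${CUDA/./-}` rewrites the dotted version into the dashed form apt package names use. A quick illustration:

```shell
# ${CUDA/./-} replaces the first "." with "-", turning "11.1" into "11-1",
# which matches apt package names such as cuda-command-line-tools-11-1.
CUDA=11.1
echo "cuda-command-line-tools-${CUDA/./-}"   # -> cuda-command-line-tools-11-1
```

This is why the Dockerfile sets `SHELL ["/bin/bash", ...]` rather than relying on `/bin/sh`, where this substitution is not available.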
USER $NB_UID

WORKDIR $HOME

# Install Tensorflow
RUN pip install --quiet --no-cache-dir \
    'tensorflow-gpu==2.3.1' && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
@@ -0,0 +1,15 @@
[![docker pulls](https://img.shields.io/docker/pulls/jupyter/tensorflow-notebook.svg)](https://hub.docker.com/r/jupyter/tensorflow-notebook/)
[![docker stars](https://img.shields.io/docker/stars/jupyter/tensorflow-notebook.svg)](https://hub.docker.com/r/jupyter/tensorflow-notebook/)
[![image metadata](https://images.microbadger.com/badges/image/jupyter/tensorflow-notebook.svg)](https://microbadger.com/images/jupyter/tensorflow-notebook "jupyter/tensorflow-notebook image metadata")

# Jupyter Notebook Deep Learning Stack

GitHub Actions in the https://github.com/jupyter/docker-stacks project builds and pushes this image
to Docker Hub.

Please visit the project documentation site for help using and contributing to this image and
others.

- [Jupyter Docker Stacks on ReadTheDocs](http://jupyter-docker-stacks.readthedocs.io/en/latest/index.html)
- [Selecting an Image :: Core Stacks :: jupyter/tensorflow-notebook](http://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-tensorflow-notebook)
- [Image Specifics :: Tensorflow](http://jupyter-docker-stacks.readthedocs.io/en/latest/using/specifics.html#tensorflow)
> **Comment:** We should definitely add which GPUs are supported.
>
> **Reply:** I think the main requirement is that CUDA and the NVIDIA drivers must be installed on the host. As for GPUs, I think anything released in the last 10 years will work. If I look at the cuda-gpus page, the lowest entry is the GeForce GT 430. Arguably, anybody looking to use TensorFlow with GPUs will have a more modern GPU and (hopefully) have spent the time to make a good decision about what GPU to get for their ML work.
>
> **Comment:** Having the first link with the comment about the host driver is fine for me.
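The host-driver requirement discussed above is usually smoke-tested by running `nvidia-smi` inside the container with GPU access. A sketch of the invocation (the helper and the image tag are illustrative; actually running the resulting command needs the NVIDIA driver and the NVIDIA Container Toolkit on the host):

```shell
# Hypothetical helper: compose the docker command used to verify that a
# container can see the host GPUs. "--gpus all" is handled by the NVIDIA
# Container Toolkit; the image name passed in is an assumption.
gpu_smoke_cmd() {
  echo "docker run --rm --gpus all $1 nvidia-smi"
}

gpu_smoke_cmd jupyter/tensorflow-notebook
```

If the driver or toolkit is missing on the host, this command fails at startup rather than inside TensorFlow, which makes it a useful first diagnostic.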
@@ -0,0 +1,45 @@
#!/bin/bash
set -e

# Apply tags
GIT_SHA_TAG=${GITHUB_SHA:0:12}
docker tag $IMAGE_NAME "$DOCKER_REPO:$GIT_SHA_TAG"

# Update index
INDEX_ROW="|\`${BUILD_TIMESTAMP}\`|\`jupyter/${IMAGE_SHORT_NAME}:${GIT_SHA_TAG}\`|[Git diff](https://github.com/jupyter/docker-stacks/commit/${GITHUB_SHA})<br />[Dockerfile](https://github.com/jupyter/docker-stacks/blob/${GITHUB_SHA}/${IMAGE_SHORT_NAME}/Dockerfile)<br />[Build manifest](./${IMAGE_SHORT_NAME}-${GIT_SHA_TAG})|"
sed "/|-|/a ${INDEX_ROW}" -i "${WIKI_PATH}/Home.md"

# Build manifest
MANIFEST_FILE="${WIKI_PATH}/manifests/${IMAGE_SHORT_NAME}-${GIT_SHA_TAG}.md"
mkdir -p $(dirname "$MANIFEST_FILE")

cat << EOF > "$MANIFEST_FILE"
* Build datetime: ${BUILD_TIMESTAMP}
* Docker image: ${DOCKER_REPO}:${GIT_SHA_TAG}
* Docker image size: $(docker images ${IMAGE_NAME} --format "{{.Size}}")
* Git commit SHA: [${GITHUB_SHA}](https://github.com/jupyter/docker-stacks/commit/${GITHUB_SHA})
* Git commit message:
\`\`\`
${COMMIT_MSG}
\`\`\`

## Python Packages

\`\`\`
$(docker run --rm ${IMAGE_NAME} python --version)
\`\`\`

\`\`\`
$(docker run --rm ${IMAGE_NAME} conda info)
\`\`\`

\`\`\`
$(docker run --rm ${IMAGE_NAME} conda list)
\`\`\`

## Apt Packages

\`\`\`
$(docker run --rm ${IMAGE_NAME} apt list --installed)
\`\`\`
EOF
> **Comment:** We should probably add some GPU-specific info here.
>
> **Reply:** I was thinking about this as well. It would probably be good to have an image tag with the CUDA version as well.
>
> **Comment:** Agree.
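The CUDA-version tag suggested above could sit next to the existing `GIT_SHA_TAG` logic in the hook. A sketch only, assuming `CUDA_VERSION` is made available to the hook (the values and the `CUDA_TAG` name are mine; the real hook would run `docker tag` instead of echoing it):

```shell
# Illustrative values standing in for the hook's environment.
IMAGE_NAME=tensorflow-notebook
DOCKER_REPO=jupyter/tensorflow-notebook
CUDA_VERSION=11.1.1

# Strip the patch component: ${CUDA_VERSION%.*} turns 11.1.1 into 11.1.
CUDA_TAG="cuda-${CUDA_VERSION%.*}"

echo "docker tag $IMAGE_NAME $DOCKER_REPO:$CUDA_TAG"
```

A tag like `cuda-11.1` would let users pin a CUDA generation without tracking commit SHAs.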
@@ -0,0 +1,30 @@
# Copyright (c) Jupyter Development Team.
> **Comment:** Is it possible to check some GPU-specific stuff here? As far as I understand, this file is the same as tensorflow-notebook/test/test_tensorflow.py, so it might be worth checking something else.
>
> **Reply:** While I agree it would be good to test GPU stuff here, I'm not sure it will be possible to test functional stuff. For example, what I noticed when I created a non-functional GPU docker image is that …
>
> **Comment:** That's tricky, yes.
>
> **Reply:** For now, I say we just run nvidia-smi, as you did, and check that the error code is zero.
# Distributed under the terms of the Modified BSD License.
import logging

import pytest

LOGGER = logging.getLogger(__name__)


@pytest.mark.parametrize(
    "name,command",
    [
        (
            "Hello world",
            "import tensorflow as tf;print(tf.constant('Hello, TensorFlow'))",
        ),
        (
            "Sum",
            "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))",
        ),
    ],
)
def test_tensorflow(container, name, command):
    """Basic tensorflow tests"""
    LOGGER.info(f"Testing tensorflow: {name} ...")
    c = container.run(tty=True, command=["start.sh", "python", "-c", command])
    rv = c.wait(timeout=30)
    assert rv == 0 or rv["StatusCode"] == 0, f"Command {command} failed"
    logs = c.logs(stdout=True).decode("utf-8")
    LOGGER.debug(logs)
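The reviewers settled on running `nvidia-smi` and checking for a zero exit code. A minimal shell sketch of that pattern; the probe command is parameterized because a real `nvidia-smi` run needs GPU hardware, and `check_gpu` is a name I made up for illustration:

```shell
# check_gpu runs the given probe command and reports success purely by its
# exit status. In CI the probe would be something like:
#   docker run --rm <image> nvidia-smi
check_gpu() {
  if "$@" > /dev/null 2>&1; then
    echo "gpu check passed"
  else
    echo "gpu check failed"
  fi
}

check_gpu true    # stand-in for a succeeding nvidia-smi
check_gpu false   # stand-in for a failing one
```

Checking only the exit status sidesteps the problem raised above: the test stays meaningful even though TensorFlow-level GPU behavior cannot be exercised on driverless CI runners.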
> **Comment:** Add `hooks` and `test` dirs like in PR #1201.