Size of gcr.io/kubeflow/tensorflow-notebook-* #37

Closed
flx42 opened this issue Dec 18, 2017 · 9 comments

@flx42

flx42 commented Dec 18, 2017

From the README:

We also ship standard docker images that you can use for training Tensorflow models with Jupyter.

gcr.io/kubeflow/tensorflow-notebook-cpu
gcr.io/kubeflow/tensorflow-notebook-gpu

[...] Note that GPU-based image is several gigabytes in size and may take a few minutes to localize.

("localize"?)

They are both large:

$ docker images gcr.io/kubeflow/tensorflow-notebook-gpu:latest
REPOSITORY                                TAG                 IMAGE ID            CREATED             SIZE
gcr.io/kubeflow/tensorflow-notebook-gpu   latest              e68d36c67064        2 weeks ago         7.11GB

$ docker images gcr.io/kubeflow/tensorflow-notebook-cpu:latest
REPOSITORY                                TAG                 IMAGE ID            CREATED             SIZE
gcr.io/kubeflow/tensorflow-notebook-cpu   latest              9cb2a6008740        2 weeks ago         5.17GB
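
To put a number on the gap between the two images, the `docker images` output above can be parsed directly. This is only a rough sketch: the helper names `parse_size` and `image_sizes` are made up for illustration, and it assumes Docker prints sizes in decimal units (1GB = 1000^3 bytes).

```python
import re

# Assumption: Docker reports human-readable sizes in powers of 1000.
UNITS = {"B": 1, "kB": 1000, "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}


def parse_size(text):
    """Convert a size string such as '7.11GB' into a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)\s*([kKMGT]?B)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(round(float(number) * UNITS[unit]))


def image_sizes(docker_images_output):
    """Map 'repository:tag' to a size in bytes from `docker images` table text."""
    sizes = {}
    for line in docker_images_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "REPOSITORY":
            continue
        repository, tag, size = fields[0], fields[1], fields[-1]
        sizes[f"{repository}:{tag}"] = parse_size(size)
    return sizes


# The two rows reported in this issue.
SAMPLE_OUTPUT = """\
REPOSITORY                                TAG                 IMAGE ID            CREATED             SIZE
gcr.io/kubeflow/tensorflow-notebook-gpu   latest              e68d36c67064        2 weeks ago         7.11GB
gcr.io/kubeflow/tensorflow-notebook-cpu   latest              9cb2a6008740        2 weeks ago         5.17GB
"""

sizes = image_sizes(SAMPLE_OUTPUT)
gpu = sizes["gcr.io/kubeflow/tensorflow-notebook-gpu:latest"]
cpu = sizes["gcr.io/kubeflow/tensorflow-notebook-cpu:latest"]
print(f"GPU image is {(gpu - cpu) / 1000**3:.2f} GB larger than the CPU image")
```

The same function could be fed live output captured from `docker images`, which would make it easy to track the size after each Dockerfile change.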

Are the Dockerfiles public for these images? I can probably do a quick PR to improve the size.

You might be interested in the improvements I made to the devel-gpu Dockerfile for TensorFlow:
tensorflow/tensorflow#15355

Also, it would be helpful if you could chime in on this RFE:
tensorflow/tensorflow#15284
Maybe we can have a single image with Jupyter+TensorFlow+TensorBoard? That would shrink the other TensorFlow images that are shipped today (e.g. gpu and devel-gpu).

flx42 changed the title from "Size" to "Size of gcr.io/kubeflow/tensorflow-notebook-*" on Dec 18, 2017
@pineking
Member

I also think the size can be reduced. BTW: is it possible to push the images to dockerhub instead of gcr.io?

@jlewi
Contributor

jlewi commented Dec 18, 2017

@vishh Are we just using the TensorFlow Docker images? I don't see any Dockerfiles for these notebook images inside google/kubeflow.

@flx42
Author

flx42 commented Dec 18, 2017

No, it's not the same. If you run docker history --no-trunc gcr.io/kubeflow/tensorflow-notebook-gpu, you can see that it differs from both https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.gpu
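
The same `docker history` output is also a quick way to see which layers account for most of the size. Below is a minimal sketch, assuming the text comes from `docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' IMAGE` (one tab-separated layer per line) and that sizes are in decimal units. The helper names and the sample layers are hypothetical and are not taken from the actual Kubeflow image.

```python
import re

# Assumption: Docker reports human-readable sizes in powers of 1000.
UNITS = {"B": 1, "kB": 1000, "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}


def to_bytes(size_text):
    """Convert a size string such as '1.2GB' or '0B' into bytes."""
    match = re.fullmatch(r"([0-9.]+)\s*([kKMGT]?B)", size_text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {size_text!r}")
    number, unit = match.groups()
    return int(round(float(number) * UNITS[unit]))


def largest_layers(history_output, top=5):
    """Return the `top` largest layers as (size_bytes, created_by) tuples."""
    layers = []
    for line in history_output.splitlines():
        if not line.strip():
            continue
        # The size comes first, so split on the first tab only.
        size_text, _, created_by = line.partition("\t")
        layers.append((to_bytes(size_text), created_by.strip()))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)[:top]


# Hypothetical layers, for illustration only.
SAMPLE_HISTORY = "\n".join([
    '0B\t/bin/sh -c #(nop) CMD ["jupyter", "notebook"]',
    "1.2GB\t/bin/sh -c pip install tensorflow-gpu",
    "350MB\t/bin/sh -c apt-get update && apt-get install -y build-essential",
])

for size, created_by in largest_layers(SAMPLE_HISTORY, top=2):
    print(f"{size / 1000**3:.2f} GB  {created_by[:60]}")
```

Sorting by layer size tends to point straight at the `RUN` steps worth revisiting, such as package installs that leave caches behind.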

@jlewi
Contributor

jlewi commented Dec 18, 2017

We'd like to support a bunch of different frameworks e.g.

  • TensorFlow
  • xgboost
  • scikits

Some questions:

  • Should we provide one fat image with all these libraries or have multiple images?
  • Are there existing, curated images that we can reuse as opposed to building our own?

@aronchick
Contributor

aronchick commented Dec 18, 2017 via email

@yuvipanda
Contributor

@aronchick afaict gcr.io doesn't provide a human friendly URL to pass to people, which I've always found annoying for public images.

@jlewi there are some at http://github.com/jupyter/docker-stacks/ (and PRs welcome!) that do get a fair amount of usage.

@flx42
Author

flx42 commented Dec 18, 2017

I think it makes sense to have one "fat" image, if it allows us to keep the other images lean.
This image could target being a development environment for data scientists: Jupyter, TensorFlow, TensorBoard and usual python dependencies.

@jlewi
Contributor

jlewi commented Dec 19, 2017

This is the source for our existing Docker images
https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/example/tensorflow-notebook-image

So everything is public and we should probably move them into Kubeflow.

Using an NVIDIA image as the base for our GPU images makes sense to me.

/cc @flx42

@flx42
Author

flx42 commented Dec 20, 2017

OK, let's discuss the size again once it's in this repo.
But I believe it still makes sense to check with the TensorFlow team if a single common Jupyter image can be created.

k8s-ci-robot pushed a commit that referenced this issue Apr 20, 2018
Remove a lot of bloat - install only the minimal set of packages
required to get started with ML. Any packages required can be installed
by the user in the notebook itself using pip install/conda install

Image size has gone down from 12GB to 3GB for cpu image

Having a lot of packages makes it very challenging to maintain them
because of version conflicts

Run everything as jovyan user - this enables user to run conda install
/ pip install without requiring sudo

Add comments on every step

Fixes #668
Fixes #37
Fixes #472
pdmack pushed a commit to pdmack/kubeflow that referenced this issue Apr 21, 2018
Fixes kubeflow#668
Fixes kubeflow#37
Fixes kubeflow#472
Conflicts:
	components/tensorflow-notebook-image/Dockerfile
	components/tensorflow-notebook-image/build_image.sh
	components/tensorflow-notebook-image/releaser/components/workflows.libsonnet
k8s-ci-robot pushed a commit that referenced this issue Apr 21, 2018
… gcr.io locations (#703)

* Refactor tensorflow-notebook-image/Dockerfile (#689)

Remove a lot of bloat - install only the minimal set of packages
required to get started with ML. Any packages required can be installed
by the user in the notebook itself using pip install/conda install

Image size has gone down from 12GB to 3GB for cpu image

Having a lot of packages makes it very challenging to maintain them
because of version conflicts

Run everything as jovyan user - this enables user to run conda install
/ pip install without requiring sudo

Add comments on every step

Fixes #668
Fixes #37
Fixes #472
Conflicts:
	components/tensorflow-notebook-image/Dockerfile
	components/tensorflow-notebook-image/build_image.sh
	components/tensorflow-notebook-image/releaser/components/workflows.libsonnet

* Update various images in kubeflow to kubeflow-images-public (#635)

Point them to kubeflow-images-public instead of kubeflow-images-staging

Related to #534
/cc @jlewi
Conflicts:
	bootstrap/Makefile
	bootstrap/README.md

* Migrate images to kubeflow-images-public (#695)

Related to #534
Conflicts:
	bootstrap/README.md
	docs_dev/images.md
	kubeflow/core/tests/tf-job_test.jsonnet

* Update the hub spawner dropdown for latest NB images (#697)
kimwnasptd pushed a commit to arrikto/kubeflow that referenced this issue Mar 5, 2019
* This project will be used by the folks at GoJek and Google PSO to
  develop and test feast.

Related to kubeflow/testing#254
saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 11, 2021
Fixes kubeflow#668
Fixes kubeflow#37
Fixes kubeflow#472
yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021
Signed-off-by: Ce Gao <gaoce@caicloud.io>