Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jupyter/pytorch-notebook #1936

Merged
merged 43 commits into from
Nov 22, 2023
Merged

Add jupyter/pytorch-notebook #1936

merged 43 commits into from
Nov 22, 2023

Conversation

twalcari
Copy link
Contributor

@twalcari twalcari commented Jul 6, 2023

Describe your changes

As PyTorch has grown more popular than TensorFlow over the last few years, it seems odd that we have no official PyTorch image.

cfr. Google Trends of tensorflow (blue line) vs. pytorch (red line):

image

I've been supplying a PyTorch image to my researchers at Ghent University, which has proven to been very popular. I'm now proposing to upstream this image (be it the CPU-only version instead of the GPU-enabled version at Ghent University).

Note that the installation is done with pip instead of conda: when using the latter conflicts arise with the nomkl package that is installed.

I've based myself on #1926 for this PR so that I hopefully have not missed any location where I needed to add changes.

I have not been able to test this PR against the aarch64 architecture.

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

.github/actions/download-manifests/action.yml Outdated Show resolved Hide resolved
.github/workflows/docker.yml Outdated Show resolved Hide resolved
.github/workflows/hub-overview.yml Outdated Show resolved Hide resolved
pytorch-notebook/Dockerfile Outdated Show resolved Hide resolved
@mathbunnyru
Copy link
Member

We should consider alternatives here, for example, to create deeplearning-notebook, which will include both tensorflow and pytorch (we can replace the existing tensorflow-notebook or inherit from it).
I'm not convinced we should add one more image stack here and I think we should think about this.

@twalcari
Copy link
Contributor Author

twalcari commented Jul 6, 2023

It's possible to install pytorch in/on top of the existing tensorflow image, but this makes this image extra large.

I don't think many users will use PyTorch and Tensorflow at the same time hence why I created an extra image instead.

@twalcari
Copy link
Contributor Author

twalcari commented Jul 7, 2023

The CI builds show the following image sizes:

scipy-notebook: 4.06GB
tensorflow-notebook: 5.68GB (+1.62GB)
pytorch-notebook: 4.81GB (+0.75GB)

So for an unified tensorflow+pytorch notebook we can expect the image to be around 6.43GB. I leave it up to the community to decide what the better option is.

@mathbunnyru
Copy link
Member

@consideRatio @minrk how do you feel about this?

@Bidek56
Copy link
Contributor

Bidek56 commented Jul 9, 2023

  1. There are plenty of Pytorch images in the hub already. One of them is
  2. It's easy enough to either install Torch in an existing image or extend a Docker file to a have your own
  3. @twalcari did a great job creating the PR 🥇

@consideRatio
Copy link
Collaborator

@consideRatio @minrk how do you feel about this?

I think its critical that this project has a policy on whats to be considered in scope for the project.

This opinion has evolved based on another project. In jupyterhub/oauthenticator we said all kinds of new identity provider specific jupyterhub authenticator classes was accepted, so the project grew and grew but ended up unsustainable to maintain. At some point we decided to stop accepting new authenticators and removed some that could migrate to another good option easily.

I suggest a policy discussion is started in a dedicated issue.

I'm personally conservative, looking to ensure that what we have is very clearly sustainable to maintain and only opting to maintain anything considered providing sufficient value for users.

I'm on vacation since monday this week, I'm back in two weeks!

@minrk
Copy link
Member

minrk commented Jul 17, 2023

I'm also on vacation for the rest of the month, but don't have a good sense of how to draw lines, and that's made it extremely hard to maintain this repo. This is a great PR, and the case for transitioning from tensorflow to torch is sensible, but I'll defer to @mathbunnyru who's done most of the heavy maintenance lifting here. I do think that any added image should need to at least come with a commitment to maintain it, not adding work to already overloaded maintenance. I do expect as soon as we have a tensorflow-cpu image, folks will want a tensorflow-gpu image, etc.

@mathbunnyru
Copy link
Member

@consideRatio @minrk have a nice vacation ❤️

Thanks for your input, I will try to think how we can proceed here.

I do expect as soon as we have a tensorflow-cpu image, folks will want a tensorflow-gpu image, etc.

I think you mean pytorch here, and I do agree with you (even though we still don't have tensorflow-gpu image, and have cpu version for almost 7 years).

@mathbunnyru
Copy link
Member

mathbunnyru commented Aug 19, 2023

I'm sorry for creating too many commits - I've just move all images to images/ directory, so the files structure is cleaner.
I was able to update this PR, but I had to move some files one-by-one.

Even if we don't merge this PR (we still have no solution for this problem), we will have an example of a good PR how one can introduce a new image.

@mathbunnyru
Copy link
Member

mathbunnyru commented Oct 27, 2023

We now have a policy for merging new images and adding new packages 🎉

@mathbunnyru, @consideRatio, @yuvipanda, and @manics, please vote 👍 to accept this change and 👎 not to accept it (use a reaction to this message)
The voting deadline is the 27th of November (a month since I posted this message).
The change is accepted, if there are at least 2 positive votes.

We can have a discussion until the deadline, so please express your opinions.

@manics
Copy link
Contributor

manics commented Oct 27, 2023

I haven't reviewed the implementation in the PR but given that there's a Tensorflow image I think it's reasonable to add a Pytorch one if someone's willing to maintain it.

However my preferred option is as @mathbunnyru suggested to have a multi-framework image, ideally replacing the tensorflow image.. The datascience image has both R and Python which are unlikely to be used simultaneously so having multiple ML frameworks in an single image seems reasonable. It'll significantly increase the image size, but the datascience packages and ML frameworks are already quite large, and I think the benefits of not adding yet another image are worth the increase in size.

I'm 👍 on a multi-framework image, and neutral on a pytorch specific image.

@consideRatio
Copy link
Collaborator

For me, deciding on this is a matter of prioritization of available maintenance capacity. I'm a bit concerned by having any image dedicated to any library due to the maintenance load it could lead to long term.

At the same time, the libraries related to GPU like tensorflow, keras, pytorch etc has from what i recall been a hassle to install - and to install all at the same time may be too hard if we would aim for an all-in-one image, making maintenance harder and not easier. Also, of course a larger image size.

This is not an easy call, i dont know =\ I think maybe images should be in an alpha/beta stage of maintenance before they are comitted to long term? If they turn out too complicated to maintain, they could be dropped etc?

@mathbunnyru
Copy link
Member

Well, I think combining our ideas, I suggest we do the following:

  1. Let's have a jupyter/deep-learning-notebook image
  2. It will be inherited from jupyter/tensorflow-notebook and add pytorch on top
  3. We will be quite open to adding top-rated deep-learning packages there (using our voting process). Note: I'm not talking about GPU image here, because I think that's a whole other issue, so we will be keeping it CPU.
  4. This image will be marked as beta in the README on registry page, which means it can be easily removed/changed/have some issues.
  5. If everything works out, we will drop tensorflow-notebook in favor of deep-learning-notebook. It can be done in quite an interesting way - for a few months we will be pushing deep-learning-notebook instead of tensorflow and we'll see what people think (and if they even notice). We will also be showing a message in the terminal: "this image is no longer supported, please use....".

Does it sound ok?

@mathbunnyru
Copy link
Member

@consideRatio does the previous suggestion sound ok to you?

@mathbunnyru
Copy link
Member

mathbunnyru commented Nov 5, 2023

@twalcari may I ask you to update this PR, which will add jupyter/deep-learning-notebook inherited from tensorflow image and adding pytorch on top (instead of pytorch-notebook image)?
I think it might be even better to create a new PR.

@twalcari twalcari mentioned this pull request Nov 6, 2023
4 tasks
@mathbunnyru mathbunnyru merged commit 278dd76 into jupyter:main Nov 22, 2023
75 of 76 checks passed
@mathbunnyru
Copy link
Member

mathbunnyru commented Nov 22, 2023

@twalcari thank you for your contribution!

I'm sorry it took so long, but we established a voting group and some processes during this PR, so it should be significantly better next time we're asked to do something similar.

Also, I updated lots of workflows/actions during this PR (I am done for now). I hope my merges of main branch to your branches were not to annoying and I tried to fix all the problems I created, so you don't have to spend you time too much.
And this PR is a great example of how one can add a new image (and it's mentioned in the docs).

@twalcari
Copy link
Contributor Author

🎉!

I completely understand that this took some time to figure out. I'm glad that this helped formalize the process for accepting new notebooks. I think new notebooks should only be allowed here very sparingly but PyTorch was one of the exceptions 🙂 In my opinion, only 'ecosystems' as well established as R / Julia / PyTorch / TensorFlow should be considered.

Also many thanks to you for keeping this PR up-to-date throughout the process!

@twalcari twalcari deleted the pytorch-image branch November 22, 2023 11:57
@mathbunnyru mathbunnyru mentioned this pull request Apr 28, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants