Add support for custom environment with Jupyter #93
Comments
Hello @mgoeminne, thanks for your feature request! Do you think that BinderHub could meet your needs? See the diagram of the BinderHub architecture: https://binderhub.readthedocs.io/en/latest/overview.html#a-diagram-of-the-binderhub-architecture BinderHub seems to let a user automatically create a Jupyter Notebook based on a git repository. It generates a Docker image from the specifications and requirements found in the git repo. This video explains very well how BinderHub works: https://www.youtube.com/watch?v=KcC0W5LP9GM Tell us if you think that it makes sense to add BinderHub to FADI.
@alexnuttinck Thank you for your responsiveness. I have never used BinderHub, but it looks promising. However, I fear that having to manage the specifications/requirements in a Git repository limits the user experience, since the user has to maintain this repo. On the other hand, managing requirements in a Git repository is pretty interesting from the evolution/deployment management point of view. BinderHub seems to be the perfect fit, since it allows creating environments from configuration files (Dockerfile, Python requirements, etc.) directly from the JupyterHub environment. If BinderHub were systematically available, I would probably stop complaining and asking you devops guys to add some weird dependencies to my environments 😄
As I understand it, this feature is practically mandatory for using Seldon without having admin access to the Kubernetes cluster.
I think it's possible to use the Jupyter Docker image jupyter/repo2docker with the current JupyterHub in FADI, because repo2docker is the tool used by BinderHub to build images on demand. jupyter-repo2docker is a tool to build, run, and push Docker images from source code repositories. repo2docker fetches a repository (from GitHub, GitLab, Zenodo, Figshare, Dataverse installations, a Git repository or a local directory) and builds a container image in which the code can be executed. The image build process is based on the configuration files found in the repository. The repo2docker documentation comes with a how-to section, including "How to automatically create an environment.yml that works with repo2docker".
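To make the repo2docker suggestion concrete, here is a minimal sketch that assembles a `jupyter-repo2docker` command line from Python. It only builds and prints the command; it does not run it. The helper name `repo2docker_command`, the repository URL, and the image name are hypothetical and used for illustration only. The flags `--ref`, `--image-name` and `--no-run` are repo2docker command-line options; how this would be wired into FADI's JupyterHub is not covered here.

```python
import shlex


def repo2docker_command(repo, ref=None, image_name=None, run=False):
    """Return the argument list for a jupyter-repo2docker invocation.

    repo       -- URL or local path of the repository to build (hypothetical here)
    ref        -- optional branch, tag or commit to check out
    image_name -- optional name for the resulting Docker image
    run        -- if False, only build the image and do not start a container
    """
    cmd = ["jupyter-repo2docker"]
    if ref:
        cmd += ["--ref", ref]
    if image_name:
        cmd += ["--image-name", image_name]
    if not run:
        cmd.append("--no-run")
    cmd.append(repo)
    return cmd


# Hypothetical repository and image name, for illustration only.
command = repo2docker_command(
    "https://github.com/example/my-analysis",
    ref="main",
    image_name="my-analysis-env",
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where repo2docker and Docker are installed.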
Documentation for BinderHub is available on the develop branch: https://github.com/cetic/fadi/tree/develop/examples/binderhub It will be merged soon. BinderHub will nevertheless remain a "beta" feature.
Is your feature request related to a problem? Please describe.
No, it's a suggestion for improving the functional coverage of FADI.
Describe the solution you'd like
A data scientist can use Jupyter Hub to iteratively explore data sets and provide technical solutions to various problems.
To do so, she frequently has to change the Jupyter environment of her notebooks, for example to include a specific package or to test alternative processing frameworks. Typically, each project / use case can have one or many dedicated environments that change daily or weekly.
FADI should support this kind of dynamic adaptation to the data scientist's needs by providing a way to efficiently manage extra dependencies.
For instance, a Web application could be provided for specifying, adapting or copying the environment right before instantiating Jupyter Hub. An interesting feature would be the possibility to inherit environments, and to share them among stakeholders.
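As a rough illustration of what environment inheritance could mean, the sketch below resolves a child environment by merging its packages over those of its parent. The data model (a dict with `inherits` and `packages` keys) and the function name `resolve_environment` are assumptions made for this example; they are not part of FADI, JupyterHub or BinderHub.

```python
def resolve_environment(name, environments, _seen=()):
    """Return the effective package pins of an environment.

    Packages declared by a child override those inherited from its parent.
    `environments` maps an environment name to a dict with an optional
    "inherits" key (parent name) and an optional "packages" key
    (package name -> version). This structure is hypothetical.
    """
    if name in _seen:
        chain = " -> ".join(_seen + (name,))
        raise ValueError(f"cyclic inheritance: {chain}")
    spec = environments[name]
    parent = spec.get("inherits")
    # Start from the parent's resolved packages, if there is a parent.
    packages = (
        dict(resolve_environment(parent, environments, _seen + (name,)))
        if parent
        else {}
    )
    packages.update(spec.get("packages", {}))
    return packages


# Hypothetical shared base environment and a project-specific child.
environments = {
    "base": {"packages": {"python": "3.10", "pandas": "1.5"}},
    "nlp-project": {
        "inherits": "base",
        "packages": {"spacy": "3.5", "pandas": "2.0"},
    },
}

print(resolve_environment("nlp-project", environments))
```

Sharing an environment among stakeholders would then amount to sharing the named entry, while each project keeps only its own overrides.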
Describe alternatives you've considered
The currently recommended way to do it is to adapt the Helm values file of the underlying Kubernetes cluster and restart the appropriate services. This is not really acceptable for an end user.
An alternative is to specify the additional dependencies with "conda install"-like commands at the beginning of the notebooks, but that makes these specifications notebook-specific. It also means the additional dependencies must be installed each time the notebook is loaded. Environment variables/secrets must be set in the notebooks, which raises security issues, and so on.
Additional context
Please have a look at how Domino provides this feature. Basically, a Dockerfile can be edited by the end user to personalize the environment.
A nice optimization would be to cache popular / recent / frequently used environments, so that notebooks using these environments start faster.
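One possible way to approach such a cache, sketched under assumptions: derive a key from the content of the environment's configuration files and rebuild an image only when that key has not been seen before. The names `environment_cache_key` and `ImageCache`, and the `build` callback, are hypothetical; a real setup would also need eviction and persistent storage.

```python
import hashlib


def environment_cache_key(files):
    """Return a short, deterministic key for a set of configuration files.

    `files` maps a file name (e.g. "environment.yml") to its text content.
    Names are sorted so the key does not depend on insertion order.
    """
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()[:16]


class ImageCache:
    """In-memory cache that calls `build` only for unseen environments."""

    def __init__(self, build):
        self._build = build  # hypothetical callback: (key, files) -> image tag
        self._images = {}

    def get(self, files):
        key = environment_cache_key(files)
        if key not in self._images:
            self._images[key] = self._build(key, files)
        return self._images[key]


# Stand-in build function that only records what it was asked to build.
built = []


def fake_build(key, files):
    built.append(key)
    return f"env-{key}"


cache = ImageCache(fake_build)
spec = {"environment.yml": "dependencies:\n  - pandas\n"}
first = cache.get(spec)
second = cache.get(dict(spec))  # same content, so no second build
print(first == second, len(built))
```

Hashing the content rather than the environment name means two users with identical specifications would share one cached image.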