Proposal for kernel provisioning and gateway investigations #3
# Proposal for Incubation

## Subproject name

Jupyter Kernel Gateway (`jupyter-incubator/kernel_gateway`)

## Development team and Advocate

* Peter Parente, IBM (`@parente`)
* Dan Gisolfi, IBM (`@vinomaster`)
* Justin Tyberg, IBM (`@jtyberg`)
* Gino Bustelo, IBM (`@ginobustelo`)

Who is the Steering Council Advocate for this Subproject?

* Kyle Kelley (`@rgbkrk`)
## Subproject goals, scope and functionality

### Problem

Applications that use Jupyter kernels as execution engines outside of the traditional notebook / console user experience have started appearing on the web (e.g., [pyxie](https://github.com/oreillymedia/pyxie-static), [pyxie kernel server](https://github.com/oreillymedia/ipython-kernel), [Thebe](https://github.com/oreillymedia/thebe), [gist exec](https://github.com/rgbkrk/gistexec), [notebooks-to-dashboards](http://blog.ibmjstart.net/2015/08/22/dynamic-dashboards-from-jupyter-notebooks/)). Today, these projects rely on:
1. A _client_ (e.g., Thebe) that includes JavaScript code from Jupyter Notebook to request and communicate with kernels

2. A _spawner_ (e.g., tmpnb) that provisions gateway servers to handle client kernel requests

   > **Review comment:** Would not JupyterHub also fall into this category of a spawner?
   >
   > **Reply:** I'm not as familiar with it, but since you said it, yes, it probably does. :)
3. A _gateway_ (e.g., the entirety of Jupyter Notebook) that accepts [CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete) requests for kernels, isolates kernel workspaces (e.g., via Docker), and proxies web-friendly protocols (e.g., Websocket) to the kernel protocol (0mq)
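To make the gateway role concrete, here is a minimal sketch of the kernel CRUD surface such a service might expose, modeled loosely on the notebook server's existing REST API. The paths and the `kernel_ws_url` helper are illustrative assumptions, not a settled design:

```python
# Hypothetical kernel CRUD surface for a standalone gateway, modeled
# loosely on the notebook server's REST API. Paths are assumptions.
KERNEL_API = {
    "create": ("POST",   "/api/kernels"),
    "list":   ("GET",    "/api/kernels"),
    "get":    ("GET",    "/api/kernels/{kernel_id}"),
    "delete": ("DELETE", "/api/kernels/{kernel_id}"),
}

def kernel_ws_url(base_url: str, kernel_id: str) -> str:
    """Derive the Websocket endpoint a client would dial after provisioning."""
    scheme = "wss" if base_url.startswith("https") else "ws"
    host = base_url.split("://", 1)[1].rstrip("/")
    return f"{scheme}://{host}/api/kernels/{kernel_id}/channels"
```

After a successful create, the gateway would hand this Websocket URL back to the client, which then speaks the Jupyter message protocol over that single connection.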
Maturing the Jupyter stack so that these efforts (and future ones) can find robust, common ground will require improvements in each of the above areas. Work is already underway to define a JavaScript library for communication with kernels and kernel provisioners (i.e., [jupyter/jupyter-js-services](https://github.com/jupyter/jupyter-js-services)). Discussion has started about defining an API for provisioning notebook servers as well ([binder-project/binder#8](https://github.com/binder-project/binder/issues/8)), a topic that touches on the spawner concept above. This proposal focuses on the third area, the concept of a standard kernel provisioning and communication API.

> **Review comment:** I would list JupyterHub also as a way of starting notebook servers.
### Goals

The goal of this incubator project is to **prototype and evaluate** possible solutions that satisfy the rising, novel kernel uses above, particularly with regard to the driving use cases documented below. The design and code of this incubator may become new Jupyter components, be folded into other relevant efforts underway, or be discarded if better options arise as the evaluation proceeds. At present, we do not know the **correct** design and implementation, but we believe there is value in trying the following:

> **Review comment:** I would clarify that today the gateway is handled by the REST and Websocket endpoints of the notebook server. To me, a lot of this work is building on that existing layer and separating its deployment from that of the notebook itself.
>
> **Review comment:** Also, I would imagine that you will build on the existing REST/Websocket APIs that we have today.
>
> **Reply:** Yes. I've started prototyping a gateway reusing code from jupyter_core, jupyter_client, and jupyter/notebook (the websocket bits). @rgbkrk has mentioned wanting to prototype the websocket-to-0mq bridge portion alone, separate from the provisioner, potentially in Go so it can easily drop into any kernel container without introducing new dependencies. Both, I think, fall within the scope I envision for this repo: building some cooperating prototypes and seeing what is most viable to carry forward.
1. Using jupyter_client, jupyter_core, and pieces of jupyter/notebook (e.g., MappingKernelManager) to construct a headless kernel gateway that can talk to a cluster manager (e.g., Mesos).

2. Implementing a Websocket-to-0mq bridge that can be placed in any Docker container that already runs a kernel, to allow web-friendly access to that kernel.

   > **Review comment:** Isn't that Websocket-to-0mq bridge already in the existing notebook server, in a fairly modular form, along with all of the REST stuff?
   >
   > **Reply:** Agreed, though it is not very well decoupled and has the notion of restarting and things like that.

3. Adding a new jupyter_client.WebsocketKernelManager that can be plugged into Jupyter Notebook or consumed by other tools to talk to kernels fronted by a Websocket-to-0mq bridge. (See use case #3 below.)
> **Review comment:** I think I disagree that the server talking upstream via Websockets is the right approach. I think it should be either:
>
> **Reply:** If by kernel service you mean a kernel in a container with a websocket-to-zmq bridge (w2z?), then, yes, that's one of the planned experiments. This approach separates kernel provisioning from kernel communication after provision, which is attractive. However, it punts the problem of managing communication with a number of running kernels out of scope unless there's a third component like the configurable-http-proxy for tmpnb, one that the provisioner informs about running kernels. That, or the admin of the kernel service must bring his/her own proxying scheme. All of the above is fine, but I think having an all-in-one gateway service that does the provisioning and the w2z bridging in one component might provide an easier walk-up-and-try-it prototype in the short term. Granted, it is more monolithic and certainly has its own scaling problems, but I see it as valuable for proving the concept and enabling folks to start thinking about "how could I use this?"
>
> We've certainly talked to within-cluster remote kernels using zmq before. But, from experience, when we've started toying with clients being very remote from kernels (e.g., client on my laptop, kernel in an IaaS), kernels being offered as services by cloud providers, and applications that use kernels written by new audiences (e.g., web developers), we see Websockets having a number of advantages:
>
> **Review comment:** I would just add that creating clients that communicate directly with kernels (in languages other than Python) through 0MQ is not a trivial exercise. The Jupyter codebase already does a good job of abstracting through WebSockets. Why not leverage that? And I agree... a WebSockets interface is going to be more friendly to app developers.
>
> **Review comment:** I mainly think the first option would be better: the client talks directly to the external kernel service via Websockets. What I don't think we should do is make the existing server a Websocket client of other web services.
>
> **Reply:** I think we might be talking past each other with our definitions of client and server, and for which scenario. If by client you mean, for example, a JS app using jupyter-js-services, then yes. Or did you have another specific client in mind? Do you mean the Jupyter Notebook Python server here, specifically? If so, how would it take advantage of the remote service without talking to it via Websockets?
>
> **Review comment:** But this requires the kernel provider to have knowledge of notebook flags, correct? (Assuming the provider becomes the Websocket endpoint for the outside world.)
>
> **Reply:** It's what I currently deal with in tmpnb. I'm assuming we'll have a simple flag on the kernel provisioner or other options, not the notebook flag.
>
> **Review comment:** Got it. Thanks!
>
> **Reply:** I think answering this question and others is part of the exploration TBD in the incubator. I certainly don't have a solid game plan yet. I think initial reference implementations can deal with the open-access case, and from there we can start to work on things like security. That said, I can imagine having the provider support options for authenticating and authorizing requests for kernel provisioning (Does Pete get to request another kernel on my system, and has he used up his allotment?) as well as kernel connectivity (Is this Pete connecting to his kernel via a Websocket?) through common mechanisms (auth headers, API key, ...). But I can also imagine punting this responsibility to other components, like a front proxy that controls access to the APIs and specific kernel Websocket routes based on login. Both seem viable at face value.
>
> **Review comment:** Oh right, for the authed version we need to do auth like we do in the notebook, though likely API-key based.
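At the core of the Websocket-to-0mq bridge experiment discussed above is a message translation step: re-packing a kernel message from Jupyter's multipart 0mq wire format into a single JSON text frame for a Websocket client. The sketch below shows only that step; the frame layout follows the Jupyter message spec, but the `channel` tagging and everything around it (sockets, signature verification, routing) is an assumed simplification:

```python
import json

DELIM = b"<IDS|MSG>"  # Jupyter wire-protocol delimiter between idents and payload

def zmq_frames_to_ws(frames, channel="shell"):
    """Re-pack [idents..., <IDS|MSG>, hmac, header, parent, metadata, content]
    into one JSON string for a Websocket client. Signature checking omitted."""
    i = frames.index(DELIM)
    header, parent, metadata, content = (json.loads(f) for f in frames[i + 2:i + 6])
    return json.dumps({
        "header": header,
        "parent_header": parent,
        "metadata": metadata,
        "content": content,
        "channel": channel,  # assumption: the bridge tags the originating channel
    })
```

The reverse direction (Websocket JSON back to signed multipart frames) is symmetric, which is part of why the bridge is attractive as a small, self-contained component.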
### Use Cases

#### Use Case #1: Simple, Static Web Apps w/ Modifiable Code

Alice has a scientific computing blog. In her posts, Alice often includes snippets of code demonstrating concepts in languages like R and Python. She sometimes writes these snippets inline in Markdown code blocks. Other times, she embeds GitHub gists containing her code. To date, her readers can view these snippets on her blog, clone her gists to edit them, and copy/paste other code for their own purposes.

Having heard about Thebe and gist exec, Alice is interested in making her code snippets executable and editable by her readers. She adds a bit of JavaScript code on her blog to include the Thebe JS library and turn her code blocks into edit areas with Run buttons. She also configures Thebe to contact a publicly addressable Jupyter Kernel Gateway (hosted by the good graces of the community and Rackspace ;) as the execution environment for her code.

When Bob visits Alice's blog, his browser loads the markup and JS for her latest post. The JS contacts the configured kernel gateway to request a new kernel. The gateway provisions a kernel in its compute cluster and sends information about the new kernel instance back to the requesting in-browser JS client. Most importantly, this response contains information about a Websocket endpoint on the kernel gateway to which the client can establish a connection for communication with the kernel. Thereafter, the gateway acts as a Websocket-to-0mq proxy for communication between Bob's browser and the kernel until Bob leaves the page and the kernel eventually shuts down.

![](https://hackpad-attachments.s3.amazonaws.com/jupyter.hackpad.com_sZx2qqNHnY1_p.454990_1440709697802_undefined)

Note that this use case is not much different from the current Thebe and gist exec sample applications; it simply serves to formalize the APIs and components used for the additional use cases stated next.
#### Use Case #2: Notebooks Converted to Standalone Dashboard Applications

Cathy uses Jupyter Notebook in her role as a data scientist at ACME Corp. She writes notebooks to munge data, create models, evaluate models, visualize results, and generate reports for internal stakeholders. Sometimes, her work informs the creation of dashboard web applications that allow other users to explore and interact with her results. In these scenarios, Cathy is faced with the task of rewriting her notebook(s) in the form of a traditional web application. Cathy would love to avoid this costly rewrite step.

One day, Cathy deploys a Jupyter Kernel Gateway to the same compute cluster where she authors her Jupyter notebooks. The next time she needs to build a web app, she creates a new notebook that includes interactive widgets, uses Jupyter extensions to position the widgets in a dashboard-like layout, and transforms the notebook into a standalone NodeJS web app. Cathy deploys this web app to ACME Corp's internal web hosting platform and configures it with the URL and credentials for the kernel gateway on her compute cluster.

Cathy sends the URL of her running dashboard to David, a colleague from the ACME marketing department. When David visits the URL, the application prompts for his intranet credentials. After login, his browser loads the markup and JS for the frontend of the dashboard web app. In contrast to the open-access blog post example in the prior use case, the JavaScript in David's browser does not contact the kernel gateway directly. It does not contain the credentials to do so, and it does not contain any code from the original notebook. Instead, to limit David's control over kernels on the compute cluster, the JS in David's browser only communicates with the dashboard app's NodeJS backend. The dashboard server requests the kernel, sends code to the kernel for execution upon David's interaction with the frontend, and proxies the responses back to the frontend JS for display to David. Throughout all this interaction, the kernel gateway behaves in the same manner as in the previous example: it provisions kernels and proxies Websocket-to-0mq connections for all dashboard users.
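The mediation pattern described above can be sketched as follows. The whitelist, the cell names, and the `gateway_send` callback are all hypothetical, but they show why the browser never needs the gateway credentials or the notebook's code:

```python
# Sketch of the dashboard backend's mediation role in Use Case #2.
# The backend alone holds the gateway credentials; the browser may
# only trigger cells extracted from the original notebook. All names
# here are illustrative assumptions.
ALLOWED_CELLS = {
    "plot_sales": "fig = plot(sales_df)",      # code lifted from the notebook
    "summarize":  "report = sales_df.describe()",
}

def handle_frontend_request(cell_name, gateway_send):
    """Forward a whitelisted cell's code to the kernel gateway; reject the rest."""
    code = ALLOWED_CELLS.get(cell_name)
    if code is None:
        return False  # frontend asked for something not in the notebook
    gateway_send(code)  # real backend: execute_request over the gateway Websocket
    return True
```

Because only named cells can be triggered, arbitrary code from David's browser never reaches the compute cluster.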
![](https://hackpad-attachments.s3.amazonaws.com/jupyter.hackpad.com_sZx2qqNHnY1_p.454990_1440679339192_undefined)

#### Use Case #3: Notebook Authoring Separated from Kernel/Compute Clusters

Erin is a Jupyter Notebook and Spark user. She would like to pay a third-party provider for a hosted compute-plus-data-storage solution, and drive Spark jobs on it using her local Jupyter Notebook server as the authoring environment. When Erin decides to convert some of her notebooks to dashboards, she also wants those deployed dashboards to use her compute provider to avoid having to move her data around.

In a bright and not-so-distant future, Erin chooses a Spark provider that offers a Jupyter kernel service API. Her provider allows Erin to launch and communicate with Jupyter kernels via Websockets. The kernels run in containers (kernel-stacks) that are pre-configured with Spark and typical scientific computing libraries. Erin points her local Jupyter Notebook server to her provider's kernel service API. When Erin launches a new notebook locally, her Notebook server does the work of requesting a kernel from the kernel service API and establishing a Websocket connection to the provider's kernel gateway, which proxies her commands to her running kernel via 0mq.
When Erin converts one of her notebooks to a dashboard, she supplies credentials for the dashboard server to access her kernel provider. When users visit Erin's dashboard, the dashboard server contacts the kernel provider to manage the lifecycle of, and communication with, kernels.

When Frank, a colleague of Erin, learns about her great setup, he asks to share her compute provider account and make it a team account. Happy to help, Erin does so. Frank then spins up a VM in his current cloud provider, runs Jupyter Notebook server on it, and points it to the kernel gateway running in Erin's hosted environment in the same manner Erin did with her local Jupyter instance.

![](https://hackpad-attachments.s3.amazonaws.com/jupyter.hackpad.com_sZx2qqNHnY1_p.454990_1440679897295_undefined)

**N.B.:** The key difference in this scenario versus what exists in Jupyter Notebook today lies in the fact that the Jupyter Notebook server is no longer talking to kernels via 0mq. Rather, the Notebook **server** is a Websocket client itself, much like the Notebook frontend, and communicates with kernels via a kernel gateway. This setup makes it possible to run the Notebook web application outside of the compute cluster, across the web if need be. Of course, it would require new remote kernel provisioning and Websocket client code paths within the Jupyter Notebook Python code to realize.
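A rough sketch of the interface shape such a WebsocketKernelManager might take follows. The method names mirror jupyter_client's KernelManager, but the gateway URL scheme and the stubbed provisioning call are assumptions for illustration, not a proposed implementation:

```python
# Hypothetical shape of a kernel manager that delegates provisioning
# to a remote gateway instead of spawning a local 0mq-connected process.
# The real class would live in jupyter_client; this stub only shows the
# interface and URL derivation, with the HTTP call elided.
class WebsocketKernelManager:
    def __init__(self, gateway_url):
        self.gateway_url = gateway_url.rstrip("/")
        self.kernel_id = None

    def start_kernel(self, kernel_name="python3"):
        # Real implementation: POST {gateway_url}/api/kernels and record
        # the kernel id from the response. Stubbed for illustration.
        self.kernel_id = "demo-" + kernel_name
        return self.kernel_id

    def connection_url(self):
        """Websocket endpoint the notebook server dials in place of 0mq."""
        ws = self.gateway_url.replace("https://", "wss://").replace("http://", "ws://")
        return "{}/api/kernels/{}/channels".format(ws, self.kernel_id)
```

The notebook server would then treat this manager like any other: start a kernel, dial the returned Websocket URL, and speak the Jupyter message protocol over it.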
## Audience

* Jupyter Notebook users who want to run their notebook server remote from their kernel compute cluster

  > **Review comment:** This is a really important use case, awesome!
  >
  > **Reply:** 👍

* Cloud providers who want to provide remote access to kernel compute services to clients other than Jupyter Notebook (e.g., dashboards)
* Application developers who want to create new tools and systems that leverage the use of kernels.
## Other options

Other than the proofs of concept mentioned at the start of this proposal (e.g., tmpnb used headlessly from Thebe / gistexec), there are no other clear options for enabling the use cases described above. Other up-and-coming projects (e.g., mybinder.org) may begin to improve upon these existing proofs of concept, but none is, at the moment, designed specifically to address the use cases outlined in this proposal.

## Integration with Project Jupyter

This incubation effort should help Jupyter developers make informed decisions about future refactoring, reimplementation, and extension efforts with respect to kernel provisioning and access (e.g., jupyter-js-services). As mentioned above, if code assets produced from this exploratory incubation effort have merit, they should be promoted to full Jupyter projects and maintained as such (e.g., jupyter/kernel-gateway).
> **Review comment:** While I am +100 on these ideas, I don't think they are only useful for non-notebook clients. Have you thought about the aspects of this that integrate with the actual notebook and JupyterHub?
>
> **Reply:** The nice thing about the notebook and JupyterHub is that they can connect directly to remote kernels in the same way; there's just no code to reach out to providers/gateways. I personally think we can build out the gateway before pushing any of this vision on the main notebook, though, as it gives us a smaller scope to iterate on.
>
> **Review comment:** 👍