
Coordinate use-cases / infra around JupyterHubs+other applications on Kubernetes #382

Closed
choldgraf opened this issue Mar 10, 2021 · 28 comments


@choldgraf
Member

choldgraf commented Mar 10, 2021

Background

In several different spaces now we have seen people build technology to facilitate deploying JupyterHubs on Kubernetes alongside other applications. A few examples:

(I would also be curious to hear what @arnim or @bitnik or @manics or @sgibson91 have done with their deployments, if they've had similar challenges).

Similarities and differences

I think there are a few pieces that we tend to see across all these deployments: JupyterHub, BinderHub, Dask Gateway, Prometheus, Grafana. Then there are common needs across all of them: environment and user management, connections between these applications, etc.

The main differences are in the implementation - some use Terraform, some use Helm, some use a combination of the two. Some are more generic and user-facing, while others are bespoke for a specific group or project.

So it seems like many projects (including the JupyterHub project itself) have a common need: deploying JupyterHubs in a flexible manner, with minimal human intervention, alongside other applications.

Question

Is there an opportunity to coordinate and streamline development, and to share infrastructure for this use-case across these projects? I think there would be value in a community-driven project that addresses this use-case and is flexible enough that some or all of these other projects could use it, rather than re-creating similar technology.

  • Do people agree about this shared need?
  • Do people think this problem is technically tractable to fix with shared infrastructure?
  • Can we define a set of use-cases that are common enough that it's worth building shared infrastructure around them?
  • Does this need also imply improvements that could be made lower in the stack? (e.g., at the level of the JupyterHub python app)
  • Is this in-scope for the JupyterHub community in general? Is there a better community space to have this conversation?

Would love to hear what people think. If we could identify a strong enough need, I bet we could also turn this into a funding opportunity to support work along these lines.

@jacobtomlinson

jacobtomlinson commented Mar 11, 2021

Thanks for raising this @choldgraf, I think this is a useful conversation to have.

From Dask's perspective, we maintain three Helm charts:

  • dask — a single JupyterLab instance and a small, manually scalable Dask cluster. Target is a single user who has access to Kubernetes and wants to try Dask out.
  • dask-gateway — a multi-user cluster creation tool. Target is a team/org that wants to centrally manage the creation and use of Dask clusters.
  • daskhub — a very thin layer that depends on the upstream dask-gateway and jupyterhub charts and overrides some default config to get them talking to each other. Target is a team/org that wants to centrally manage the creation of Jupyter instances and Dask clusters.
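For concreteness, a daskhub-style meta-chart is little more than a dependency block over the upstream charts plus some config overrides. A minimal sketch (the chart name and version numbers here are illustrative placeholders, not the versions daskhub actually pins):

```yaml
# Chart.yaml for a hypothetical meta-chart depending on the upstream
# jupyterhub and dask-gateway charts; versions are illustrative.
apiVersion: v2
name: my-daskhub
version: 0.1.0
dependencies:
  - name: jupyterhub
    version: "0.11.1"
    repository: https://jupyterhub.github.io/helm-chart/
  - name: dask-gateway
    version: "0.9.0"
    repository: https://helm.dask.org/
```

The meta-chart's own values.yaml then carries the glue config that wires the two together.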

Looking through the other projects here that use Dask it seems some of them depend on these charts:

  • 2i2c uses daskhub but seems to have vendored/forked the repo.
  • Pangeo cloud federation uses daskhub directly.
  • qhub uses dask-gateway and jupyterhub charts directly, replicating daskhub but taking it much further.

All of these projects go much further than daskhub in providing a more complete interactive science environment. This is great and I think beyond the scope of what we should be providing in Dask. The fact that they depend on our upstream charts is a good sign that we have built some building blocks that can be leveraged by others. It suggests the tools we maintain are lower in the stack than the other projects mentioned here.

Given that many folks also seem to be deploying prometheus/grafana I think it is our place to ensure that our integration with prom is the best that it can be, and that we expose as many useful metrics as we can. But it would be out of scope for us to provide prometheus itself in our charts.

I have occasionally considered creating some official Terraform or vendor-specific IaC config like CloudFormation templates for Dask. These would probably be more similar to the dask chart but without the Kubernetes dependency, though they could also go as far as something like daskhub. I expect these would be more useful as reference implementations than things folks would take off the shelf and use. In my experience, once orgs get to the level of deploying Dask on k8s with Terraform, they probably have enough internal process and standardization that they would roll their own instead of using off-the-shelf configs. But this list suggests that perhaps some Terraform docs within Dask would be beneficial.

So to answer some of the direct questions:

  • Yes I think there is a shared need.
  • I think aligning projects is useful, but once orgs reach a certain scale I think they will roll their own solution.
  • If there is anything we can do lower in the stack in Dask to help I'm all for it!

@minrk
Member

minrk commented Mar 11, 2021

I absolutely think this makes sense. One of the first steps might be describing what everyone is doing to see where there's room for consolidation.

I know @consideRatio has been doing work in z2jh to help with using jupyterhub as a dependency, but that's also a likely area of work for us: what are the challenges when integrating jupyterhub as a dependency?

z2jh is now a very complicated chart with loads of moving parts. I'm not sure I would recommend anyone deploy jupyterhub on kubernetes without it at this point, or at least, if you do, know that you are very much on your own. Is anyone doing this now?

One of the things I think we can do is publish some basic Grafana dashboards for monitoring jupyterhub. That's certainly a place where there's a lot of probably unnecessary copying and pasting that could instead be importing.

I'm not sure what the role of something that bundles jupyterhub, prometheus, etc. would be - a 'distribution' chart that's more complete? I'm not so sure. I think documentation / examples might be the right level, plus publishing one or more example grafana dashboards.

Maybe it's time for another online JupyterHub workshop - we had one of these a couple years ago bringing in especially supercomputer folks to discuss things like BatchSpawner. Perhaps this could be a topic for one later this spring?

@jacobtomlinson

Publishing Grafana dashboards is a nice idea. We had one shared in dask/distributed#3136 but perhaps we should go further and put some in the Dask documentation.

@manics
Member

manics commented Mar 11, 2021

I think part of this discussion should be about who Z2JH is aimed at. There was an informative discussion in jupyterhub/zero-to-jupyterhub-k8s#1934 regarding the balance between making things easier for novice K8s admins vs. inconveniencing more experienced admins.

That particular issue was resolved, but to me it's still not clear whether there's overall agreement on the current aims of Z2JH.

@yuvipanda
Collaborator

One of the things I think we can do is publish some basic Grafana dashboards for monitoring jupyterhub. That's certainly a place where there's a lot of probably unnecessary copying and pasting that could instead be importing.

@minrk check out https://github.com/yuvipanda/jupyterhub-grafana! I've some more changes to it that I should push.

@yuvipanda
Collaborator

I usually deploy many JupyterHubs per cluster, and one 'support' chart per cluster. The support chart (like this) usually installs:

  1. Prometheus
  2. Grafana
  3. (Optionally) nginx-ingress
  4. (Optionally) cert-manager with appropriate cluster-issuer as well
  5. (In-the-future) an NFS server

Maybe we can publish this?
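The support-chart pattern above can be sketched as a Helm dependency block, with `condition` flags for the optional pieces. A hypothetical sketch (chart versions are illustrative placeholders; the repositories are the usual upstream chart repos):

```yaml
# Chart.yaml sketch for a per-cluster "support" meta-chart.
# Versions are illustrative, not what any real deployment pins.
apiVersion: v2
name: support
version: 0.1.0
dependencies:
  - name: prometheus
    version: "13.6.0"
    repository: https://prometheus-community.github.io/helm-charts
  - name: grafana
    version: "6.6.0"
    repository: https://grafana.github.io/helm-charts
  - name: ingress-nginx
    version: "3.25.0"
    repository: https://kubernetes.github.io/ingress-nginx
    condition: ingress-nginx.enabled     # optional, per the list above
  - name: cert-manager
    version: "v1.2.0"
    repository: https://charts.jetstack.io
    condition: cert-manager.enabled      # optional, per the list above
```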

@arnim

arnim commented Mar 11, 2021

Maybe @MridulS can say something with respect to the GESIS perspective - some coordination could indeed be useful

@rcthomas
Contributor

Maybe it's time for another online JupyterHub workshop - we had one of these a couple years ago bringing in especially supercomputer folks to discuss things like BatchSpawner. Perhaps this could be a topic for one later this spring?

I know supercomputing was mentioned more in passing here but there are things going on with Kubernetes there including a few JupyterHub deployments. I think that community would be interested in participating in the conversation.

@choldgraf
Member Author

A few thoughts in response to others above:

I think documentation / examples might be the right level, plus publishing one or more example grafana dashboards

Totally agree - I think our approach of "document the pattern well first, and only when absolutely necessary create a tool to automate it" has been quite useful at keeping the tech modular.

I'd also be curious to hear from @costrouc on the experience in wrapping all of this complexity into a single package w/ feature flags and such. Perhaps that is a good data point to see what this looks like on the "solve it with tech" vs. "solve it with docs" question.

Maybe a good first step would be to document how to deploy a Kubernetes cluster with:

  1. Z2JH chart
  2. Dask Gateway (either via DaskHub or via a custom meta-chart?)
  3. With Prometheus feeding into Grafana
  4. With some kinds of best-practices services, like what @yuvipanda shared in the comment above (#382)
  5. With access to a dataset via Zarr

Now that I've typed that, I guess that's basically just "the Pangeo Model". What if we made it a little documentation site? We could call it The Pangeo Way 😅.

I think the challenge there is extending that model to new applications. If people wanted to add on new charts etc, how could they do so gracefully? Is there a way to capture that complexity with documentation?

2i2c uses daskhub but seems to have vendored/forked the repo

Just for now, I think the plan is to revert to upstream, but this is a short-term hack :-)

https://github.com/2i2c-org/pilot-hubs/blob/master/hub-templates/daskhub/values.yaml#L2

Maybe it's time for another online JupyterHub workshop

That is a great idea @minrk ! And @rcthomas we should definitely bring the HPC world in as well.

@jacobtomlinson

What if we made it a little documentation site, we could call it The Pangeo Way 😅.

This idea sounds great.

I worry a little about namechecking Pangeo in the title, though. The Pangeo community is awesome and has done a lot to bring geoscientists together around these tools. However, this stack is useful well beyond the geosciences, and in my experience folks outside that community have dismissed it because they do not identify with it.

@sgibson91
Member

☝🏻 We occasionally see a similar problem with "The Turing Way" ("My boss won't let me contribute because it's under the Turing Institute's umbrella"). Would it be too snubbing of Pangeo to call it "The Jupyter Way", or should we come up with something completely different?

@jacobtomlinson

Maybe "The Jupyter Hub Way" would work? And be more specific.

I don't think it would be snubbing Pangeo, especially if there was a history page containing references to Pangeo.

@rabernat

rabernat commented Mar 15, 2021

The problem of the name / organizational ownership precluding contribution is a serious one that we need to think about. All the organizations named above have invested effort in developing some sort of product that is branded in some way (Pangeo, Qhub, DaskHub, 2i2c hubs, etc.) Although we all believe in collaboration, we all have strategic incentives to maintain our brand. This is an issue for Pangeo, but, from my perspective, we would happily trade that brand recognition for a more functional / maintainable code base. The tradeoff for for-profit companies may be different.

So one key question is, what is the name of the thing we would all feel comfortable rallying around? IMO, "The Jupyter Hub Way" doesn't quite have the zing we need to inspire people. I'm particularly interested to hear from the Quansight folks (@costrouc / @dharhas) what sort of name / structure / organization might entice you to upstream some of the things you've built. Conversely, maybe the most convenient name for the thing is in fact, "qhub," since they have put significant work into documenting / packaging it?

Related to this question of naming is the question of governance.

@choldgraf
Member Author

I sat in on the JupyterLab RTC meeting today, and they noted that one major upcoming challenge is authorization. @echarles also provided some helpful context here.

It sounds like this might be another helpful use-case to coordinate, related to the "multi-application deployments" question, because if you're deploying a Dask Gateway cluster, then you're also probably interested in making sure it's only used by the people you want to use it.

@jacobtomlinson

I think it's very common for Dask Gateway to defer authentication to JupyterHub.

@minrk
Member

minrk commented Mar 23, 2021

I think there's lots of good, nitty gritty work on authorization integrations, especially building on the RBAC work, to add authorization controls and integrations for things - i.e. user X has collaboration permissions on user Y's server, and user Y can launch a cluster with up to N cpus, etc. Authentication is the coarsest "who are you" layer of that, but I think we can help on some pressing issues here for nicer ways to manage permissions. It's all mostly technically possible right now, but we can definitely smooth it out a lot with some effort.
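As a sketch of what those permissions could look like once the RBAC work lands as declarative roles (the scope names and structure here are assumptions based on work in progress, not a finalized API):

```python
# jupyterhub_config.py -- hypothetical sketch of RBAC-style roles.
# Scope names are assumptions based on the in-progress RBAC work.
c.JupyterHub.load_roles = [
    {
        # user X gets collaboration permissions on user Y's server
        "name": "collaborator-on-y",
        "scopes": ["access:servers!user=y"],
        "users": ["x"],
    },
    {
        # members of this group may talk to a Dask Gateway hub service
        "name": "dask-users",
        "scopes": ["access:services!service=dask-gateway"],
        "groups": ["dask-users"],
    },
]
```

Resource limits like "up to N cpus per cluster" would likely live on the Dask Gateway side rather than in JupyterHub scopes.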

@choldgraf
Member Author

choldgraf commented Mar 24, 2021

Thanks everybody for the feedback about branding and such, that is a good point @sgibson91 / @rabernat - I think it's important that we not come across as favoring a specific project in community tooling / standards. Also agreed that, while Jupyter is a good multi-stakeholder community for this, "The JupyterHub Way" is some combination of not punchy enough and also not specific enough.

Here's a proposal that initially focuses on documentation improvements and additions. It is inspired a bit by the Divio documentation framework, and I think it could help us structure the docs in a way that is more easily extensible to these new use-cases.

  1. Re-brand Zero to JupyterHub for Kubernetes as just JupyterHub for Kubernetes (maybe j4k.jupyter.org?). Turn Zero to JupyterHub into a tutorial.
  2. Create a Topic Guides section of the j4k guide, alongside tutorials/
  3. Add another tutorial called Deploy JupyterHub with a Dask Gateway cluster. This contains step by step instructions for deploying your JupyterHub alongside a Dask Gateway. This is a "getting started" guide for this type of deployment. It introduces the concept of the Dask Gateway helm charts etc.
  4. Add a Topic Guides section for "Deploying JupyterHub with other cloud applications". Include "how to" answers to specific questions like "How can I authorize access to Dask Gateway using JupyterHub?". These sections focus on the integration of JupyterHub with other tools, not configuration of those tools themselves (but it should link out to the documentation of those tools heavily).
  5. Create an Explanations section that focuses on more high-level conversations about deploying applications on Kubernetes, best-practices, etc. For example, the section on estimating cloud costs could be an in-depth explanation, or a section describing the value of automating parts of deployment with CI/CD to improve team practices.

In the process of doing this, a few things might become clear:

  1. Perhaps there is just too much complexity there to all shove in a single guide. If that's the case, then we consider creating a new guide that focuses on deploying complex, multi-application data science environments in the cloud using JupyterHub. We could call it something like Jupyter in the Cloud - a guide to deploying Jupyter-based data science environments on Kubernetes.
  2. Perhaps there are ways to automate parts of this documentation through tooling. As @jacobtomlinson mentions, the challenge is finding the right balance between constraints and flexibility. I think QHub has already done a nice job of turning some decision points into a pre-set tool, and it'd be great to hear if they think there's the potential to upstream or collaborate a bit on shared tooling.
  3. There are opportunities for focused development to flesh out this story. For example, if I want to write a guide called "How to authenticate your Dask Gateway using JupyterHub", I first need to make it technically possible 😅

This feels like a substantial amount of work, and potentially a good topic for a CZI EOSS application. We've discussed this a bit in #380 - but I think that creating a high-quality pattern like this would be of immense value across the community. Do others agree?

@rkdarst

rkdarst commented Mar 24, 2021

"JupyterHub conceptual intro" jupyterhub/jupyterhub#2726 seems very relevant here, and practically a pre-requisite for anyone who wants to do advanced configuration. I know I should get around to it again, but I haven't found time. Anyone else who would like to help is more than welcome!

@manics
Member

manics commented Mar 24, 2021

@choldgraf I like your suggestions of logically splitting up the docs a bit more. A potential added benefit is that others may feel more empowered to write their own guides since there's no longer one central guide.

I think it's also worth looking at ways to make it easier to test the docs. If someone opens a docs PR against the Z2JH guide, you either need to set up the full environment and then copy and paste each step to validate the change, or (more likely) just assume it works. Could we e.g. do something with Jupyter bash notebooks so we can extract the commands and run them in CI, or do something with executable books? It'd be a pretty neat demonstration of the Jupyter ecosystem 😃.
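The "extract the commands and run them in CI" idea can be sketched in a few lines of Python (a hypothetical helper, not an existing tool; it assumes the guide's shell steps live in fenced bash/sh code blocks):

```python
import re
import subprocess

# Match fenced bash/sh/shell code blocks in a Markdown document.
FENCE = re.compile(r"`{3}(?:bash|sh|shell)\n(.*?)`{3}", re.DOTALL)

def extract_shell_blocks(markdown_text):
    """Return the contents of every bash/sh fenced block in a doc."""
    return [block.strip() for block in FENCE.findall(markdown_text)]

def run_doc_commands(markdown_text):
    """Run each extracted block, failing (as CI would) on the first error."""
    for block in extract_shell_blocks(markdown_text):
        subprocess.run(block, shell=True, check=True)
```

A CI job could feed each page of the guide through `run_doc_commands` against a throwaway cluster; as noted above, the hard part is provisioning that environment, not extracting the commands.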

I started playing with Katacoda this week: https://katacoda.com/manics/scenarios/jupyterhub-kubernetes
AFAIK it's a closed platform, so probably not suitable for the main guide, but it could be useful as an extra? Or perhaps we can use it as inspiration for something similar running on mybinder?

@rkdarst

rkdarst commented Mar 24, 2021

I'm a bit late here overall (both to this conversation, and in that my focus has shifted and I spend less time on Jupyter these days; this would have been much more interesting to me a few years ago, which is too bad). This also seems mostly focused on Kubernetes, which I don't have much to add to.

But... out of curiosity, does anyone run jupyterhub in kubernetes where UIDs (os.getuid()) have some significance? I feel like I am completely alone here (probably for good reason, it's not "cloud-scale"). But since I already have a free network filesystem that scales to tens of thousands of users at my university, I build on that and get instant filesystem-level collaboration and sharing. Yes, I realize the downsides, but I find it cool in its own way. Would you be interested in this story as a guide?

@willingc
Contributor

Following up on @rcthomas' comment FYI @zonca @danielballan

@zonca

zonca commented Mar 24, 2021

Yes! This would be very useful for me as well.
I write my own tutorials for deploying services on top of JupyterHub on XSEDE Jetstream, often starting from Pangeo and customizing; it would be useful to have more widely supported documentation/recipes.

For example:

@choldgraf
Member Author

choldgraf commented Mar 24, 2021

@zonca - in those kinds of documentation, I'm curious how much you end up needing to write specifically for Jetstream, vs. how much of it can "assume you have K8s and a basic JupyterHub, and the rest is the same across cloud platforms".

also those tutorials are 👌👌👌 - it would be great if you could contribute some of that content to a future tutorials/ section if we go in that direction!

@zonca

zonca commented Mar 24, 2021

It depends. Most of my effort goes into installing Kubernetes itself, and that is all customized for Jetstream, but that is outside the scope presented here.
The other tutorials, for "add-ons" to JupyterHub, are mostly agnostic; I mostly need them to track what I am doing and to be able to reproduce deployments.

@rcthomas
Contributor

With respect to customizations at the JupyterHub level (inspired by the comment about "specific to Jetstream"), we maintain a few, and we have a few custom services that might be interesting, but I've never really felt there was much demand, or just the right place, for me to document them. It would be great to have a proper place for that kind of thing.

@jacobtomlinson

jacobtomlinson commented Mar 25, 2021

"How to authenticate your Dask Gateway using JupyterHub", I first need to make it technically possible

@choldgraf this is possible today. https://gateway.dask.org/authentication.html#using-jupyterhub-s-authentication
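Roughly, the server-side config is just a few lines (a sketch based on the linked docs; the token and URL values here are placeholders, with the token generated as a JupyterHub service API token):

```python
# dask_gateway_config.py sketch: defer Dask Gateway auth to JupyterHub.
# Token and URL values are placeholders for a real deployment.
c.DaskGateway.authenticator_class = "jupyterhub"
c.JupyterHubAuthenticator.jupyterhub_api_token = "<service-api-token>"
c.JupyterHubAuthenticator.jupyterhub_api_url = "https://hub.example.com/hub/api"
```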

out of curiosity, does anyone run jupyterhub in kubernetes where UIDs (os.getuid()) have some significance?

@rkdarst Typically this does not come up. User directories are generally k8s volumes stored in block or NFS storage and mounted to each Jupyter session. So everyone is the jovyan user and just gets their directory mounted. Often object storage is used as shared storage.
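In z2jh values terms, that pattern is roughly the following (a sketch; the storage class and the shared-data claim name are assumptions specific to a given cluster):

```yaml
# z2jh values.yaml sketch: one dynamically provisioned volume per user,
# mounted into each jovyan session, plus a read-only shared dataset.
singleuser:
  storage:
    type: dynamic
    capacity: 10Gi
    dynamic:
      storageClass: standard        # cluster-specific assumption
    extraVolumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-data    # hypothetical shared dataset PVC
    extraVolumeMounts:
      - name: shared-data
        mountPath: /home/jovyan/shared
        readOnly: true
```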

I think in situations like you mention, or on HPCs/supercomputers, the user-level namespacing in Docker/containerd isn't particularly helpful, as there is already mature user management. I guess this is why tools like Singularity are useful: they've cherry-picked the container features that are useful, like images, and ignored the features that are already mature on those systems, like user and network namespacing.

@dharhas

dharhas commented Apr 12, 2021

out of curiosity, does anyone run jupyterhub in kubernetes where UIDs (os.getuid()) have some significance?

QHub uses full linux permissioning so we can have shared folders for different linux groups etc. We use libnss-wrapper to make this work.

@minrk
Member

minrk commented Sep 1, 2021

This proposal wasn't funded, but others were, which is great! On to the next call...

@minrk minrk closed this as completed Sep 1, 2021