-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove blocking Kernel Startup #86
Comments
I believe this issue is a result of the duration it takes to "start" a kernel relative to a remote cluster. In normal circumstances (local kernels) the duration is on the order of a second (to invoke the process and have the kernel respond). For remote kernels running as yarn-cluster, that duration is an order of magnitude longer (10-15 seconds) which exacerbates as blocking startups on the front end. I tried to look into this early on but couldn't make any headway because the Jupyter framework assumes the kernel is communicating immediately on return from the startup logic (so using a thread doesn't really help). |
I haven't been able to make any headway into this. The I tried introducing asynchronous behaviors at various levels within this framework - either at the point of launch, higher up at the start of the "single kernel manager", or even higher within the "multi-kernel manager" sequence - as well as replacing In addition, I suspect the current asynchronous intentions are not working as expected, but that has gone unnoticed because regular kernel startups are on the order of one, maybe two, seconds. Here are some links I found that seemed applicable to what we need: Placing back on the backlog for now. It would be great if someone else could spend some time with this. I'd be happy to share more about what I tried. |
As promised in another thread, some comments on this; Apologies for the delay, I've been pretty busy on other things (like deleting root's home on one of the cluster I manage because i'm stupid...) I believe there is a path forward that keep backward compatibility but will allow us to slowly migrate to async-almost-everything and not block kernel launching. I'm going to use python 3 syntax, but it should be possible to do that on Python 2 with
For client that do not run in an event-loop, and are "quasi sync", you can fake an event loop that just advance everything like in there
In python 3 at least, you can figure out whether a function is synchrone using I suggest the following for every method
For every method
Everytime we encounter a non-async function, or a non-async function get called have extensive deprecation warning that ask users to:
I've been toying with that in jupyter/jupyter_client#402, and jupyter/notebook#4126, I just need to find the right abstraction and pattern to allow the transition to be as seamless as possible for users. As far as I can tell, we should be able to do a transition where all of the new API are compatible with the future ones with minimal disruptions and code change. I believe with the right tooling, documentation and example that should be something relatively systematic to do, so the would just be a question of time and outreach to review all the possible places that need to be updated. Apologies if this is a bit unclear, I'm still trying to arrange things in my head, and I still need to dive into kernel and enterprise gateway to see how it affects or would simplify those. |
@Carreau - thank you for your comment. Yes, I've been monitoring your WIP PRs and agree that its best to see how the dust settles on those. In addition, @takluyver's WIP PR may essentially eliminate all kernel management from Enterprise Gateway anyway - so we need to wait to see where his effort takes us. I would recommend you coordinate with Thomas, although I suspect you have already done that. Looks like Thomas is accounting for async kernel providers also - all good. In the meantime, we will continue to monitor these kernel management PRs and contribute where we can. Our primary interest is also in preserving the existing functionality we provide and to minimize any migration challenges that lie ahead when adopting these new architectures. |
To be honest, not really. I tried to figure this out, and there are Part of the muddle I got into was that I was planning to build everything directly on asyncio, but then there's a nice ZMQ 'stream' interface built on tornado, and then I started reading about trio, and... |
Updated start_kernel and restart_kernel methods to coroutines. Converted various `time.sleep()` to `yield gen.sleep()` which enables tornado to "slice-out" to a different request. As a result, converting `confirm_remote_startup()` and `handle_timeout()` to coroutines was key. Fixes jupyter-server#86
@kevin-bates Would you mind if I ask I'll appreciate it if any guide is given as well for upgrading jupyter notebook and client in |
@esevan - Although I performed validation of the changes via the REST API, an update to the nb2kg image shouldn't be necessary since the actual changes to enable concurrency really take place in EG via #580. The notebook and jupyter_client changes essentially facilitate the changes in EG. If you're finding that not to be the case, please let me know along with how you go about starting kernels concurrently via the NB2KG image and I'd be happy to take a look. |
@kevin-bates I'm using this image as my notebook environment. https://github.com/jupyter/enterprise_gateway/blob/master/etc/docker/nb2kg/Dockerfile I expected this included notebook and client, that It should be changed to facilitate #580. |
Updated start_kernel and restart_kernel methods to coroutines. Converted various `time.sleep()` to `yield gen.sleep()` which enables tornado to "slice-out" to a different request. As a result, converting `confirm_remote_startup()` and `handle_timeout()` to coroutines was key. Fixes jupyter-server#86
That's correct that the image uses notebook and jupyter_client, but the actual kernel startup and management gets proxied (via NB2KG) to the Enterprise Gateway server. That's where the practical portion of the concurrency changes reside. The NB and JC changes are merely to turn some existing methods into coroutines, but there weren't any additional changes made that trigger significant concurrency. The primary change was to replace the I've attached the 3 dev-built whl files for EG, NB and JC. You must pip install EG before NB since the NB version violates EG's constraints. If you only need the applicable EG image, I've pushed one that can be pulled using: I hope that helps. |
@kevin-bates So you just modified EG process embedding JC and NB. I thought client side was also changed, and now I got it. Thanks a lot! |
@esevan - this is excellent! |
Updated start_kernel and restart_kernel methods to coroutines. Converted various `time.sleep()` to `yield gen.sleep()` which enables tornado to "slice-out" to a different request. As a result, converting `confirm_remote_startup()` and `handle_timeout()` to coroutines was key. Fixes jupyter-server#86
Updated start_kernel and restart_kernel methods to coroutines. Converted various `time.sleep()` to `yield gen.sleep()` which enables tornado to "slice-out" to a different request. As a result, converting `confirm_remote_startup()` and `handle_timeout()` to coroutines was key. Fixes jupyter-server#86
Single-threaded kernel startup causing notebook front end to lose connection to gateway as a lot of kernels are being started--so that notebook has to re-pull kernel specs from gateway after Elyra finishes starting kernels.
The text was updated successfully, but these errors were encountered: