
Run cells in different threads #1155

Closed · Phyks opened this issue Mar 1, 2016 · 17 comments

@Phyks commented Mar 1, 2016

Hi,

Sometimes I run long-running code in one cell and still want to be able to run small, independent snippets in another cell. But I cannot, because cells are run sequentially on the same kernel.

Maybe the execution of each cell could be threaded; then this would be possible. I know there could be issues with the GIL, but I think it would work in most cases. A typical use case would be to perform multiple independent long computations in parallel, in different cells, without having to deal with subprocess and so on, which would be super user-friendly.

The ideal use case, though a lot more difficult to implement, would be to perform a long-running computation in one cell and start studying the results before it finishes. For instance, if a cell fills a list with data, it could be useful to start plotting and reading the list elements in another cell while the computation is still running.
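A minimal sketch of what already works today within a single kernel, using plain threading (names are illustrative): because all cells share the kernel's globals, a later cell can inspect the list while a background thread is still filling it.

```python
import threading
import time

results = []

def long_computation():
    # Simulates a slow computation that produces results incrementally.
    for i in range(100):
        time.sleep(1)
        results.append(i ** 2)

# Run the computation in a background thread so this cell returns
# immediately and the kernel stays free to execute other cells.
worker = threading.Thread(target=long_computation, daemon=True)
worker.start()
```

A later cell can then run `len(results)` or plot `results` while the thread is still appending. This doesn't make cells run in parallel in a CPU-bound sense (the GIL still applies), but it does cover the "study the results while the computation runs" use case.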

I did not find many references to this, apart from a Stack Overflow thread.

Thanks

@takluyver (Member)

Someone could write a kernel, or an IPython extension, to do things like that. However, threads and shared memory bring up a whole host of issues (not just the GIL), and I don't think we have any plans to do anything like that in the project.

takluyver added this to the "not notebook" milestone Mar 1, 2016
@Phyks (Author) commented Mar 1, 2016

OK, I understand this may not fit the development plans for Jupyter. I still think it could be useful, maybe as part of a separate kernel or an IPython extension.

Actually, my typical use case for Jupyter + the IPython kernel is scientific computation. I would therefore like to see some features made easily accessible:

  • Distributed computation, to dispatch evaluation of notebooks or cells on different kernels / machines, without having to think about it.
  • A backup of the state of the notebook, to be able to recover from a kernel failure (when running out of memory for instance) and avoid having to backup data on my own.
  • As part of this, some abstraction over low-level interfaces such as threads, to write code more easily and quickly.

I am not sure if this is the typical use case for Jupyter, but some of these features are already implemented, with varying stability, either in Jupyter or in extensions. I am wondering how interested people would be in such features, especially the one described in this issue.

Concerning this issue in particular, I might have a look at kernels or at writing an IPython extension, if that would be of interest. I am well aware that threads and shared memory open the door to many additional issues, but in my opinion a basic implementation may be doable, especially if we either restrict the feature to very specific cases in which we are sure there are no side effects, or let the user explicitly opt in (in which case they are responsible for it). Moreover, it might be easier to do with some kernels than with others (Julia, for instance).

@takluyver (Member)

There's an IPython project, ipyparallel, to control multiple engines, but distributing computation without the user having to think about it is a hard problem. If you're interested in that area, have a look at dask.
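For reference, a minimal ipyparallel sketch of the explicit approach (it assumes a cluster of engines has already been started, e.g. with `ipcluster start -n 4`):

```python
import ipyparallel as ipp

# Connect to the running engines.
rc = ipp.Client()
view = rc[:]  # a DirectView over all engines

# Dispatch the same function across the engines and block for the results.
results = view.map_sync(lambda x: x ** 2, range(16))
print(results)
```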

There's a module called dill which can save your variables and things - it's an extension of Python's standard pickle module. It still can't handle everything, but it can do quite a lot. Another approach you can look into is checkpoint-restart, which saves an entire process to a file. Here's a presentation from a couple of years ago about doing this in Python: http://conference.scipy.org/proceedings/scipy2013/pdfs/arya.pdf
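A minimal sketch of the dill approach (the filename is illustrative; dump_session saves the `__main__` namespace, which is where notebook variables live):

```python
import dill

big_result = [i ** 2 for i in range(10)]

# Save the current interpreter session (module-level variables) to disk.
dill.dump_session('notebook_state.pkl')

# After a kernel crash or restart, restore it:
#   import dill
#   dill.load_session('notebook_state.pkl')
# and big_result is available again.
```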

@Carreau (Member) commented Mar 1, 2016

Have a look also at https://github.com/dask/distributed; you can find a nice introduction on Matt's blog.

Beyond dill, you might also want to look at https://github.com/cloudpipe/cloudpickle, which can serialize some objects that dill cannot.
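A quick sketch of what by-value function serialization buys you over the standard library's pickle (an illustrative snippet; dill handles this particular case too, and the dill/cloudpickle differences show up on more exotic objects):

```python
import pickle
import cloudpickle

square = lambda x: x * x

# pickle.dumps(square) would raise an error: the standard pickle
# serializes functions by reference, and a lambda defined in a
# notebook has no importable module path.

# cloudpickle serializes the function by value instead...
blob = cloudpickle.dumps(square)

# ...and the blob can be loaded back with plain pickle.
restored = pickle.loads(blob)
print(restored(4))  # 16
```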

Parallel computation is definitely not a Jupyter feature but a Python feature, and given the way Python works, it will be relatively hard to make it work magically.

The advantage of using things like Dask/Distributed is also that they work in non-Jupyter environments, which is nice.

If you want to dive into the IPython kernel, we'll be happy to guide you and to get feedback on the API and docs.

@Phyks (Author) commented Mar 1, 2016

Thanks for all the links and pointers to docs and modules! I already knew about dill, which we discussed in another issue (or on the mailing list, I am not sure anymore). I will have a look at cloudpickle as well.

The ability to run in a non-Jupyter environment is indeed really nice. My idea, and the reason I posted on jupyter/notebook, is that I think it would be really awesome to have something well integrated and packaged. One of the major features of the Jupyter notebook and the IPython kernel is that it "just works" and gives a really user-friendly setup for advanced tasks, out of the box :)

I will try to see what I can get by assembling all of this, and whether it would be worth integrating further into Jupyter via extensions or custom kernels.

@takluyver (Member)

While we want Jupyter & IPython to be usable and useful straight out of the box, they're never going to do everything you could want. There's a big ecosystem of different tools out there, and we don't want to try to subsume it all into Jupyter.

@Phyks (Author) commented Mar 1, 2016

Sure, but hopefully it could be made as easy as the current matplotlib integration: `pip install matplotlib` and `%matplotlib notebook` :)

EDIT: Maybe this discussion should move to the mailing list or similar, since the issue is now marked as "not notebook"?

@takluyver (Member)

Technically that kind of integration is easy enough to do - it's working out what interface makes sense that's hard.

Venue: up to you; there's no particular problem with discussing it here. I set that milestone just because I don't think there's a specific notebook-related issue to be fixed.

@JamiesHQ (Member)

@Phyks: We're doing a little housekeeping on our issue log and noticed this thread from 2016. Were you able to find a solution to your issue? Please let us know so we can close this one. Thanks!

@Phyks (Author) commented May 2, 2017

Hi @JamiesHQ,

Sorry, I have been busy lately and have not made much progress on this issue. If I get a working solution, I will post it here for sure.

@micahscopes commented Sep 17, 2017

FYI, I've been able to do some basic multithreading in Jupyter notebooks by subclassing multiprocessing.Process, with ipywidgets for feedback. It works pretty well! In the future, I might use button widgets for spawning and stopping processes. I'm actually using this to run a Flask server that serves a REST API for some data that's processed in parallel; I wanted to be able to use my Jupyter notebook to serve analyses in a way that could be consumed outside of Python. The Flask process and the data analyzer each run in their own Process subclass and share data via Manager objects. Using this system, I can start and stop new analyzers for the Flask process to serve, all from the same notebook. It's pretty nice!
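A minimal sketch of that pattern (illustrative names, not the actual code from the package linked below): a Process subclass plus a Manager dict for publishing results back to the notebook. This works with the default fork start method on Linux; spawn-based platforms need the subclass to be importable from a module.

```python
from multiprocessing import Manager, Process
import time

class Analyzer(Process):
    """Worker process that publishes results through shared state."""
    def __init__(self, shared):
        super().__init__(daemon=True)
        self.shared = shared

    def run(self):
        for i in range(100):
            time.sleep(1)
            self.shared['latest'] = i ** 2  # visible from the notebook

manager = Manager()
shared = manager.dict()

worker = Analyzer(shared)
worker.start()

# From other cells: read shared['latest'] while the worker runs,
# and call worker.terminate() to stop it.
```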

@psychemedia

@micahscopes Have you posted an example of your basic multithreading notebook recipe anywhere?

@micahscopes commented Nov 16, 2017 via email

@micahscopes

@psychemedia
Over the last few days, I extended that gist into a Python package!

Try it out: https://github.com/micahscopes/nbmultitask/blob/master/examples.ipynb


@dmvieira

Thanks @micahscopes! I'll try it!

@dmvieira

It's very good! I'm building a wrapper for Spark on top of it: https://github.com/databootcampbr/nbthread-spark

@shijianjian

Has there been any more progress on this? I am interested as well.
