
Run cells in different threads #1155

Closed · Phyks opened this issue Mar 1, 2016 · 17 comments

@Phyks commented Mar 1, 2016

Hi,

Sometimes I run long-running code in one cell and still want to be able to run small, independent snippets in another cell. But I cannot, because cells are run sequentially on the same kernel.

Maybe the execution of each cell could be threaded; then this would be possible. I know there could be issues with the GIL, but I think it would work in most cases. A typical use case would be to perform multiple independent long computations in parallel, in different cells, without having to deal with subprocess and so on, which would be super user-friendly.

The ideal use case, though a lot more difficult to implement, would be to perform a long-running computation in one cell and start studying the results before it finishes. For instance, if a cell fills a list with data, it could be useful to start plotting and reading the list elements in another cell while the computation is still running.
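A minimal sketch of what already works today within a single kernel, using plain threading (names are illustrative): because all cells share the kernel's globals, a later cell can inspect the list while a background thread is still filling it.

```python
import threading
import time

results = []

def long_computation():
    # Simulates a slow computation that produces results incrementally.
    for i in range(100):
        time.sleep(1)
        results.append(i ** 2)

# Run the computation in a background thread so this cell returns
# immediately and the kernel stays free to execute other cells.
worker = threading.Thread(target=long_computation, daemon=True)
worker.start()
```

A later cell can then run `len(results)` or plot `results` while the thread is still appending. This doesn't make cells run in parallel in a CPU-bound sense (the GIL still applies), but it does cover the "study the results while the computation runs" use case.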

I did not find many references to this, apart from a Stack Overflow thread.

Thanks

@takluyver (Member)

Someone could write a kernel, or an IPython extension, to do things like that. However, threads and shared memory bring up a whole host of issues (not just the GIL), and I don't think we have any plans to do anything like that in the project.

takluyver added this to the "not notebook" milestone Mar 1, 2016
@Phyks (Author) commented Mar 1, 2016

OK, I understand this may not fit the development plans for Jupyter. I still think it could be useful, maybe as part of a separate kernel or an IPython extension.

Actually, my typical use case for Jupyter + the IPython kernel is scientific computation. I would therefore like to see some features made easily accessible:

  • Distributed computation, to dispatch evaluation of notebooks or cells on different kernels / machines, without having to think about it.
  • A backup of the state of the notebook, to be able to recover from a kernel failure (when running out of memory for instance) and avoid having to backup data on my own.
  • As part of this, some abstraction over low-level interfaces such as threads, to write code more easily and quickly.

I am not sure if this is the typical use case for Jupyter, but some of these features are already implemented, with varying stability, either in Jupyter or in extensions. I am wondering how interested people would be in such features, especially the one described in this issue.

Concerning this issue in particular, I might have a look at kernels or at writing an IPython extension, if that would be of interest. I am well aware that threads and shared memory open the door to many additional issues, but in my opinion a basic implementation may be doable, especially if we either restrict the feature to very specific cases in which we are sure there are no side effects, or let the user explicitly opt in (in which case they are responsible for it). Moreover, it might be easier to do with some kernels than with others (Julia, for instance).

@takluyver (Member)

There's an IPython project, ipyparallel, to control multiple engines, but distributing computation without the user having to think about it is a hard problem. If you're interested in that area, have a look at dask.
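For reference, a minimal ipyparallel sketch of the explicit approach (it assumes a cluster of engines has already been started, e.g. with `ipcluster start -n 4`):

```python
import ipyparallel as ipp

# Connect to the running engines.
rc = ipp.Client()
view = rc[:]  # a DirectView over all engines

# Dispatch the same function across the engines and block for the results.
results = view.map_sync(lambda x: x ** 2, range(16))
print(results)
```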

There's a module called dill which can save your variables and things - it's an extension of Python's standard pickle module. It still can't handle everything, but it can do quite a lot. Another approach you can look into is checkpoint-restart, which saves an entire process to a file. Here's a presentation from a couple of years ago about doing this in Python: http://conference.scipy.org/proceedings/scipy2013/pdfs/arya.pdf
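A minimal sketch of the dill approach (the filename is illustrative; dump_session saves the `__main__` namespace, which is where notebook variables live):

```python
import dill

big_result = [i ** 2 for i in range(10)]

# Save the current interpreter session (module-level variables) to disk.
dill.dump_session('notebook_state.pkl')

# After a kernel crash or restart, restore it:
#   import dill
#   dill.load_session('notebook_state.pkl')
# and big_result is available again.
```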

@Carreau (Member) commented Mar 1, 2016

Have a look also at https://github.com/dask/distributed; you can find a nice introduction on Matt's blog.

Beyond dill, you might also want to look at https://github.com/cloudpipe/cloudpickle, which can serialize some objects that dill cannot.
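A quick sketch of what by-value function serialization buys you over the standard library's pickle (an illustrative snippet; dill handles this particular case too, and the dill/cloudpickle differences show up on more exotic objects):

```python
import pickle
import cloudpickle

square = lambda x: x * x

# pickle.dumps(square) would raise an error: the standard pickle
# serializes functions by reference, and a lambda defined in a
# notebook has no importable module path.

# cloudpickle serializes the function by value instead...
blob = cloudpickle.dumps(square)

# ...and the blob can be loaded back with plain pickle.
restored = pickle.loads(blob)
print(restored(4))  # 16
```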

Parallel computation is definitely not a Jupyter feature but a Python feature, and given the way Python works, it will be relatively hard to make it work magically.

The advantage of using things like Dask/Distributed is also that they work in non-Jupyter environments, which is nice.

If you want to dive into the IPython kernel, we'll be happy to guide you and to get feedback on the API and docs.

@Phyks (Author) commented Mar 1, 2016

Thanks for all the links and pointers to docs and modules! I already knew about dill, which we discussed in another issue (or on the mailing list, I am not sure anymore). I will have a look at cloudpickle as well.

The ability to run in a non-Jupyter environment is indeed really nice. My idea, and the reason I posted on jupyter/notebook, is that I think it would be really awesome to have something well integrated and packaged. One of the major features of the Jupyter notebook and the IPython kernel is that it "just works" and gives a really user-friendly setup for advanced tasks, out of the box :)

I will try to see what I can get by assembling all of this, and whether it would be worth integrating further into Jupyter via extensions or custom kernels.

@takluyver (Member)

While we want Jupyter & IPython to be usable and useful straight out of the box, they're never going to do everything you could want. There's a big ecosystem of different tools out there, and we don't want to try to subsume it all into Jupyter.

@Phyks (Author) commented Mar 1, 2016

Sure, but hopefully it could be made as easy as the current matplotlib integration: `pip install matplotlib` and `%matplotlib notebook` :)

EDIT: Maybe this discussion should move to the mailing list or similar, since the issue is now marked as "not notebook"?

@takluyver (Member)

Technically that kind of integration is easy enough to do - it's working out what interface makes sense that's hard.

Venue: up to you; there's no particular problem with discussing it here. I set that milestone just because I don't think there's a specific notebook-related issue to be fixed.

@JamiesHQ (Member)

@Phyks: We're doing a little housekeeping on our issue log and noticed this thread from 2016. Were you able to find a solution to your issue? Please let us know so we can close this one. Thanks!

@Phyks (Author) commented May 2, 2017

Hi @JamiesHQ,

Sorry, I have been busy lately and have not made much progress on this issue. If I get a working solution, I will post it here for sure.

@micahscopes commented Sep 17, 2017

FYI, I've been able to do some basic multithreading in Jupyter notebooks by subclassing multiprocessing.Process, with ipywidgets for feedback. It works pretty well! In the future, I might use button widgets for spawning and stopping processes. I'm actually using this to run a Flask server that serves a REST API for some data that's processed in parallel; I wanted to be able to use my Jupyter notebook to serve analyses in a way that could be consumed outside of Python. The Flask process and the data analyzer each run in their own Process subclass and share data via Manager objects. Using this system, I can start and stop new analyzers for the Flask process to serve, all from the same notebook. It's pretty nice!
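A minimal sketch of that pattern (illustrative names, not the actual code from the package linked below): a Process subclass plus a Manager dict for publishing results back to the notebook. This works with the default fork start method on Linux; spawn-based platforms need the subclass to be importable from a module.

```python
from multiprocessing import Manager, Process
import time

class Analyzer(Process):
    """Worker process that publishes results through shared state."""
    def __init__(self, shared):
        super().__init__(daemon=True)
        self.shared = shared

    def run(self):
        for i in range(100):
            time.sleep(1)
            self.shared['latest'] = i ** 2  # visible from the notebook

manager = Manager()
shared = manager.dict()

worker = Analyzer(shared)
worker.start()

# From other cells: read shared['latest'] while the worker runs,
# and call worker.terminate() to stop it.
```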

@psychemedia

@micahscopes Have you posted an example of your basic multithreading notebook recipe anywhere?

@micahscopes commented Nov 16, 2017 via email

@micahscopes

@psychemedia
Over the last few days, I extended that gist into a Python package!

Try it out: https://github.com/micahscopes/nbmultitask/blob/master/examples.ipynb


@dmvieira

Thanks @micahscopes! I'll try it!

@dmvieira

It's very good! I'm building a wrapper for Spark on top of it: https://github.com/databootcampbr/nbthread-spark

@shijianjian

Has there been any more progress on this? I am interested as well.
