Papermill hangs if kernel is killed #125

Closed
MSeal opened this issue Mar 28, 2018 · 5 comments

Comments

@MSeal
Member

MSeal commented Mar 28, 2018

Today papermill hangs when a kernel is OOM-killed, instead of exiting with an error status code within a reasonable timeframe.

Steps to reproduce:

  • Make a notebook which sleeps indefinitely.
  • Call papermill on the notebook.
  • Run kill -9 on the kernel process.
  • Observe that papermill never exits.
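
The first reproduction step can be sketched in Python. This is only an illustration, not the reporter's script: the helper name `write_sleeping_notebook` and the file names are hypothetical, and the notebook is written as plain nbformat 4 JSON so that only the standard library is needed. The remaining steps need a working Jupyter environment, so they are left as comments.

```python
import json
import os
import tempfile


def write_sleeping_notebook(path):
    """Write a minimal nbformat 4 notebook whose only cell sleeps forever."""
    notebook = {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["import time\n", "while True:\n", "    time.sleep(60)\n"],
            }
        ],
    }
    with open(path, "w") as f:
        json.dump(notebook, f, indent=1)
    return path


# Step 1: create the notebook (hypothetical file name, in a temp directory).
nb_path = write_sleeping_notebook(
    os.path.join(tempfile.mkdtemp(), "sleep_forever.ipynb")
)

# Steps 2-4 are manual and assume papermill and a python3 kernel are installed:
#   papermill sleep_forever.ipynb out.ipynb
#   kill -9 <pid of the kernel process>
#   ...and watch whether papermill ever exits.
```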
@betatim
Member

betatim commented Mar 28, 2018

Can we add a thread that receives the kernel's heartbeat and raises an exception when it stops receiving it?

An alternative is to have a timeout after N minutes of nothing happening. I don't like that option because you need to set it to a (very) large number for notebooks doing "lots of stuff", and often I don't know what to set the timeout to. (For scikit-optimize we kept having this problem: Travis nodes are a lot slower than laptops, so tuning the timeout was tedious because you had to run it on Travis...)
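
A rough sketch of the heartbeat-thread idea, under stated assumptions: `HeartbeatWatchdog`, `KernelDiedError`, and `wait_for_result` are hypothetical names, and `is_beating` is any callable that reports whether the kernel still answers. In a real setup it might wrap something like the jupyter_client heartbeat channel's `is_beating()`, but that wiring is an assumption and is not shown. Because a thread cannot raise directly into the main thread, the watchdog sets an event and the waiting loop raises.

```python
import threading
import time


class KernelDiedError(RuntimeError):
    """Hypothetical error raised when the heartbeat stops."""


class HeartbeatWatchdog(threading.Thread):
    """Poll `is_beating`; set `dead` once it has failed for `grace` seconds."""

    def __init__(self, is_beating, poll_interval=1.0, grace=5.0):
        super().__init__(daemon=True)
        self.is_beating = is_beating
        self.poll_interval = poll_interval
        self.grace = grace
        self.dead = threading.Event()
        self._stop_requested = threading.Event()

    def run(self):
        last_beat = time.monotonic()
        # Event.wait returns False on timeout, so this loops once per interval
        # until stop() is called.
        while not self._stop_requested.wait(self.poll_interval):
            if self.is_beating():
                last_beat = time.monotonic()
            elif time.monotonic() - last_beat >= self.grace:
                self.dead.set()
                return

    def stop(self):
        self._stop_requested.set()


def wait_for_result(is_done, watchdog, poll_interval=0.1):
    """Block until is_done() is true; raise if the watchdog reports death."""
    while not is_done():
        if watchdog.dead.wait(poll_interval):
            raise KernelDiedError("kernel heartbeat stopped")
```

Unlike a fixed execution timeout, the grace period here only bounds how long a silent kernel is tolerated, so a long-running but healthy notebook is not interrupted.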

@rgbkrk
Member

rgbkrk commented Mar 28, 2018

I definitely don't want a timeout, as we should be able to have very long-running jobs. Is there something on the nbconvert side or jupyter client side that could catch more process-level events, or do we need to track the PID of the kernel process more directly?

@MSeal
Member Author

MSeal commented Mar 29, 2018

I was thinking what we want is to get the ID of the kernel process and just monitor that the PID is still alive.
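
The PID-monitoring idea could look roughly like the sketch below. It assumes the caller holds a `subprocess.Popen` handle for the kernel process; `wait_or_fail` and `KernelDiedError` are hypothetical names, and a sleeping child process stands in for a real kernel. If only a bare PID is available, `os.kill(pid, 0)` is a common POSIX way to probe whether the process still exists.

```python
import subprocess
import sys
import time


class KernelDiedError(RuntimeError):
    """Hypothetical error for a kernel process that exited unexpectedly."""


def wait_or_fail(proc, is_done, poll_interval=0.1):
    """Wait until is_done() is true, raising if `proc` exits first."""
    while not is_done():
        returncode = proc.poll()  # None while the process is still running
        if returncode is not None:
            raise KernelDiedError(
                f"kernel process {proc.pid} exited with code {returncode}"
            )
        time.sleep(poll_interval)


# Stand-in for a kernel: a child process that sleeps for a long time.
fake_kernel = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(600)"]
)
fake_kernel.kill()  # sends SIGKILL on POSIX, like `kill -9`

try:
    # The work never finishes here, so only the process check can end the wait.
    wait_or_fail(fake_kernel, is_done=lambda: False)
except KernelDiedError as exc:
    message = str(exc)
```

The waiting loop ends with an error as soon as the process is gone, no matter how long the notebook would otherwise have run.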

@betatim
Member

betatim commented Apr 8, 2018

jupyter/nbconvert#791 is a discussion about the advantages and disadvantages of having the default timeout in nbconvert. It might be interesting to see what people say.

@MSeal
Member Author

MSeal commented Apr 25, 2019

Solved in nbconvert 5.5, which is shipping tomorrow.

@MSeal MSeal closed this as completed Apr 25, 2019
nph4rd added a commit to nph4rd/PySyft that referenced this issue Dec 18, 2019
nbconvert seems to be causing problems due to it being an outdated
version.

See: nteract/papermill#125