Papermill hangs if kernel is killed #125
Comments
Can we add a thread that receives the kernel's heartbeat and raises an exception when the heartbeat stops? An alternative is a timeout of N minutes of nothing happening. I don't like that option because you need to set it to (very) large numbers for notebooks doing "lots of stuff", and often I don't know what to set the timeout to. (For scikit-optimize we kept having this problem because Travis nodes are a lot slower than laptops, so tuning the timeout was tedious because you had to run it on Travis...)
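As a rough illustration of the heartbeat idea, here is a minimal stdlib-only sketch. `HeartbeatWatchdog`, `KernelDiedError`, and the `ping` callable are hypothetical names, not part of papermill, nbconvert, or jupyter_client. In a real setup `ping` might wrap a liveness check such as a kernel client's `is_alive()`, but that wiring is an assumption. A background thread cannot raise directly into the main thread, so the sketch sets a flag that the executing code checks.

```python
import threading
import time


class KernelDiedError(RuntimeError):
    """Hypothetical error type; not part of papermill or nbconvert."""


class HeartbeatWatchdog:
    """Flags a kernel as dead once several consecutive heartbeats are missed.

    `ping` is any callable that returns True while the kernel still answers.
    """

    def __init__(self, ping, interval=0.05, max_missed=3):
        self._ping = ping
        self._interval = interval
        self._max_missed = max_missed
        self.dead = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        missed = 0
        # wait() returns False on timeout, True once stop() has been called.
        while not self._stop.wait(self._interval):
            missed = 0 if self._ping() else missed + 1
            if missed >= self._max_missed:
                self.dead.set()
                return

    def raise_if_dead(self):
        """Call from the executing thread, e.g. while waiting on a cell."""
        if self.dead.is_set():
            raise KernelDiedError("kernel heartbeat stopped")
```

Unlike an execution timeout, the interval here only bounds how quickly a dead kernel is noticed; a cell that runs for hours is unaffected as long as the kernel keeps answering.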
I definitely don't want a timeout, as we should be able to have very long-running jobs. Is there something on the nbconvert side or the jupyter client side that could catch more process-level events, or do we need to track the PID of the kernel process more directly?
I was thinking what we want is to get the ID of the kernel process and just monitor that the PID is still alive.
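A minimal sketch of that PID-monitoring idea, using only the standard library. The child process below is a stand-in for a kernel, and `wait_or_fail` and `work_done` are hypothetical names; how papermill or nbconvert would obtain the kernel's process handle is not shown here.

```python
import subprocess
import sys
import time


def wait_or_fail(proc, work_done, poll_interval=0.05):
    """Block until work_done() is true; raise if the process dies first."""
    while not work_done():
        code = proc.poll()  # None while the process is still running
        if code is not None:
            raise RuntimeError(
                f"kernel process {proc.pid} exited with code {code}"
            )
        time.sleep(poll_interval)


# Stand-in for a kernel: a child process that would otherwise run for a minute.
kernel = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"]
)
kernel.kill()  # simulate the kernel being killed, e.g. by the OOM killer

try:
    # work_done never becomes true, mimicking a cell that never returns.
    wait_or_fail(kernel, work_done=lambda: False)
except RuntimeError as exc:
    message = str(exc)
```

Without the `poll()` check the loop would wait forever, which is the hang described in this issue; with it, the failure surfaces within one poll interval regardless of how long the notebook legitimately runs.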
jupyter/nbconvert#791 is a discussion about the advantages and disadvantages of having a default timeout in nbconvert. It might be interesting to see what people say.
Solved in nbconvert 5.5 which is shipping tomorrow.
nbconvert seems to be causing problems due to it being an outdated version. See: nteract/papermill#125
Today papermill is hanging when kernels get OOM killed instead of returning with an error status code within some reasonable timeframe.
Steps to reproduce: