
keep notebook running after the browser tab closed #1647

Closed
ibigquant opened this issue Aug 1, 2016 · 69 comments

@ibigquant

My experiment may run for a long time (hours). It seems the notebook stops running after the browser tab is closed. How can I keep it running and updating the notebook?

@Carreau Carreau added this to the no action milestone Aug 1, 2016
@Carreau
Member

Carreau commented Aug 1, 2016

My experiment may run for a long time (hours). It seems the notebook stops running after the browser tab is closed. How can I keep it running and updating the notebook?

Unfortunately there is currently no simple way to do that. We are aware of the issue and working on it. In the meantime, I would suggest wrapping all the computation you are doing in Futures, so that you only query for results interactively.

Closing as this is already tracked in many places, but feel free to continue asking questions.

@Carreau Carreau closed this as completed Aug 1, 2016
@ibigquant
Author

Thanks for your reply, Carreau. I'm new to Python notebooks and don't quite understand the "Futures" you mentioned. Could you give me a simple example? Many thanks.

@takluyver
Member

A future is an object representing a task - it provides a way to see if the task is done, and get the result (or error) when it's finished. They're a general concept, but Python provides an implementation in concurrent.futures. They're normally used in code that's doing more than one thing at once.

I think that's probably more complex than you need, though. A cell that you've started running will keep going when you close the browser tab, but the output it produces is lost. The easiest workaround is just to leave the browser tab open - tabs are cheap, I've got ~50 open now. If you can't do that for some reason, make sure your code assigns any results you want to keep to a variable - they should still be available when you reopen the notebook. You can also use the %capture magic to store printed output in a variable you can retrieve later.
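To make the Futures idea concrete, here is a minimal sketch using the standard library's concurrent.futures; long_computation() is a hypothetical placeholder for your own work. The executor runs it in a background thread of the kernel process, which keeps going whether or not a tab is open:

    from concurrent.futures import ThreadPoolExecutor

    def long_computation():
        ...  # hypothetical placeholder for hours of work

    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(long_computation)  # returns immediately

    # ... later, in another cell, possibly after reopening the notebook:
    if future.done():
        result = future.result()  # re-raises if long_computation() raised

The result only survives as long as the kernel does; restarting the kernel discards it.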

@flaviostutz

I have been struggling with this issue as well for some time now. The kernel keeps running your job on the server, but there is no way to see the console output after closing the browser.

My workaround was to write all my logs to a file, so that when my browser closes (indeed, when a lot of log output comes through, the browser tends to hang too) I can watch the kernel job's progress by opening the log file (which can be opened in Jupyter too).

    #!/usr/bin/python
    import time
    import datetime
    import logging
    
    logger = logging.getLogger()
    
    def setup_file_logger(log_file):
        hdlr = logging.FileHandler(log_file)
        formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        hdlr.setFormatter(formatter)
        logger.addHandler(hdlr) 
        logger.setLevel(logging.INFO)
    
    def log(message):
        #outputs to Jupyter console
        print('{} {}'.format(datetime.datetime.now(), message))
        #outputs to file
        logger.info(message)
    
    setup_file_logger('out.log')
    
    for i in range(10000):
        log('Doing hard work here i=' + str(i))
        log('Taking a nap now...')
        time.sleep(1000)

+1 on this, or some kind of long-running process management

@abalter

abalter commented Mar 28, 2017

I'm confused about why this is difficult. Since a serialized Jupyter notebook contains cell output, it should be possible to keep track of output while a user has the tab closed: the server could keep appending to the notebook JSON as the kernel runs in the background, so the output generated in the meantime would be in the notebook when the user returns. Why can't Jupyter just keep writing to the JSON file?

@takluyver
Member

The server doesn't write to the JSON file as soon as output is sent by the kernel - the output goes to the browser, which adds it into its in-memory notebook document. When you save (or an autosave occurs), the notebook document is converted to JSON and written to disk as a whole.

We're planning to change that so that the server keeps the notebook model and sends updates to the browser, but that's a big change to the architecture.

@abalter

abalter commented Mar 31, 2017

That would be great! Is there a current issue or milestone where I can track the progress?

@takluyver
Member

I don't know of one - @Carreau might be able to give you more info on the progress.

@abalter

abalter commented Apr 1, 2017

That would be great! My group works on remote servers, so being able to reconnect to a session would be very valuable.

@prolearner

prolearner commented Apr 3, 2017

I'm working on remote servers too. It would be really handy to be able to do this; I hope it'll be implemented soon.

As a suggestion, I think that being able to reconnect to a session - even if that means losing all the output produced while you weren't connected - while still being able to save the new output would be great and simpler to implement. That way, if you're working on a remote server and you have a network disconnection, you can continue working with little loss.

@Carreau
Member

Carreau commented Apr 3, 2017

I don't know of one - @Carreau might be able to give you more info on the progress.

None AFAICT from the Notebook/Lab side. nteract might be closer with commuter. That's probably not going to be implemented "soon". Realtime will likely come sooner but will require a running browser.

@flying-sheep
Contributor

Closing as this is already tracked in many places, but feel free to continue asking questions.

So where are the open issues for this? There are still issues being opened about this (e.g. #2446) and I can't find the earlier, open ones.

@k0pernicus

Any news about this issue please?

@arvoelke

arvoelke commented Jul 4, 2017

The easiest workaround is just to leave the browser tab open

This doesn't help if you are on a flaky connection to the server (e.g., accessing a remote jupyter server, or tunnelling to one through SSH).

@Carreau
Member

Carreau commented Jul 5, 2017

Any news about this issue please?

We are aware of the issue; there is not much written about it – we should put together a comprehensive document about that – but this needs a significant refactor of the frontend, plus likely some changes in the backend. CoCalc (ex SageMathCloud) does allow that, but you need a server-side model, and you basically deprecate all the extensions for a given frontend – which is easy for CoCalc as it ships without extensions.

Though it is indirectly moving forward via JupyterLab and nteract's commutable, and once this is out we can likely start to think about an isomorphic JS app that keeps state, with the browser being only a "view" on this state.

My personal opinion is that this can be done without changes to the protocol, as a separate app, and anyone is welcome to chime in and write up an IPEP/RFC/prototype that lays out the ground infrastructure.

It is a significant enough amount of work that we can't just do it "on the side", and it will need at least one FTE.

@flying-sheep
Contributor

flying-sheep commented Jul 5, 2017

likely some changes in the backend

From my understanding, the frontend runs in the browser, so if no tab is open, there is no frontend and there definitely need to be changes in the backend. Or do you mean different parts than I do?

Architecturally, I'd assume that the notebook server needs to start writing responses to the notebook file whenever no browser tab is attached (i.e. instead of the responses being received in a browser tab and the notebook being saved manually, it gets saved automatically after any (batch of) responses).

@Carreau
Member

Carreau commented Jul 5, 2017

From my understanding, the frontend runs in the browser, so if no tab is open, there is no frontend and there definitely need to be changes in the backend. Or do you mean different parts than I do?

You need to move some pieces from the frontend to the backend. It likely can be done with a "proxy server" between the notebook server and the browser.

Architecturally, I'd assume that the notebook server needs to start writing responses to the notebook file whenever no browser tab is attached (i.e. instead of the responses being received in a browser tab and the notebook being saved manually, it gets saved automatically after any (batch of) responses).

Yes and no. The notebook file does not – and cannot – store all the necessary information, especially while the kernel is still running (for example, the mapping from message IDs to handlers). You need an extra store (which can be in server RAM) that has a richer representation than ipynb. If you have that, then the frontend needs to understand this as well, which starts to be complicated.

@kostrykin

kostrykin commented Jul 20, 2017

You need to move some pieces from the frontend to the backend. It likely can be done with a "proxy server" between the notebook server and the browser.

@Carreau By "proxy server", you actually mean something like an off-screen browser, right? I'm quite not sure how the interaction of your actual browser and that off-screen-proxy-thing should look like. Do you have any knowledge of any peace of software which can do that? Maybe, a browser which itself renders its interface as HTML and provides it via HTTP?

@Carreau
Member

Carreau commented Jul 20, 2017

@Carreau By "proxy server", you actually mean something like an off-screen browser, right

No, not completely. Browser-like implies HTML and rendering. You can store non-HTML models on the proxy-server side. I only care about the ipynb plus some extra info on the server side; the rendering is a detail. The point is that the "state" you care about – which is not the HTML rendering – should live and be able to be updated without needing an open browser. Think of the Google Drive RT API if you wish.

I've seen things (e.g. Mozilla TowTruck, I think) trying to do that with HTML. Any isomorphic app these days does similar things.

@idning

idning commented Dec 12, 2017

Do we have any update on this? It should be essential for the cloud use case.

@set92

set92 commented Dec 12, 2017

If you check the 2nd reference, it says "This is an intended outcome of the notebook model refactor." So we will get it in JupyterLab, although reading it I think it will save the results but not let us reopen a closed notebook to keep working on it, or check the results after leaving it working in the background.

@idning

idning commented Dec 12, 2017

Is there any hack we can do?

e.g. assign the output of each cell to an internal variable and, when we reconnect to the kernel, get these variables and display them.

@minrk
Member

minrk commented Dec 20, 2017

@idning yes, storing results and outputs in variables continues to work. You can redisplay variables still in memory at any time.

x = long_computation()
# ... some time later, in another cell:
display(x)

You can also capture displayed outputs (not results) with the %%capture cell magic:

%%capture out_x
print("lots of stuff")
...
# another cell
out_x.show()

However, if it really is a long-running computation, avoiding recomputing even when the kernel dies is probably useful. In that case, a caching scheme is preferable: write intermediate results to disk and only re-execute if the cache doesn't exist on disk. This is what I set up for my long-running notebooks long ago. That way, re-running a whole notebook after it has run once, even with a new kernel, may only take a few seconds and will produce all of the original output. There are lots of ways to do this with different tradeoffs of rigor vs. cache performance, which is part of why there isn't a simple example to link to.
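A minimal sketch of one such scheme, using pickle; compute() here is a hypothetical stand-in for the expensive step, and the cache is a single file with no invalidation logic:

    import os
    import pickle

    CACHE = 'intermediate.pkl'

    def compute():
        ...  # hypothetical stand-in for the expensive computation

    if os.path.exists(CACHE):
        with open(CACHE, 'rb') as f:
            result = pickle.load(f)   # cheap re-run: load from disk
    else:
        result = compute()            # expensive first run
        with open(CACHE, 'wb') as f:
            pickle.dump(result, f)    # persist for the next kernel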

Yet another option is to run the notebook entirely headless with nbconvert:

jupyter nbconvert --execute --to notebook mynotebook.ipynb

which will create a new notebook with all of the output intact.

@rasbt

rasbt commented Feb 1, 2018

I think the typical use case for this is running longer computations in the notebook; here, it's important to keep in mind that nbconvert is not very generous with its default per-cell timeout limit. For longer computations, one might want to provide a custom timeout limit - e.g., for computations that run for up to a day (86400 seconds), something as follows:

jupyter nbconvert --execute --to notebook mynotebook.ipynb --ExecutePreprocessor.timeout=86400

@abalter

abalter commented Feb 1, 2018

I have a really hard time understanding why this is a problem. Basically, whatever would be sent to the browser is instead written to a file. When the user logs back in, send it to the browser.

@wernight

wernight commented Aug 7, 2018

SGTM. I even have a Dockerized PhantomJS if you're interested: https://hub.docker.com/r/wernight/phantomjs/

@abalter

abalter commented Sep 25, 2018

@takluyver

Something like that is now implemented - messages go into a buffer when there's no client connected, and are replayed when one reconnects. But the details are never as simple as they seem.

I'm not trying to be obnoxious, but tell me where my thinking is wrong here:

Typical web application function:

  1. server receives request from client (client-side application)
  2. server creates response
  3. server sends response to client (essentially passes it a stream)
  4. loop until tab is closed

Hypothetical way jupyter web app functions:

  1. jupyter server receives request from client (jupyter notebook) due to user input
  2. jupyter server creates response (e.g. runs code)
    1. jupyter server sends response to client (essentially passes it a stream)
    2. client responds that message was received
    3. client displays output
    4. loop until computation finished
  3. loop until tab is closed

Suppose user does not interact with notebook

  1. jupyter server receives request from client (jupyter notebook)
  2. jupyter server creates response (e.g. runs code)
    1. jupyter server sends response to client (essentially passes it a stream)
    2. client responds that message was received
    3. client displays output
    4. loop until computation finished
  3. client responds that message was received

Suppose tab is currently closed

  1. jupyter server receives request from client (jupyter notebook)
  2. jupyter server creates response (e.g. runs code)
    1. jupyter server sends response to client (essentially passes it a stream)
      AND
      writes response to a file
    2. client responds that message was received
    3. client displays output
    4. loop until computation finished
  3. client responds that message was received

Suppose tab is reopened

  1. jupyter server send cached stream to notebook
  2. client responds that message was received
  3. client displays output
  4. jupyter server resumes normal operation

I can't emphasize enough how important this is to our workflow and that of many others.

This is a MAJOR shortcoming of Jupyter compared to RStudio Server and should be a top priority.

@tanmay-kulkarni

tanmay-kulkarni commented Sep 26, 2018

This probably has been said several hundred times already, but once again, I wish to request the kind developers of this project to take this issue on priority.

It's baffling to me how such a basic necessity has not been taken care of for so long. I mean, most jobs with large amounts of data take several hours to run, at the least, on a remote server. I'd have thought this feature would be included by default. I was surprised when I kept my server running overnight, logged in, and saw that no output was stored. I couldn't even tell which cell was currently executing, since all the cells showed a blank instead of the * that appears while a cell is running.

EDIT: I'd like to add that I realize Jupyter is free software and the developers have other commitments too and only so much time, but I love Jupyter and this feature would make life easier for so many people. Thanks in advance ;)

@Carreau
Member

Carreau commented Sep 26, 2018

At the risk of repeating ourselves one more time:

Jupyter is mostly developed by people in their free time, and is given away for free. We suffer the same bugs and annoyances as you do. We prioritize what we can, and even those of us who are allowed to contribute to Jupyter professionally 1) don't always have it as their main occupation, and 2) often have tasks assigned by management or higher-ups.

We don't owe features to users, even if we do care, but we do have obligations to finish the projects for which non-profits gave us money – at least those of us employed totally or partially via these funds.

We cannot – and will not try to – force volunteers to change what they prioritize and wish to work on. We can try to lead by example and hope this fosters collaboration.

This issue still being open does not mean nobody is working on it. We already added a band-aid by replaying messages, and there is significant work currently being done on this front, in part in JupyterLab with a server-side model and CRDTs.

It is extremely difficult work, especially if you can't spend several hours focused on it, which not many of us can afford.

So if you wish for this work to go faster, please do not insult us or shout at us (writing in bold on the internet is equivalent), and find ways to help, even indirectly.

There are many ways you can do so even if you are not a genius coder:

Convince your company/institution/government to donate to NumFOCUS

This will allow us to hire people to work full time at a decent living wage! With even more money we could hire talent that would otherwise cross the street to get their salary doubled, tripled or sometimes more than quintupled.

Convince your company/institution/government to contribute time

Ask if you (or someone else) would be allowed to spend one afternoon per month helping. If Jupyter is used at your work, your company would likely gain from having an in-house expert and from fixing things upstream. We also have plenty of non-code areas where we need help (legal, design, event planning...).

Respond to issues on the mailing list; help triage.

You will free up our time! Not having to respond to easy issues sometimes lets us get one or two hours straight in which we can attempt difficult work.

Contribute code in your free time

Getting familiar with even small issues will increase your knowledge of the codebase, and who knows, after a couple of months you may get commit rights and help fix long-standing issues like this one. You sometimes don't even have to start from scratch: there are many PRs that some of us started but that need polish (fixing tests, rebasing, documenting...), and with decentralized GitHub you can propose fixes to existing PRs!

Help manage the community

Twitter, GitHub, Facebook, YouTube, the mailing list; proofread our blog; be friendly and remind people to be respectful to each other.

We are sorry if you are encountering issues or have lost work, but please don't use that as an excuse to suggest that we don't care, are incompetent, or haven't thought about how to fix it, how to implement it, and how not to break backward compatibility.

Many thanks, and much love from the Jupyter team, doing their best.

Also nice reads: Why I took October off from OSS volunteering and Setting expectations for open source participation by Brett Cannon.

@tanmay-kulkarni

We don't owe features to users, even if we do care, but we do have obligations to finish the projects for which non-profits gave us money – at least those of us employed totally or partially via these funds.

@Carreau You probably didn't read the edit in my comment above. Please read it. I realize Jupyter is free and the developers don't owe us anything. It's just a request.

So if you wish for this work to go faster, please do not insult us

I'm sorry if I gave offense. It was totally not my intention. I have utmost respect for you and all the wonderful people who contribute to all OSS. I may have misunderstood the situation, since I read somewhere above that implementing this shouldn't be difficult. But since you have clarified that it's not easy work, I believe you.

I hope this issue is solved in near future, and once again, thanks for all your work on Jupyter. I really do appreciate it :)

@abalter

abalter commented Sep 26, 2018

I deeply apologize for "yelling." Bad choice of typesetting. I wanted to somehow find a way to bump this up on the priority level, but that wasn't the way to do it.

Is there a way to make sure this feature makes it onto the road map?

Is there a priority system for the road map?

Until this feature is ready, it's a big reason to use RStudio Server over Jupyter, and the more people that use Jupyter, the better for the entire project.

Believe me, if I felt confident enough to dive into the code and actually make a difference, I would do that instead of spending time writing long issue comments. Maybe I need to get over that and dive in, but I fear going down a rabbit hole and being of no help to anyone.

Suppose I wanted to join up with a few more experienced programmers to look at this, what would be a way to find those others and form that group?

I think the best way I could contribute would be to help bring outside resources to the Jupyter project. Is there already a task force for that?

@blink1073
Contributor

This is on the roadmap for JupyterLab; cf. jupyter/roadmap#44, which has more explicit language around this. Essentially, we need a server-side representation of the notebook model, which is then rendered by the frontend. Please see the discussion in jupyterlab/jupyterlab#2540.

@Carreau
Member

Carreau commented Sep 26, 2018

You probably didn't read the edit in my comment above. Please read it. I realize Jupyter is free and the developers don't owe us anything. It's just a request.

Thank you, I had not in fact seen it.

I'm sorry if I gave offense. It was totally not my intention. I have utmost respect for you and all the wonderful people who contribute to all OSS. I may have misunderstood the situation, since I read somewhere above that implementing this shouldn't be difficult. But since you have clarified that it's not easy work, I believe you.

I hope this issue is solved in near future, and once again, thanks for all your work on Jupyter. I really do appreciate it :)

The response was not targeted at @abalter or @tanmay-kulkarni; there just happened to be two comments in a short time, which is often a trigger for other people to add comments that degenerate. It was more an attempt to defuse things and ask people to pay attention.

I deeply apologize for "yelling." Bad choice of typesetting. I wanted to somehow find a way to bump this up on the priority level, but that wasn't the way to do it.

Written communication is hard, and many physical cues from in-person conversation are not present. Thanks for clarifying your meaning.

Believe me, if I felt confident enough to dive into the code and actually make a difference, I would do that instead of spending time writing long issue comments. Maybe I need to get over that and dive in, but I fear going down a rabbit hole and being of no help to anyone.

Have a look at my first PR from 7 years ago if you want some confidence. Some of the history is not there anymore (GitHub didn't keep comments on code at the time). I was putting semicolons at the ends of lines.

Deep code diving is not the only thing that would help; we are, for example, exploring how to foster community:

If your contribution for a few weeks is to follow along and send a weekly summary of the advances made on this front (and others), that would be of tremendous help.

I think the best way I could contribute would be to help bring outside resources to the Jupyter project. Is there already a task force for that?

No, but do you want to try to organise this or make a proposal?

@jasongrout
Member

See also #641, which seems like it is the same issue.

@bainjamain

In the meantime, a dirty hack to keep the output from a notebook while the browser is closed is to print to the console rather than the notebook using something like

import sys
sys.stdout = open('/dev/stdout', 'w')
print('this is printed in the console')
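If you want to switch back to notebook output later, save the kernel's original stream before replacing it - a small sketch extending the hack above:

    import sys

    nb_stdout = sys.stdout                 # keep the kernel's notebook stream
    sys.stdout = open('/dev/stdout', 'w')  # print() now goes to the server console
    print('this is printed in the console')
    sys.stdout = nb_stdout                 # back to printing in the notebook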

@abalter

abalter commented Oct 16, 2018

I'm actually thinking of trying a hack when I get time: start a headless browser in a Flask app that communicates with Jupyter and regurgitates its current page view to the browser. Then I would interact with the headless browser through that page. The headless browser would pass the interaction on to Jupyter and capture the response.

@CristianCantoro

CristianCantoro commented Nov 10, 2018

You can work around the issue for the moment by running a container with a browser on the same server where your notebook is running, and connecting to it with VNC. For example, docker-firefox already provides a Firefox installation with a VNC server that you can expose and connect to. Of course, the Firefox container will eat some of your server's resources (the author of docker-firefox suggests having at least 2GB of shared memory).

For example, on a server where the user is ubuntu, I created a folder /home/ubuntu/firefox where I put my notebooks, and I run the following:

docker run \
    --rm \
    --network host \
    --name=firefox \
    -v /docker/appdata/firefox:/config:rw \
    -v /home/ubuntu/firefox:/home/firefox:rw \
    -p 5900:5900 \
    --shm-size 1GB \
    jlesage/firefox

Then you can use any VNC client to connect to the browser.

There may be other solutions based on xpra (basically using the same idea as subuser), but the setup would probably be a little more complicated.

EDIT: one thing I should mention is that you have to be careful with this setup if the server is reachable from the internet. I have in mind a typical scenario where you are using a machine in your local cluster or local network, where anyone who can reach the machine is trusted.

@wernight

wernight commented Nov 14, 2018 via email

@cyrildiagne

Worth pointing out another workaround: papermill, which allows running notebooks from the terminal CLI and writes the results to a separate notebook:
papermill test.ipynb test-output.ipynb

This, combined with tmux, seems to be a viable workflow for running notebooks on remote servers.
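papermill also exposes a Python API, in case you want to drive the run from another script. A minimal sketch - the parameters dict is illustrative and assumes the input notebook has a cell tagged "parameters":

    import papermill as pm

    pm.execute_notebook(
        'test.ipynb',
        'test-output.ipynb',
        parameters={'n_epochs': 100},  # hypothetical notebook parameter
    )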

@NimSed

NimSed commented Feb 25, 2019

This might not be exactly related to the original question, but it should still be a useful trick.
I call my lengthy text-output-generating scripts in Google Colab like this:
%run 'main.py'

Up until today, whenever my laptop went to sleep (i.e. got disconnected from the network), the process would die and become irretrievable.
Today, almost by accident, I found that having another dummy cell in the same notebook and running it after resuming not only resurrects the main cell's running process, but also retrieves and displays the text output that was cut off after the apparent death of the process!

The dummy cell can be as simple as:
!ls

@RobertoFranceschini

Is it a sensible idea to run our long-running cells in a queue system such as ray? Turning a regular function into a remote function is not hard at all; it's just one decorator to add ...
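A minimal sketch of that idea, assuming ray is installed; train() is a hypothetical stand-in for the long-running work. The task runs in a ray worker process, so it is not tied to the browser tab, though the object ref below still lives in the kernel:

    import ray

    ray.init()

    @ray.remote
    def train():
        ...  # hypothetical hours-long work
        return 'model'

    ref = train.remote()  # returns immediately with an object ref

    # ... later, in another cell of the same kernel session:
    result = ray.get(ref)  # blocks until the task has finished
    # note: the object ref lives in this kernel; if the kernel restarts, it is lost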

@naglemi

naglemi commented Jul 27, 2019

I just wanted to share a simple approach that I've found satisfactory for watching outputs after exiting out of a browser tab...

This works when print functions are being used to produce output. I define this function (in R) and then replace print calls with jprint.

jprint <- function(ObjectToPrint){
    print(ObjectToPrint) # so we still get outputs in browser tab before closing
    write.table(ObjectToPrint, "~/jupyter_outputs.txt", append=TRUE,
                col.names = FALSE, row.names = FALSE)
}

The jupyter_outputs.txt file must already exist.

Then if I exit out of the browser tab and wish to start watching the output again, I log in via ssh and run something like:
watch tail -n 50 ~/jupyter_outputs.txt

@georgethrax

A future is an object representing a task - it provides a way to see if the task is done, and get the result (or error) when it's finished. They're a general concept, but Python provides an implementation in concurrent.futures. They're normally used in code that's doing more than one thing at once.

I think that's probably more complex than you need, though. A cell that you've started running will keep going when you close the browser tab, but the output it produces is lost. The easiest workaround is just to leave the browser tab open - tabs are cheap, I've got ~50 open now. If you can't do that for some reason, make sure your code assigns any results you want to keep to a variable - they should still be available when you reopen the notebook. You can also use the %capture magic to store printed output in a variable you can retrieve later.

This could be a nice strategy.

@Sohnia

Sohnia commented Jul 16, 2020

Why can Google Colab keep the output cell updating after you close your browser?

@wangyunpengbio

Why can Google Colab keep the output cell updating after you close your browser?

A Google miracle!
I hope a new version of Jupyter gains this ability too.

@mzouink

mzouink commented Sep 13, 2020

No news yet?

@cguess

cguess commented Mar 31, 2021

Has updating the output been considered at all here?

@jupyter jupyter locked as resolved and limited conversation to collaborators Mar 31, 2021
@Zsailer
Member

Zsailer commented Feb 3, 2022

Duplicate of #641.

@Zsailer Zsailer closed this as completed Feb 3, 2022