Identify notebook file being run #1000

Open
aggFTW opened this issue Jan 26, 2016 · 31 comments

@aggFTW

aggFTW commented Jan 26, 2016

Hi,

I've seen this type of question a lot:
http://stackoverflow.com/questions/20050927/how-to-get-the-ipython-notebook-title-associated-with-the-currently-running-ipyt?rq=1

It makes sense to me that the kernel should not know what it's talking to from a design perspective.

However, I'm currently in the process of working through a Jupyter High Availability scenario. Our goal is to have two Jupyter instances running in two different VMs and to switch between them if one of those two VMs goes down for some reason, without losing the kernel state.

We have control over the kernels we are running (see https://github.com/jupyter-incubator/sparkmagic/blob/master/remotespark/wrapperkernel/sparkkernelbase.py), and we'd like to be able to tie some state (a session number) to a particular kernel instance.

It seems to me like I'd need some things to achieve this, but maybe you have better ideas:

  • Fire some piece of code automatically every time a notebook starts: this could be the __init__ method in my kernel or some other piece of code that is triggered every time a kernel gets started (some Javascript code in the notebook maybe? I know this wouldn't apply for other clients but it's a start).
  • This previous bit of code that gets fired would need to always be run with the same ID to be able to identify the state it needs to reconstruct (i.e. it would need to know that for this particular kernel we had X particular state).
  • Some persistent storage that both Jupyter instances could have access to.

I thought of a concrete implementation and I'd like to hear some feedback on it if possible:
There is a Notebook extension that reads some ID in the notebook's page DOM (I need help knowing what ID this would be: e.g. notebook name with relative paths from root folder included or a GUID in some hidden cell in the notebook file), which would then issue a request to the kernel with this ID to restore its state. The kernel would then take this ID and get the session ID from cloud storage. If the ID is embedded in Javascript, both Jupyter servers would need to trust the notebook from the get go.

Thanks for any help or pointers you may have!
(cc. @msftristew, @MohamedElKamhawy, @ellisonbg)

@aggFTW
Author

aggFTW commented Jan 29, 2016

cc @Carreau and @jdfreder

@minrk
Member

minrk commented Jan 29, 2016

A custom KernelManager could add an environment variable when a kernel is started, though the KernelManager doesn't have access to the notebook path. A SessionManager could pass that down, though it wouldn't be updated when the notebook is renamed, so a filename is probably not the best key to use.
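
A minimal sketch of the environment-variable idea, assuming a hypothetical variable name (NOTEBOOK_PATH_HINT) and using a plain child process to stand in for the kernel. A custom KernelManager would do the equivalent when it launches the kernel; none of the names below come from Jupyter itself.

```python
import os
import subprocess
import sys

# Hypothetical variable name, chosen only for this illustration.
ENV_VAR = "NOTEBOOK_PATH_HINT"


def build_kernel_env(notebook_path, base_env=None):
    """Return a copy of the environment with the notebook path added."""
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = notebook_path
    return env


def launch_and_read(notebook_path):
    """Start a child process (standing in for a kernel) and return what it sees."""
    child_code = "import os; print(os.environ.get({!r}, ''))".format(ENV_VAR)
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=build_kernel_env(notebook_path),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

The value is fixed when the process starts, which is why a rename after launch would leave it stale.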

@jdfreder
Contributor

You can put a GUID in the notebook-level metadata. I think you can do it without JS, at the web server level, on new or existing notebook load.
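
A rough sketch of stamping a GUID into the notebook-level metadata, assuming the notebook is handled as plain JSON and using a hypothetical metadata key (notebook_guid). Where exactly this would hook into the web server is not shown.

```python
import json
import uuid

# Hypothetical metadata key, used only for this illustration.
GUID_KEY = "notebook_guid"


def ensure_notebook_guid(notebook):
    """Add a GUID to the notebook-level metadata if it has none; return the GUID."""
    metadata = notebook.setdefault("metadata", {})
    if GUID_KEY not in metadata:
        metadata[GUID_KEY] = str(uuid.uuid4())
    return metadata[GUID_KEY]


def ensure_guid_in_file(path):
    """Load an .ipynb file, make sure it carries a GUID, and write it back if changed."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    had_guid = GUID_KEY in notebook.get("metadata", {})
    guid = ensure_notebook_guid(notebook)
    if not had_guid:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(notebook, f, indent=1)
    return guid
```

Because the GUID lives inside the file, it survives renames, unlike a key based on the file name.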

@jdfreder
Contributor

--- oh, this is issue #1000 ! 🍰 🎉

@Carreau
Member

Carreau commented Jan 29, 2016

:-P

@Carreau
Member

Carreau commented Jan 29, 2016

Wouldn't a custom MappingKernelManager that stores the various kernel models in a shared DB be enough? (Or am I missing something about the notebook name?)

It is highly unlikely that the notebook would be renamed during the swap of VMs.

There might need to be some extra logic for clean startup/exit/restart, but that should be able to resume connections.
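
A minimal sketch of the "kernel models in a shared DB" idea, with SQLite standing in for whatever database both servers could reach. The class name and the fields of the stored model are hypothetical; a real MappingKernelManager subclass would call something like this when kernels start and shut down.

```python
import json
import sqlite3


class KernelModelStore:
    """Persist kernel models (plain dicts) keyed by kernel id."""

    def __init__(self, db_path):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kernels (kernel_id TEXT PRIMARY KEY, model TEXT)"
        )
        self._conn.commit()

    def save(self, kernel_id, model):
        """Insert or overwrite the model stored for kernel_id."""
        self._conn.execute(
            "INSERT OR REPLACE INTO kernels (kernel_id, model) VALUES (?, ?)",
            (kernel_id, json.dumps(model)),
        )
        self._conn.commit()

    def load(self, kernel_id):
        """Return the stored model, or None if the kernel id is unknown."""
        row = self._conn.execute(
            "SELECT model FROM kernels WHERE kernel_id = ?", (kernel_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def delete(self, kernel_id):
        """Forget a kernel, e.g. after a clean shutdown."""
        self._conn.execute("DELETE FROM kernels WHERE kernel_id = ?", (kernel_id,))
        self._conn.commit()
```

A local SQLite file would not actually be shared between two VMs; it only keeps the example self-contained.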

@msftristew

So, I've picked up this work where @aggFTW left off. I think this is how we're thinking about doing this:

  1. Use a custom SessionManager that passes down the notebook name as an argument to the MappingKernelManager.
  2. Use a custom KernelManager that communicates the notebook name to the new kernel process on startup (through an environment variable or some other method).
  3. Our custom kernels will take the notebook name as a key and will update their metadata as appropriate in the way that @aggFTW described above.
  4. Use a custom ContentsManager to update the metadata necessary for resuming stale sessions when a notebook is renamed.

Item (4) will certainly be an internal extension to Jupyter for us, but we were wondering whether items (1) and (2) would have any chance of being accepted upstream. I understand that the kernel not knowing what's talking to it is part of the design, but it seems like it would be generally useful (not just for this scenario) if kernels could be made aware of the name of their notebook, whether through an environment variable, a command-line argument, or a 0mq message. Do you suppose there would be any interest in that PR?

@minrk
Member

minrk commented Feb 25, 2016

I think it is generally useful, and we should probably do it. An environment variable is the way to go, I think. The only disadvantage of that is that you cannot update the file location on rename after the kernel has started, but a zmq message updating the file doesn't seem like the right thing to do, to me.

@Carreau Carreau added this to the wishlist milestone Jun 27, 2016
@olgabot

olgabot commented Jan 17, 2017

Was this ever resolved? I'm making output and figure folders based off of the name of the notebooks, and this code works in the notebooks:

from IPython.core.display import Javascript
from IPython.display import display


def get_notebook_name():
    """Returns the name of the current notebook as a string
    
    From https://mail.scipy.org/pipermail/ipython-dev/2014-June/014096.html
    """
    display(Javascript('IPython.notebook.kernel.execute("theNotebook = " + \
    "\'"+IPython.notebook.notebook_name+"\'");'))
    return theNotebook

But when I move it into a common.py file so it can be accessed across all notebooks, I get a NameError:

[image: screenshot of the NameError traceback]

Is this because the .py file has no notebook? Is there a way to get the .py file to recognize the notebook it is being called from?

@Carreau
Member

Carreau commented Jan 17, 2017

display(Javascript('IPython.notebook.kernel.execute("theNotebook = " + \
"\'"+IPython.notebook.notebook_name+"\'");'))
## Here are dragons. 
return theNotebook

Handwaving:

The displayed JavaScript will take some time to reach the browser, and it will take some time to execute the JS and get back to the kernel.

During this time IPython has to continue executing code, so it tries to "return theNotebook", which is undefined, and so it raises. Even if you could "wait for the JS to execute", you could not set the name of the notebook before the function returns.

Does that make sense?
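
The timing problem can be imitated without a browser. This is only a simulation under stated assumptions: a timer thread stands in for the round trip through the browser, a dict stands in for the kernel's user namespace, and the notebook name is made up.

```python
import threading

namespace = {}  # stands in for the kernel's user namespace


def fake_browser_round_trip(delay=0.2):
    """Simulate the browser running the JS and sending the name back later."""
    def set_name():
        namespace["theNotebook"] = "example.ipynb"  # made-up name
    timer = threading.Timer(delay, set_name)
    timer.start()
    return timer


def get_notebook_name_naive():
    """Request the name, then read it immediately, before the reply can arrive."""
    timer = fake_browser_round_trip()
    value = namespace.get("theNotebook")  # still missing at this point
    return value, timer


value, timer = get_notebook_name_naive()
timer.join()  # wait for the simulated reply
late_value = namespace.get("theNotebook")
```

Here value is read before the reply lands, while late_value sees it. In a real kernel, sleeping inside the function would not help either, because the request sent by the JS is only handled after the current cell has finished.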

@takluyver
Member

The JS sets the name in the main user namespace. When the function is moved into a module, it's looking in the module namespace, so it never sees that name. But that function is a hack, and I wouldn't rely on it in any case.
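
The namespace point can be shown in isolation. In this sketch a dynamically built module stands in for common.py and a plain dict stands in for the notebook's user namespace; the notebook name is made up.

```python
import types

# A stand-in for common.py: the function reads a global named theNotebook.
module_source = '''
def get_notebook_name():
    return theNotebook
'''
common = types.ModuleType("common")
exec(module_source, common.__dict__)

# A stand-in for the notebook's user namespace, where the JS hack sets the name.
user_namespace = {"theNotebook": "example.ipynb"}  # made-up name

try:
    common.get_notebook_name()
    lookup_failed = False
except NameError:
    # The function looks up globals in the module's namespace, not the user's.
    lookup_failed = True

# The same function body, defined in the user namespace, finds the name.
exec(module_source, user_namespace)
found = user_namespace["get_notebook_name"]()
```

A function's global lookups go to the namespace of the module where it was defined, so the module copy never sees a name that was set in the user namespace.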

@natbusa

natbusa commented Apr 27, 2017

ok, maybe this would sound silly, but would it be enough to add the ipynb filename in the metadata section of the notebook data structure when it's read? the field should not be stored in file but only updated once read in memory. - a sort of ephemeral metadata info

@natbusa

natbusa commented Apr 27, 2017

I see it looks like the kernel is completely agnostic to the concept of file and it just processes cells data. I would say that the only options are indeed env variables or passing the filename during the creation of the kernel if any filename is available at that point.

@jordansamuels

I may be late to the party, but if we could somehow determine just the port of the notebook server, then getting the notebook path is easy by using the REST api. The example below hardwires port 8080:

import json
import re
import ipykernel
import requests

def get_notebook_path(port=8080):
    kernel_id = re.search(r'kernel-(.*)\.json', ipykernel.connect.get_connection_file()).group(1)
    response = requests.get('http://127.0.0.1:{port}/api/sessions'.format(port=port))
    matching = [s for s in json.loads(response.text) if s['kernel']['id'] == kernel_id]
    return matching[0]['notebook']['path'] if matching else None

But I couldn't find any way to automatically determine the port, without using the not-so-safe/useful Javascript hacks.

So, can we get the port?

@gcbeltramini

gcbeltramini commented Jan 23, 2018

This seems to work:

import json
import os.path
import re
import ipykernel
import requests

#try:  # Python 3
#    from urllib.parse import urljoin
#except ImportError:  # Python 2
#    from urlparse import urljoin

# Alternative that works for both Python 2 and 3:
from requests.compat import urljoin

try:  # Python 3 (see Edit2 below for why this may not work in Python 2)
    from notebook.notebookapp import list_running_servers
except ImportError:  # Python 2
    import warnings
    from IPython.utils.shimmodule import ShimWarning
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ShimWarning)
        from IPython.html.notebookapp import list_running_servers


def get_notebook_name():
    """
    Return the full path of the jupyter notebook.
    """
    kernel_id = re.search(r'kernel-(.*)\.json',
                          ipykernel.connect.get_connection_file()).group(1)
    servers = list_running_servers()
    for ss in servers:
        response = requests.get(urljoin(ss['url'], 'api/sessions'),
                                params={'token': ss.get('token', '')})
        for nn in json.loads(response.text):
            if nn['kernel']['id'] == kernel_id:
                relative_path = nn['notebook']['path']
                return os.path.join(ss['notebook_dir'], relative_path)

You can put it inside a module, and import it in the jupyter notebook.

Edit: Thanks to @thesneaker, I changed the way to get the token.
Edit2: I tested in Python 2, but the Jupyter notebook couldn't execute "from notebook.notebookapp import list_running_servers" when it was inside a module.
Edit3: Added an alternative and an observation thanks to this comment.

References:

  1. Previous comment
  2. this Stackoverflow answer
  3. this comment, especially this commit

@thesneaker

Thanks @gcbeltramini for this pure python solution! I'm running Jupyter 4.1.0 and had to take care of the missing token key. Other than that it's the best solution I've come across so far!

I wouldn't mind if this functionality found its way into the notebookapp class and became the way recommended by the Jupyter devs. Having easy access to the notebook name (and preferably the path) is essential for doing reproducible measurements with Jupyter notebooks.

@vpillac

vpillac commented Feb 7, 2018

Not quite sure why, but the response was not always JSON for me. I fixed it by adding a try statement:

        try:
            for nn in json.loads(response.text):
                if nn['kernel']['id'] == kernel_id:
                    relative_path = nn['notebook']['path']
                    return os.path.join(ss['notebook_dir'], relative_path)
        except (ValueError, TypeError, KeyError):
            # Non-JSON body or an error payload: skip this server.
            pass

@vpillac

vpillac commented Feb 7, 2018

Also another useful method:

import subprocess

def save_notebook_to_html():
    nb_name = get_notebook_name()
    # Passing a list avoids shell quoting problems when the path contains spaces.
    status = subprocess.call(['jupyter', 'nbconvert', '--to', 'html', nb_name])
    return status == 0

@jakirkham
Member

This code...

try:  # Python 3
    from urllib.parse import urljoin
except ImportError:  # Python 2
    from urlparse import urljoin

try:  # Python 3
    from notebook.notebookapp import list_running_servers
except ImportError:  # Python 2
    import warnings
    from IPython.utils.shimmodule import ShimWarning
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ShimWarning)
        from IPython.html.notebookapp import list_running_servers

...can be replaced with this code and still work on Python 2/3.

from requests.compat import urljoin

from notebook.notebookapp import list_running_servers

@dclong

dclong commented Aug 5, 2018

The code doesn't work for me in JupyterHub.

@convoliution

convoliution commented Jun 17, 2019

Note that if you do not have the right token to query the server on the REST call,

json.loads(response.text)

may return {"message": "Forbidden", "reason": null} instead of a list of sessions, resulting in

if nn['kernel']['id'] == kernel_id:

raising TypeError: string indices must be integers
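
One way to guard against this is to check the shape of the decoded payload before iterating. A minimal sketch, assuming payload is the already-decoded JSON body of the api/sessions response; the function name is hypothetical.

```python
def find_notebook_path(payload, kernel_id):
    """Return the notebook path for kernel_id, or None if it cannot be found.

    payload is the decoded JSON body of the api/sessions response.
    """
    if not isinstance(payload, list):
        # e.g. {"message": "Forbidden", "reason": None} when the token is wrong
        return None
    for session in payload:
        if not isinstance(session, dict):
            continue
        kernel = session.get("kernel") or {}
        if kernel.get("id") == kernel_id:
            return (session.get("notebook") or {}).get("path")
    return None
```

Returning None for an error payload lets the caller move on to the next running server instead of failing with a TypeError.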

@DBCerigo

Note that the solution above won't work when executing a nb via jupyter nbconvert --to notebook --execute mynotebook.ipynb or via from nbconvert.preprocessors import ExecutePreprocessor from within a python script, as (of course?!) there's no server running to query.

@elgalu
Contributor

elgalu commented Jul 26, 2019

How to achieve this with the latest versions?

@billallen256

Could the ipyparams package work for this? It can return the notebook file name as well as any query string parameters passed in the URL.

@elgalu
Contributor

elgalu commented Dec 18, 2019

It seems to be unreliable, @gershwinlabs: sometimes ipyparams.raw_url comes back as an empty string. This seems to be related to the reliance on JavaScript, some sort of race condition.

@billallen256

@elgalu I can't seem to reproduce the problem. Can you tell me more about your environment and notebook? I don't think it's possible to get away from the reliance on Javascript given the deliberate separation between the front and back ends.

@thorade

thorade commented Mar 19, 2020

@billallen256

Thanks @thorade. I posted an answer with ipyparams.

@jakirkham
Member

Maybe issues with ipyparams can be raised against that repo? 😉

@Ismar11

Ismar11 commented Apr 16, 2020

Does anyone know if there is a command line argument under jupyter notebook list or a similar feature to get notebook names running in each server from console directly?

If it doesn't exist, it's not planned or the question is out of the scope of this issue, I could open a new one and describe in detail with examples/ideas. Let me know :)

Carreau added a commit to Carreau/jupyter_client that referenced this issue Jun 7, 2021
This has been a controversial topic for some time:

jupyter/notebook#1000
https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html
https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name
https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/

This is also sometimes critical for linters and tab completion, which
need to know the current name.

Of course, the current answer is that the question is ill-defined:
there might not be a file associated with the current kernel, there
might be multiple files, files might not be on the same system, it could
change during execution, and there are many other gotchas.

This suggests adding a JPY_ASSOCIATED_FILE env variable, which is not
too visible but gives an escape hatch that should mostly be correct
unless the notebook is renamed or the kernel is attached to a new one.

To do so, this handles the new associated_file parameter in a few
functions of the kernel manager. On jupyter_server, this one-line change
makes the notebook name available on typical local installs:

    --- a/jupyter_server/services/sessions/sessionmanager.py
    +++ b/jupyter_server/services/sessions/sessionmanager.py
    @@ -96,7 +96,12 @@ class SessionManager(LoggingConfigurable):
             """Start a new kernel for a given session."""
             # allow contents manager to specify kernels cwd
             kernel_path = self.contents_manager.get_kernel_path(path=path)
    -        kernel_id = await self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
    +
    +        kernel_id = await self.kernel_manager.start_kernel(
    +            path=kernel_path, kernel_name=kernel_name, associated_file=name
    +        )
             return kernel_id

Of course, only launchers that pass this value forward will allow
the env variable to be set.

I'm thinking that various kernels may use this and expose it in
different ways, like __notebook_name__ if it ends with `.ipynb` in
ipykernel.
Carreau added a commit to Carreau/jupyter_client that referenced this issue Jun 22, 2021
This has been a controversial topic for some time:

jupyter/notebook#1000
https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html
https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name
https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/

This is also sometimes critical for linters and tab completion, which
need to know the current name.

Of course, the current answer is that the question is ill-defined:
there might not be a file associated with the current kernel, there
might be multiple files, files might not be on the same system, it could
change during execution, and there are many other gotchas.

This suggests adding a JPY_KERNEL_SESSION_NAME env variable, which is not
too visible but gives an escape hatch that should mostly be correct
unless the notebook is renamed or the kernel is attached to a new one.

To do so, this handles the new associated_file parameter in a few
functions of the kernel manager. On jupyter_server, this one-line change
makes the notebook name available on typical local installs:

    --- a/jupyter_server/services/sessions/sessionmanager.py
    +++ b/jupyter_server/services/sessions/sessionmanager.py
    @@ -96,7 +96,12 @@ class SessionManager(LoggingConfigurable):
             """Start a new kernel for a given session."""
             # allow contents manager to specify kernels cwd
             kernel_path = self.contents_manager.get_kernel_path(path=path)
    -        kernel_id = await self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
    +
    +        kernel_id = await self.kernel_manager.start_kernel(
    +            path=kernel_path, kernel_name=kernel_name, session_name=name
    +        )
             return kernel_id

Of course, only launchers that pass this value forward will allow
the env variable to be set.

I'm thinking that various kernels may use this and expose it in
different ways, like __notebook_name__ if it ends with `.ipynb` in
ipykernel.

Commit amended: originally the names were associated_file and
JPY_ASSOCIATED_FILE, but they were changed.
Carreau added a commit to Carreau/jupyter_client that referenced this issue Sep 14, 2021
This has been a controversial topic for some time:

jupyter/notebook#1000
https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html
https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name
https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/

This is also sometimes critical for linters and tab completion, which
need to know the current name.

Of course, the current answer is that the question is ill-defined:
there might not be a file associated with the current kernel, there
might be multiple files, files might not be on the same system, it could
change during execution, and there are many other gotchas.

This suggests adding a JPY_KERNEL_SESSION_NAME env variable, which is not
too visible but gives an escape hatch that should mostly be correct
unless the notebook is renamed or the kernel is attached to a new one.

To do so, this handles the new associated_file parameter in a few
functions of the kernel manager. On jupyter_server, this one-line change
makes the notebook name available on typical local installs:

```diff
diff --git a/notebook/services/sessions/sessionmanager.py b/notebook/services/sessions/sessionmanager.py
index 92b2a7345..f7b4011ce 100644
--- a/notebook/services/sessions/sessionmanager.py
+++ b/notebook/services/sessions/sessionmanager.py
@@ -108,7 +108,9 @@ class SessionManager(LoggingConfigurable):
         # allow contents manager to specify kernels cwd
         kernel_path = self.contents_manager.get_kernel_path(path=path)
         kernel_id = yield maybe_future(
-            self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
+            self.kernel_manager.start_kernel(
+                path=kernel_path, kernel_name=kernel_name, session_name=path
+            )
         )
         # py2-compat
         raise gen.Return(kernel_id)
```

Of course, only launchers that pass this value forward will allow
the env variable to be set.

I'm thinking that various kernels may use this and expose it in
different ways, like __notebook_name__ if it ends with `.ipynb` in
ipykernel.
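
From inside a kernel, reading such a variable could look like the sketch below. It assumes the variable name as written in the commit message (JPY_KERNEL_SESSION_NAME); released versions may use a different name, and the variable is simply absent when the launcher does not pass it. The function names are hypothetical.

```python
import os


def session_name_from_env(environ=None):
    """Return the session name passed by the launcher, or None if it is absent."""
    environ = os.environ if environ is None else environ
    # Variable name taken from the commit message; treat it as an assumption.
    name = environ.get("JPY_KERNEL_SESSION_NAME")
    return name or None


def notebook_name_from_env(environ=None):
    """Return the session name only when it looks like a notebook file."""
    name = session_name_from_env(environ)
    if name and name.endswith(".ipynb"):
        return name
    return None
```

Callers should still handle None, since the value can be missing or stale after a rename.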
@cono

cono commented Feb 6, 2022

This looks hackish to me:

    kernel_id = re.search('kernel-(.*).json',
                          ipykernel.connect.get_connection_file()).group(1)

is there any simpler way to get id?

I was trying to look into the code and couldn't find where the id is stored in Kernel. The connection_file name is built from os.getpid():

    def init_connection_file(self):
        if not self.connection_file:
            self.connection_file = "kernel-%s.json"%os.getpid()
        try:
            self.connection_file = filefind(self.connection_file, ['.', self.connection_dir])
        except OSError:

Or probably I'm looking into the wrong place. Any suggestions?
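
A slightly less regex-heavy way to pull the id out of the connection file name is sketched below. It still relies on the same assumption as the regex: that the server names connection files kernel-<id>.json. The function name is hypothetical.

```python
import os


def kernel_id_from_connection_file(connection_file):
    """Extract the id from a path like .../kernel-<id>.json, or return None."""
    name = os.path.basename(connection_file)
    prefix, suffix = "kernel-", ".json"
    if name.startswith(prefix) and name.endswith(suffix):
        return name[len(prefix):-len(suffix)]
    return None
```

It would be called as kernel_id_from_connection_file(ipykernel.connect.get_connection_file()). When a kernel is started on its own rather than by a notebook server, the extracted part is the process id from the snippet above, not a server-side kernel id.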
