Add --check-metadata to CLI script #146

Closed
goerz opened this issue Jan 1, 2019 · 13 comments

@goerz

goerz commented Jan 1, 2019

It would be helpful if the jupytext CLI had an option --check-metadata. When passed, the program prints the content of the "jupytext" metadata to STDOUT (in json format) and exits. The option should further take values to filter down to a particular field, which then should be printed as an unquoted value.

For example, assume notebook.ipynb contains the following:

"metadata": {
 "jupytext": {
  "main_language": "python",
  "text_representation": {
   "extension": ".md",
   "format_name": "markdown",
   "format_version": "1.0",
   "jupytext_version": "0.8.6"
  }
 }
}

Then I would like

> jupytext --check-metadata notebook.ipynb
{
  "main_language": "python",
  "text_representation": {
   "extension": ".md",
   "format_name": "markdown",
   "format_version": "1.0",
   "jupytext_version": "0.8.6"
  }
}

> jupytext --check-metadata main_language notebook.ipynb
python

> jupytext --check-metadata text_representation notebook.ipynb
{
   "extension": ".md",
   "format_name": "markdown",
   "format_version": "1.0",
   "jupytext_version": "0.8.6"
}

> jupytext --check-metadata text_representation:extension notebook.ipynb
.md

If notebook.ipynb does not contain the "jupytext" metadata, nothing should be printed.
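The requested lookup could be sketched in Python roughly as follows. This is only an illustration of the proposed behaviour, not Jupytext code: the helper name `check_metadata` is hypothetical, the colon-separated field syntax is taken from the examples above, and treating an unknown field like missing metadata (nothing printed) is an assumption.

```python
import json


def check_metadata(notebook_json, field=None):
    """Return the text the proposed option would print, or '' for no output.

    Hypothetical helper: `field` uses the colon-separated syntax from the
    examples above, e.g. 'text_representation:extension'.
    """
    value = json.loads(notebook_json).get("metadata", {}).get("jupytext")
    if value is None:
        return ""  # no "jupytext" metadata: print nothing
    for key in (field.split(":") if field else []):
        if not isinstance(value, dict) or key not in value:
            return ""  # assumption: an unknown field also prints nothing
        value = value[key]
    if isinstance(value, (dict, list)):
        return json.dumps(value, indent=2)  # nested values stay JSON
    return str(value)  # scalar values are printed unquoted


notebook = json.dumps({
    "metadata": {
        "jupytext": {
            "main_language": "python",
            "text_representation": {"extension": ".md", "format_name": "markdown"},
        }
    }
})
print(check_metadata(notebook, "main_language"))
print(check_metadata(notebook, "text_representation:extension"))
```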

This would be a tremendous help for the jupytext.vim plugin. Basically, what I want to achieve is for vim notebook.ipynb, where notebook.ipynb has jupytext metadata, to use the "format_name" from that metadata instead of the default g:jupytext_fmt from the .vimrc. Only if the notebook does not have metadata would I fall back to g:jupytext_fmt.

Otherwise, I think that using the plugin on ipynb files that already have a linked text representation would potentially be a problem: If the setting g:jupytext_fmt does not match the format of the linked file, an additional text-representation file would be written (and deleted when vim closes), and the metadata in notebook.ipynb would be overwritten to point to the new (now missing) file. Thus, the link between the notebook and the original text-representation files would be severed. I don't think people who use the automatic linking feature through c.ContentsManager would like that at all.

Parsing out the metadata from inside vimscript will be quite difficult, I think. I could do it in Python, but then I would have to assume that vim has been compiled with python support, which I would like to avoid. The same goes for relying on an external utility like jq. The one dependency I can guarantee in the plugin is that the jupytext CLI script is available (otherwise, the plugin is pointless in the first place). Thus, it would be easiest if jupytext could handle the extraction of the necessary data.

@mwouts: Let me know if you think this is ok. (Of course, I don't really care if the flag is called --check-metadata or something else.) If this is not extremely trivial for you to implement, maybe I can have a crack at it and make a pull request.

@mwouts
Owner

mwouts commented Jan 2, 2019

Hello @goerz , thanks for your suggestion. I completely agree that the pairing information would be very helpful for the vim plugin, and that the notebook pairing, when it exists, could have precedence over the default vim plugin format. Jupytext CLI could certainly have a mode that gives that information. I am not completely sure however that jupytext should be the program that returns the complete notebook metadata, but we can discuss this.

For now, let me add that you can extract the metadata from a notebook with

import nbformat
nbformat.read('notebook.ipynb', as_version=4)['metadata']

Also, I do not think that editing a paired notebook in another format will mess up the notebook. I just tried it with an ipynb/py paired notebook, edited as an md file in vim. I agree that the jupytext.text_representation metadata in the notebook changes. But that is only a temporary change, which disappears if you reopen the notebook in Jupyter (you will first have to delete the obsolete py file), save, and reopen it once more. In the next release I think I will remove this text_representation from the ipynb file (and keep it only in the text file).

@goerz
Author

goerz commented Jan 2, 2019

That's good to hear. I'll let you figure out what you want to do with the metadata in the next release before looking into this further.

mwouts added this to the v1.0.0 milestone Jan 14, 2019

@mwouts
Owner

mwouts commented Jan 14, 2019

Hello @goerz, I've made some progress towards the next release of Jupytext (in branch 1.0.0).

Regarding this issue, I was thinking of adding two more arguments to jupytext command line:

  • sync to update all the representations of the notebook, taking the input cells from the most recent file (and outputs from the ipynb file)
  • paired_paths to list the alternative paths for a notebook (not including the current path; one path per line).

Do you think that should be enough for jupytext.vim? What other kind of metadata would you need to access?

@goerz
Author

goerz commented Jan 15, 2019

So this is for a situation where there are multiple linked files for the same notebook? Like a markdown and a Python file at the same time? Sounds good to me! Besides the pathnames in paired_paths, jupytext.vim would have to know the format (the --to parameter), which can't always be deduced from the file extension. So maybe <format>:<path> per line in the output?

Do I understand correctly that jupytext --paired_path notebook.ipynb would print out the paths for all the linked files for that notebook and exit?

@mwouts
Owner

mwouts commented Jan 15, 2019

Do I understand correctly that jupytext --paired_path notebook.ipynb would print out the paths for all the linked files for that notebook and exit?

Exactly (print nothing and exit without error when there is no linked file)

Besides the pathnames in paired_paths, jupytext.vim would have to know the format (the --to parameter), which can't always be deduced from the file extension. So maybe <format>:<path> per line in the output?

Well, with the sync option I think that would not be necessary. My expectations are:

  • VIM would first execute jupytext --paired_path notebook.ipynb.
  • In case this is a paired notebook, it would run jupytext --sync notebook.ipynb,
  • Then open the first paired document
  • When saving it would execute jupytext --sync notebook.ipynb again.

If the notebook is not paired to any text representation, VIM would convert the notebook to a text file, in the format given by the user in the config file (that is the only case where I expect that VIM provides the format information to jupytext).
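The open step of this workflow might look like the following Python sketch. It is an illustration under stated assumptions, not the plugin's code: `prepare_for_editing` and `run_jupytext` are hypothetical names, `default_format` is assumed to look like `md` or `py:percent`, and the converted file is assumed to sit next to the notebook. The thread spells the flag `--paired_path`; the sketch defaults to `--paired-paths`, the spelling used by released Jupytext versions, and keeps it as a parameter.

```python
import os
import subprocess


def run_jupytext(*args):
    """Run the jupytext CLI and return its standard output."""
    result = subprocess.run(
        ["jupytext", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def prepare_for_editing(notebook, default_format, run=run_jupytext,
                        paired_flag="--paired-paths"):
    """Return the text file an editor should open for `notebook`.

    Hypothetical helper; `default_format` plays the role of the format the
    user configured in the editor (assumed form: 'md' or 'py:percent').
    """
    listing = run(paired_flag, notebook)
    paired = [line for line in listing.splitlines() if line.strip()]
    if paired:
        run("--sync", notebook)  # refresh every representation first
        return paired[0]  # then open the first paired document
    # Unpaired notebook: convert it using the user's configured format.
    extension = default_format.split(":")[0]
    target = os.path.splitext(notebook)[0] + "." + extension
    run("--to", default_format, "--output", target, notebook)
    return target
```

On save, the editor would call `run_jupytext("--sync", notebook)` again for a paired notebook. Passing `run` as a parameter keeps the logic testable without the CLI installed.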

On a separate subject: I plan to allow linked files in different folders. Do you have an input on how this could be encoded in the format information? I have a proposal there, your feedback is welcome!

@goerz
Author

goerz commented Jan 15, 2019

I think that would work fine. There is one place where the vim plugin as currently implemented still relies on knowing the exact format:
https://github.com/goerz/jupytext.vim/blob/f99e82ab93d3bfe1eef3baf131a0228ba0a94911/plugin/jupytext.vim#L124

This maps jupytext formats to vim "filetypes", which regulates syntax highlighting (and potentially other custom settings like ftplugins). So if you don't expose the format through --paired_path, then the filetype would only be set through the filename extension. I have a very hard time imagining why this wouldn't be ok, that is, why someone would want to use different vim filetypes for py:percent compared to py:light. So I think it's ok either way.

Just out of curiosity, where will --paired_path get its information from? Will this still be metadata stored in the notebook json? Just to give some background on my own use cases, based on my usage of the vim plugin for the past two weeks: It appears I regularly work with two types of notebooks: One is for presentations/tutorials/documentation (like https://github.com/qucontrol/krotov/blob/master/docs/notebooks/02_example_lambda_system_rwa_complex_pulse.ipynb). These notebooks tend to have more markdown in them, and I'll generally want to edit them in vim with the markdown format. The other type of notebooks are "computational" (like https://gist.github.com/goerz/f75a5091152cc8db917f5a0726a3fff6). These are basically scripts for keeping track of daily work. They're heavy on python code and only have very minimal markdown, so I'd like to edit them as python scripts in vim. Right now, I just manually set g:jupytext_fmt when switching between those two types. It would be nicer if the kind of notebook were recorded in the metadata of each ipynb file. So the way jupytext currently stores metadata seemed ok for that (except the vim plugin doesn't have access to the metadata).

@mwouts
Owner

mwouts commented Jan 16, 2019

Hello @goerz, I have implemented the new arguments in jupytext CLI, as well as the automatic detection of the format when the notebook is read from stdin (#148). Would you like to test the latest version in branch 1.0.0?

Thanks for your comments! I think your two use cases also correspond to mine. Don't you think we could automatically compute the best format for a notebook (e.g. markdown when markdown content is more than 10% of the total, script otherwise)?
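A minimal sketch of such a heuristic, in Python. The function name `guess_format` is hypothetical, and measuring "content" by character count is an assumption; the comment above only gives the 10% threshold.

```python
def guess_format(cells, threshold=0.10):
    """Pick 'markdown' or 'script' from the share of markdown text.

    `cells` is a list of notebook cell dicts with 'cell_type' and 'source'
    (nbformat 4 allows 'source' to be a string or a list of lines).
    Assumption: content is measured in characters.
    """
    def size(cell):
        source = cell.get("source", "")
        return len(source if isinstance(source, str) else "".join(source))

    total = sum(size(cell) for cell in cells)
    markdown = sum(
        size(cell) for cell in cells if cell.get("cell_type") == "markdown"
    )
    if total and markdown / total > threshold:
        return "markdown"
    return "script"  # also the fallback for an empty notebook
```

A line-based or cell-count-based measure would be just as plausible; the threshold is kept as a parameter so it can be tuned.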

Also, the paired paths are computed using the notebook metadata jupytext.formats (plus the notebook path), at paired_paths.py.
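As a rough illustration of that computation, here is a deliberately simplified Python sketch. It is not the code in paired_paths.py: the function below is hypothetical, it assumes jupytext.formats is a comma-separated string such as `ipynb,py:percent,md`, and it assumes every paired file sits next to the notebook with the same base name (so it ignores the different-folder layouts discussed earlier in this thread).

```python
import os


def paired_paths(notebook_path, formats):
    """List the alternative paths for a notebook, one per paired format.

    Simplified, hypothetical version: only plain extensions are handled.
    """
    base, current_ext = os.path.splitext(notebook_path)
    paths = []
    for fmt in formats.split(","):
        # 'py:percent' -> '.py'; the part after ':' is the format name
        ext = "." + fmt.split(":")[0].strip().lstrip(".")
        if ext != current_ext:  # the current path is not listed
            paths.append(base + ext)
    return paths


print("\n".join(paired_paths("notebook.ipynb", "ipynb,py:percent,md")))
```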

@goerz
Author

goerz commented Jan 17, 2019

I'm currently getting ready to travel to a conference, but I will test the new version in branch 1.0.0 when I get back at the end of next week.

Automatic detection of whether a notebook is predominantly code or has significant markdown content is certainly an interesting idea!

@mwouts
Owner

mwouts commented Jan 17, 2019

Hello @goerz , sure! There's no hurry here. In that case, I even suggest that we test directly the RC when available - I will let you know when it's published on pypi.

@mwouts
Owner

mwouts commented Jan 29, 2019

Hello @goerz , the RC is available, and has many new features. Could you please give it a try? Thanks!

pip install jupytext --pre --upgrade

@goerz
Author

goerz commented Jan 30, 2019

I like the new features a lot! It adds a lot of flexibility.

I ran into a few issues, though (#163, #164, #165). I'll try to do some more testing in the next few days.

@mwouts
Owner

mwouts commented Jan 30, 2019

Thanks for testing, @goerz, and for documenting your findings in such detail! I agree with your reports: none of this is expected to happen. I will add more tests and fix that...

@mwouts
Owner

mwouts commented Feb 8, 2019

Tests and fixes are available in the latest RC, jupytext==1.0.0-rc3. I'll close this issue; we can continue the discussion on the other ones. Thanks @goerz
