Always use .Rmd as main format, take output from .ipynb. #12

Closed
grst opened this issue Jul 10, 2018 · 17 comments

@grst
Contributor

grst commented Jul 10, 2018

I really like the idea of having a synchronized copy of .Rmd and .ipynb. That's basically the jupyter-way of having a nb.html file and gives us the best of two worlds: a text-editable, version-controllable Rmd file and an easily-sharable, output-preserving ipynb file.

I would suggest pulling the output from the corresponding .ipynb file, even when opening the .Rmd. That solves the issue that one could accidentally overwrite changes made to the .Rmd file in an external editor by opening the corresponding .ipynb file just to retrieve its outputs.

I suggest the following procedure:

  • when opening either .ipynb or .Rmd, check if both .Rmd and .ipynb are available
  • check cell-by-cell if the 'code' parts are identical
  • if yes, retrieve output from ipynb
  • if no, discard all outputs and load code from Rmd.
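
The procedure above could be sketched roughly as follows. This is only an illustration under stated assumptions, not nbrmd's code: cells are modelled as plain dicts with `cell_type`, `source` and `outputs` keys (mirroring the notebook JSON layout), and `merge_outputs` is a hypothetical helper name.

```python
import copy


def merge_outputs(rmd_cells, ipynb_cells):
    """Return a copy of the .Rmd cells, with outputs taken from the .ipynb
    cells if all code cells are identical, and with empty outputs otherwise.

    Both arguments are lists of dicts shaped like notebook cells
    (an assumption made for this sketch).
    """
    rmd_code = [c for c in rmd_cells if c["cell_type"] == "code"]
    ipynb_code = [c for c in ipynb_cells if c["cell_type"] == "code"]

    # Cell-by-cell comparison of the 'code' parts.
    identical = len(rmd_code) == len(ipynb_code) and all(
        a["source"] == b["source"] for a, b in zip(rmd_code, ipynb_code)
    )

    merged = copy.deepcopy(rmd_cells)
    saved = iter(ipynb_code)
    for cell in merged:
        if cell["cell_type"] != "code":
            continue
        if identical:
            # Same code everywhere: retrieve the output from the .ipynb.
            cell["outputs"] = copy.deepcopy(next(saved).get("outputs", []))
        else:
            # Any difference: discard all outputs, keep the code from the .Rmd.
            cell["outputs"] = []
    return merged
```

The all-or-nothing rule keeps the behaviour easy to predict: either the two files agree and the outputs are shown, or the .Rmd wins and the notebook opens without outputs.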
@mwouts
Owner

mwouts commented Jul 10, 2018

Hello @grst, thanks for your suggestion. Indeed, reopening the .Rmd file with outputs preserved (when possible) makes the user experience much smoother.

I've posted a first implementation of this in version 0.2.5. It matches cell inputs from the .Rmd and .ipynb files (with the further constraint that cell order should match), and restores outputs for the cells where the inputs match.
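
As a rough illustration of that matching rule (not the code shipped in 0.2.5), here is a minimal sketch. It assumes cells are plain dicts with `cell_type`, `source` and `outputs` keys; `restore_outputs` is a hypothetical name.

```python
def restore_outputs(text_cells, ipynb_cells):
    """Copy outputs onto the text cells whose input matches an .ipynb cell.

    Matches must appear in the same order in both notebooks: once an
    .ipynb cell has been matched, earlier cells are never considered again.
    The text cells are modified in place and returned.
    """
    start = 0  # index of the first .ipynb cell still available for matching
    for cell in text_cells:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []  # default when no matching input is found
        for i in range(start, len(ipynb_cells)):
            other = ipynb_cells[i]
            if other["cell_type"] == "code" and other["source"] == cell["source"]:
                cell["outputs"] = list(other.get("outputs", []))
                start = i + 1
                break
    return text_cells
```

With this rule, editing one cell in the .Rmd file only drops the output of that cell, while reordering cells drops the outputs of the cells that moved before their previous neighbours.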

For now the implementation is not symmetric. The matching will occur only when you reopen the .Rmd file, not the .ipynb one - as I prefer not to change the behavior on the default notebook type. But of course we can further (test and) discuss this.

@grst
Contributor Author

grst commented Jul 11, 2018

For now the implementation is not symmetric. The matching will occur only when you reopen the .Rmd file, not the .ipynb one - as I prefer not to change the behavior on the default notebook type. But of course we can further (test and) discuss this.

Maybe that's a behaviour we could set using a config flag.

@mwouts
Owner

mwouts commented Jul 11, 2018

I've encountered an issue with this functionality on a notebook with plotly plots: the Rmd notebook can't be trusted (i.e. trusting the notebook reloads the notebook, which is then untrusted again...).

@mwouts
Owner

mwouts commented Jul 12, 2018

Hello @grst , I'll try to fix the above, and also provide a symmetric implementation for this feature.

What do you think of the following?

  • We change the pre_save_hook to save to the formats in metadata "nbrmd_formats" if it exists, and else to "ContentsManager.default_nbrmd_formats", which is [".ipynb"] by default, and can be changed with, for instance,
c.ContentsManager.default_nbrmd_formats = [".ipynb", ".Rmd"]
  • When a notebook misses the "nbrmd_formats" metadata, that entry is created as the union of default_nbrmd_formats with the current extension. When a notebook is exported to another format, current and target extensions are added to the "nbrmd_formats" metadata.
  • Then, when a notebook is opened,
    • inputs are taken from the most recent notebook, among all extensions listed in "nbrmd_formats" (and, if timestamps match, they are taken from the current notebook)
    • outputs are taken from ".ipynb" if that extension is available in "nbrmd_formats"
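
To make the metadata handling in the first two points concrete, here is a minimal sketch. The metadata key "nbrmd_formats" and the default [".ipynb"] come from the proposal above; the function names `resolve_formats` and `record_export`, and the representation of the metadata as a plain dict, are assumptions made for illustration.

```python
def resolve_formats(metadata, current_ext, default_formats=(".ipynb",)):
    """Return the extensions a notebook should be saved to.

    If the "nbrmd_formats" entry is missing, it is created as the union of
    the default formats and the current extension (order preserved).
    """
    formats = metadata.get("nbrmd_formats")
    if formats is None:
        formats = list(default_formats)
        if current_ext not in formats:
            formats.append(current_ext)
        metadata["nbrmd_formats"] = formats
    return formats


def record_export(metadata, current_ext, target_ext):
    """Add the current and target extensions to "nbrmd_formats" on export."""
    formats = metadata.setdefault("nbrmd_formats", [])
    for ext in (current_ext, target_ext):
        if ext not in formats:
            formats.append(ext)
    return formats
```

A pre-save hook would then loop over `resolve_formats(...)` and write one file per extension.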

@grst
Contributor Author

grst commented Jul 12, 2018

I like the idea with the nbrmd_formats metadata and also with the default_nbrmd_formats flag.

I don't like so much the idea of using timestamps for resolving which contents to use, for the following reasons:

  • it's too much 'magic' going on, and I don't know which contents I'm actually viewing
  • what happens if two files have been changed independently? A merge conflict so-to-say.

I would rather define a 'primary' format (which would be the one I put under version control), and regard all other files as dispensable copies.

@mwouts
Owner

mwouts commented Jul 12, 2018

Thanks @grst , actually I agree, that's probably too magic. So we will always take input from the primary format. But... how would you identify which is the primary format?

In the current implementation, I made the implicit assumption that the primary format was that of the file being opened.

Would you prefer that

  • primary format is that of file being opened (i.e. no magic loading for inputs)
  • primary format is the first element of 'nbrmd_formats' that is not '.ipynb'
  • primary format is specified in another metadata?

@grst
Contributor Author

grst commented Jul 12, 2018

As "explicit is better than implicit" I opt for another metadata/config option.

@mwouts
Owner

mwouts commented Jul 13, 2018

Hello @grst, I've published a new version on PyPI that seems to fit my needs. Please confirm whether it also works for you. Thanks for the suggestion, it makes the package much easier to use!

@mwouts
Owner

mwouts commented Jul 17, 2018

Seems to work well, I'll close this.

By the way - this happens to be a workaround for #8. Indeed, I confirm that opening the .ipynb notebook will reload inputs from the .Rmd file (given that the companion .Rmd file is identified either in the Jupyter config, or in the notebook metadata).

@mwouts mwouts closed this as completed Jul 17, 2018
@grst
Contributor Author

grst commented Jul 18, 2018

This works very well indeed.
I suggest some fine tuning concerning the default option handling (see #16, #17).

And very nice to have a workaround for #8!

@abalter

abalter commented Oct 16, 2018

@grst

  • it's too much 'magic' going on, and I don't know which contents I'm actually viewing
  • what happens if two files have been changed independently? A merge conflict so-to-say.

I've been having to move back and forth between RStudio and Jupyter. This means that my .Rmd files are being edited in both editors. I'm pretty sure there are others who end up doing this, as Jupyter and RStudio have a not-completely-overlapping feature set. Also, until RStudio decides to communicate with the rest of the data science ecosystem (they are like the Apple of data science), it may be that Jupyter has to be the one to facilitate the synchronicity and compatibility.

Therefore, I'm not sure having to do merges would be that bad. When you open a file (.Rmd or .ipynb) that is out of sync with its siblings (for example a .R file as well), you could be offered the usual choices of:

Would you like to:

  • Overwrite sibling with current file
  • Overwrite current file with sibling
  • Merge changes

Basically if you want ipynb to always be the master, then you simply take a microsecond to click the appropriate choice.

This would be my preferred solution.

@mwouts
Owner

mwouts commented Oct 16, 2018

Hello @abalter , thanks for entering the discussion. This was actually a discussion on the implementation of a very preliminary version of jupytext! Now the behavior has converged to what is documented in the README:

When loading or refreshing an .ipynb file, the input cells of the notebook are read from the first non-.ipynb file among the associated formats.
When loading or refreshing a non-.ipynb file, the outputs are read from the .ipynb file (if ipynb is listed in the formats).
I.e. input cells are always taken from the current document if it is not ipynb, or from the first text representation otherwise.
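
The rule above can be summarised in a few lines of Python. This is a sketch of the documented behavior, not Jupytext's implementation: `input_and_output_sources` is a hypothetical name, and formats are written without a leading dot for brevity.

```python
def input_and_output_sources(opened_ext, formats):
    """Return (format the inputs are read from, format the outputs are read from).

    `formats` is the ordered list of associated formats, e.g. ["ipynb", "Rmd", "R"].
    The second element is None when no outputs are available.
    """
    text_formats = [ext for ext in formats if ext != "ipynb"]

    if opened_ext != "ipynb":
        # A text notebook is always its own source of inputs.
        input_ext = opened_ext
    elif text_formats:
        # An .ipynb file takes its inputs from the first text representation.
        input_ext = text_formats[0]
    else:
        input_ext = "ipynb"

    has_ipynb = opened_ext == "ipynb" or "ipynb" in formats
    output_ext = "ipynb" if has_ipynb else None
    return input_ext, output_ext
```

For instance, with the formats ipynb, Rmd and R in this order, opening the .ipynb file gives `("Rmd", "ipynb")`, while opening the .R file gives `("R", "ipynb")`.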

Say you use ipynb, Rmd and R, in this order. Then, if you know that the R version is the one that is up to date, you should explicitly open that one in Jupyter.

Jupytext does check that the text representation is more recent than the ipynb - in case you have modified the ipynb without using Jupytext - that's #63. No checks are done on text representations other than the main one.

We could think of extending the check (PRs are welcome), and refuse to load inputs from a text representation that is not the most recent file among the set of representations, asking the user to validate the file contents first. Would that suit your needs? By the way, do you think many people use more than one text representation?
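
Such an extended check might look like the sketch below. It is only an assumption about how this could be done, not existing Jupytext code: `newer_siblings` is a hypothetical helper, and the `getmtime` parameter (defaulting to `os.path.getmtime`) is there so the logic can be exercised without touching the file system.

```python
import os


def newer_siblings(path, sibling_paths, getmtime=os.path.getmtime):
    """Return the sibling representations modified more recently than `path`.

    An empty list means `path` is the most recent file among the set of
    representations, so its inputs can be loaded without asking the user.
    """
    reference = getmtime(path)
    return [
        sibling
        for sibling in sibling_paths
        if sibling != path and getmtime(sibling) > reference
    ]
```

A contents manager could refuse to load inputs from `path` whenever the returned list is not empty, and report which files the user should review first.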

@abalter

abalter commented Oct 17, 2018

I think this is somewhat uncharted territory. The notebook formats are good for use in IDEs and the script format for batch execution. Moreover, notebook formats are good for development and debugging. There is a whole range of different development patterns to explore. I'm going to put together a more detailed description of what we have been considering, as I think it might be useful for further development.

Overall, jupytext is amazing, and something I've been really waiting for to come along!

@mwouts
Owner

mwouts commented Oct 17, 2018

Thanks @abalter. Sure, let us know how you plan to use jupytext, that will be very interesting to me and to other users.

By the way, I saw your polyglottus project, is this something you are still working on? Have you tried using %%R cells in a python notebook? Or the opposite? In principle the Rmd to Jupyter notebook conversion maps the local cell language to R markdown... let me know if you want to test that!

@stefanuddenberg

I'm having an issue with using jupytext 1.0.3 where my .ipynb notebooks become un-trusted while they exist alongside .Rmd and .py representations. My configuration is as follows:

c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
# Always pair ipynb notebooks to Rmd and py files
c.ContentsManager.default_jupytext_formats = "ipynb,Rmd,py"
# Keep all metadata in the text files
c.ContentsManager.default_notebook_metadata_filter = "all"
c.ContentsManager.default_cell_metadata_filter = "all"
# Use the percent format when saving as py
c.ContentsManager.preferred_jupytext_formats_save = "py:percent"

@mwouts
Owner

mwouts commented Mar 15, 2019

@stefanuddenberg, thanks for reporting this. Typically, un-trusted notebooks are caused by a non-identical round trip conversion. Would you like to open an issue for this? We will also need a sample notebook that shows the issue.

@stefanuddenberg

Sure; I'll open an issue for it later today with an example. Thanks!
