Always use .Rmd as main format, take output from .ipynb. #12

grst · 2018-07-10T10:39:45Z

I really like the idea of having a synchronized copy of .Rmd and .ipynb. That's basically the jupyter-way of having a nb.html file and gives us the best of two worlds: a text-editable, version-controllable Rmd file and an easily-sharable, output-preserving ipynb file.

I would suggest pulling the output from the corresponding .ipynb file, even when opening the .Rmd. That avoids the risk of accidentally overwriting changes made to the .Rmd file in an external editor, which can happen when one opens the corresponding .ipynb file simply to retrieve its contents.

I suggest the following procedure:

- when opening either .ipynb or .Rmd, check if both .Rmd and .ipynb are available
- check cell-by-cell if the 'code' parts are identical
- if yes, retrieve output from ipynb
- if no, discard all outputs and load code from Rmd.
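
As an illustration, the procedure above could be sketched as follows. This is only a sketch under assumptions: notebooks are represented as plain lists of cell dictionaries with `cell_type`, `source` and `outputs` keys (as in the ipynb JSON), and the function name `load_paired` is hypothetical, not part of the package.

```python
import copy


def load_paired(rmd_cells, ipynb_cells):
    """Return the cells to display, following the all-or-nothing rule.

    Both arguments are lists of dicts shaped like ipynb JSON cells.
    Hypothetical helper, for illustration only.
    """
    rmd_code = [c["source"] for c in rmd_cells if c["cell_type"] == "code"]
    ipynb_code = [c["source"] for c in ipynb_cells if c["cell_type"] == "code"]

    cells = copy.deepcopy(rmd_cells)
    code_cells = [c for c in cells if c["cell_type"] == "code"]

    if rmd_code == ipynb_code:
        # Code is identical cell by cell: reuse the outputs stored in the ipynb.
        saved = [c.get("outputs", []) for c in ipynb_cells if c["cell_type"] == "code"]
        for cell, outputs in zip(code_cells, saved):
            cell["outputs"] = copy.deepcopy(outputs)
    else:
        # Code differs somewhere: keep the Rmd code and discard every output.
        for cell in code_cells:
            cell["outputs"] = []
    return cells
```

Only code cells are compared here, so edits to markdown cells in the .Rmd would not discard the outputs.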

mwouts · 2018-07-10T23:21:21Z

Hello @grst, thanks for your suggestion, indeed reopening the .Rmd file with outputs preserved (when possible) makes the user experience much smoother.

I've posted a first implementation for this in version 0.2.5. It matches cell inputs from the .Rmd and .ipynb files (with the further constraint that cell order must match) and restores the outputs of the cells whose inputs match.
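
One possible reading of this matching rule, as a sketch only (the actual code in 0.2.5 may differ): walk through the cells of the text notebook, look for the same input further down in the .ipynb, and never look back before the last match. The cell layout (dicts with `cell_type`, `source`, `outputs`) and the name `restore_outputs` are assumptions made for the example.

```python
def restore_outputs(text_cells, ipynb_cells):
    """Copy outputs onto text cells whose input matches, preserving order.

    Hypothetical helper: modifies and returns text_cells.
    """
    start = 0  # index of the first ipynb cell that may still be matched
    for cell in text_cells:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []  # default when no matching input is found
        for i in range(start, len(ipynb_cells)):
            other = ipynb_cells[i]
            if other["cell_type"] == "code" and other["source"] == cell["source"]:
                cell["outputs"] = list(other.get("outputs", []))
                start = i + 1  # enforce the cell-order constraint
                break
    return text_cells
```

With this reading, a modified cell loses its output while the unchanged cells around it keep theirs.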

For now the implementation is not symmetric. The matching will occur only when you reopen the .Rmd file, not the .ipynb one - as I prefer not to change the behavior on the default notebook type. But of course we can further (test and) discuss this.

grst · 2018-07-11T07:05:10Z

> For now the implementation is not symmetric. The matching will occur only when you reopen the .Rmd file, not the .ipynb one - as I prefer not to change the behavior on the default notebook type. But of course we can further (test and) discuss this.

Maybe that's a behaviour we could set using a config flag.

mwouts · 2018-07-11T14:02:56Z

I've encountered an issue with this functionality on a notebook with plotly plots: the Rmd notebook can't be trusted (i.e., trusting the notebook reloads the notebook, which is then again in a position of not being trusted...).

mwouts · 2018-07-12T08:14:33Z

Hello @grst , I'll try to fix the above, and also provide a symmetric implementation for this feature.

What do you think of the following?

We change the pre_save_hook to save to the formats listed in the "nbrmd_formats" metadata if it exists, and otherwise to "ContentsManager.default_nbrmd_formats", which defaults to [".ipynb"] and can be changed with, for instance,

c.ContentsManager.default_nbrmd_formats = [".ipynb", ".Rmd"]

When a notebook misses the "nbrmd_formats" metadata, that entry is created as the union of default_nbrmd_formats with the current extension. When a notebook is exported to another format, current and target extensions are added to the "nbrmd_formats" metadata.
Then, when a notebook is opened,
- inputs are taken from the most recent notebook among all extensions listed in "nbrmd_formats" (and, if the timestamps match, they are taken from the current notebook)
- outputs are taken from ".ipynb" if that extension is available in "nbrmd_formats"
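
To make the proposal above concrete, here is a rough sketch, not an actual implementation. The helper names `resolve_formats` and `pick_sources` are hypothetical, and modification times are passed in as a plain dict (extension to timestamp, for the files that exist) so the logic can be read without touching the file system. The current notebook is assumed to be among the listed formats.

```python
def resolve_formats(metadata, current_ext, default_formats=(".ipynb",)):
    """Formats to save to: the metadata entry if present, else the
    union of the default formats with the current extension."""
    formats = metadata.get("nbrmd_formats")
    if formats is None:
        formats = list(default_formats)
        if current_ext not in formats:
            formats.append(current_ext)
    return formats


def pick_sources(current_ext, formats, mtimes):
    """Return (input_ext, output_ext) for the notebook being opened.

    mtimes maps each existing extension to its modification time.
    """
    available = [ext for ext in formats if ext in mtimes]
    # Most recent file wins; on equal timestamps prefer the current notebook.
    input_ext = max(available, key=lambda ext: (mtimes[ext], ext == current_ext))
    # Outputs only ever come from the .ipynb file, when there is one.
    output_ext = ".ipynb" if ".ipynb" in available else None
    return input_ext, output_ext
```

For example, `pick_sources(".ipynb", [".ipynb", ".Rmd"], {".ipynb": 10, ".Rmd": 20})` selects the .Rmd file for inputs and the .ipynb file for outputs.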

grst · 2018-07-12T08:32:30Z

I like the idea with the nbrmd_formats metadata and also with the default_nbrmd_formats flag.

I don't like so much the idea of using timestamps for resolving which contents to use, for the following reasons:

- it's too much 'magic' going on, and I don't know which contents I'm actually viewing
- what happens if two files have been changed independently? A merge conflict so-to-say.

I would rather define a 'primary' format (which would be the one I put under version control), and regard all other files as dispensable copies.

mwouts · 2018-07-12T15:02:32Z

Thanks @grst , actually I agree, that's probably too magic. So we will always take input from the primary format. But... how would you identify which is the primary format?

In the current implementation, I made the implicit assumption that primary format was that of the file being opened.

Would you prefer that

- primary format is that of the file being opened (i.e. no magic loading for inputs)
- primary format is the first element of 'nbrmd_formats' that is not '.ipynb'
- primary format is specified in another metadata?

grst · 2018-07-12T18:20:29Z

As "explicit is better than implicit" I opt for another metadata/config option.

mwouts · 2018-07-13T13:52:24Z

Hello @grst, I've published a new version on pypi that seems to fit my needs. Please confirm whether it also works for you. Thanks for the suggestion, it makes the package much easier to use!

mwouts · 2018-07-17T22:42:43Z

Seems to work well, I'll close this.

By the way - this happens to be a workaround for #8. Indeed, I confirm that opening the .ipynb notebook will reload inputs from the .Rmd file (given that the companion .Rmd file is identified either in the jupyter config or in the notebook metadata).

grst · 2018-07-18T17:53:11Z

This works very well indeed.
I suggest some fine tuning concerning the default option handling (see #16, #17).

And very nice to have a workaround for #8!

abalter · 2018-10-16T16:42:19Z

@grst

> it's too much 'magic' going on, and I don't know which contents I'm actually viewing
>
> what happens if two files have been changed independently? A merge conflict so-to-say.

I've been having to move back and forth between RStudio and Jupyter, which means that my .Rmd files are being edited in both editors. I'm pretty sure others end up doing this too, as Jupyter and RStudio have not-completely-overlapping feature sets. Also, until RStudio decides to communicate with the rest of the data science ecosystem (they are like the Apple of data science), it may be that Jupyter has to be the one to facilitate the synchronicity and compatibility.

Therefore, I'm not sure having to do merges would be that bad. When you open a file (.Rmd or .ipynb) that is out of sync with its siblings (for example a .R file as well), you could be offered the usual choices:

Would you like to:

- Overwrite sibling with current file
- Overwrite current file with sibling
- Merge changes

Basically if you want ipynb to always be the master, then you simply take a microsecond to click the appropriate choice.

This would be my preferred solution.

mwouts · 2018-10-16T20:28:52Z

Hello @abalter , thanks for entering the discussion. This was actually a discussion on the implementation of a very preliminary version of jupytext! Now the behavior has converged to what is documented in the README:

- When loading or refreshing an .ipynb file, the input cells of the notebook are read from the first non-.ipynb file among the associated formats.
- When loading or refreshing a non-.ipynb file, the outputs are read from the .ipynb file (if ipynb is listed in the formats).

I.e. input cells are always taken from the current document if it is not an ipynb, or from the first text representation otherwise.

Say you use ipynb, Rmd and R, in this order. Then, if you know that the R version is the one that is up to date, you should explicitly open that one in Jupyter.
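
The rule and the example above can be written down in a few lines. This is an illustrative sketch, not jupytext's actual code: the function name `input_and_output_formats` is hypothetical, and formats are given as a list of extension names without the leading dot.

```python
def input_and_output_formats(opened_ext, formats):
    """Return (format providing the inputs, format providing the outputs)."""
    text_formats = [ext for ext in formats if ext != "ipynb"]
    if opened_ext == "ipynb":
        # Inputs come from the first text representation, if there is one.
        input_ext = text_formats[0] if text_formats else "ipynb"
    else:
        # A text notebook always provides its own inputs.
        input_ext = opened_ext
    # Outputs come from the ipynb file when it is one of the paired formats.
    output_ext = "ipynb" if "ipynb" in formats else None
    return input_ext, output_ext


formats = ["ipynb", "Rmd", "R"]
print(input_and_output_formats("ipynb", formats))  # inputs from Rmd
print(input_and_output_formats("R", formats))      # inputs from R
```

Opening the .ipynb file reads the inputs from the Rmd file (the first text format), so an up-to-date R file is only picked up when it is opened directly.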

Jupytext does check that the text representation is more recent than the ipynb - in case you have modified the ipynb without using Jupytext - that's #63. No checks are done on text representations other than the main one.

We could think of extending the check (PRs are welcome), and refuse to load inputs from a text representation that is not the most recent file among the set of representations, asking the user to validate the file contents first. Would that suit your needs? By the way, do you think many people use more than one text representation?
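
A possible starting point for such a check, as a sketch under assumptions: the function names are hypothetical, nothing here exists in jupytext, and the comparison relies only on file modification times. The `getmtime` parameter defaults to `os.path.getmtime` and is there so the logic can be exercised without real files.

```python
import os


def newer_siblings(input_path, sibling_paths, getmtime=os.path.getmtime):
    """List the sibling files modified after input_path."""
    input_mtime = getmtime(input_path)
    newer = []
    for path in sibling_paths:
        try:
            if getmtime(path) > input_mtime:
                newer.append(path)
        except OSError:
            continue  # this representation does not exist on disk
    return newer


def check_inputs_are_current(input_path, sibling_paths, getmtime=os.path.getmtime):
    """Refuse to load inputs from a file that is not the most recent one."""
    newer = newer_siblings(input_path, sibling_paths, getmtime)
    if newer:
        raise ValueError(
            "{} is older than {}; please validate its contents first".format(
                input_path, ", ".join(newer)
            )
        )
```

Modification times are a coarse signal (copying or checking out files changes them), so the user would still need a way to confirm which file to trust.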

abalter · 2018-10-17T00:17:07Z

I think this is somewhat uncharted territory. The notebook formats are good for use in IDEs and the script format for batch execution. Moreover, notebook formats are good for development and debugging. There are a whole range of different development patterns to explore. I'm going to put together a more detailed description of what we have been considering as I think it might be useful for further development.

Overall, jupytext is amazing, and something I've been really waiting for to come along!

mwouts · 2018-10-17T06:25:05Z

Thanks @abalter. Sure, let us know how you plan to use jupytext, that will be very interesting to me and to other users.

By the way, I saw your polyglottus project, is this something you are still working on? Have you tried using %%R cells in a python notebook? Or the opposite? In principle the Rmd to Jupyter notebook conversion maps the local cell language to R markdown... let me know if you want to test that!

stefanuddenberg · 2019-03-15T06:09:41Z

I'm having an issue with using jupytext 1.0.3 where my .ipynb notebooks become un-trusted while they exist alongside .Rmd and .py representations. My configuration is as follows:

c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
# Always pair ipynb notebooks to md files
c.ContentsManager.default_jupytext_formats = "ipynb,Rmd,py"
# Keep all metadata in md file
c.ContentsManager.default_notebook_metadata_filter = "all"
c.ContentsManager.default_cell_metadata_filter = "all"
# Use the percent format when saving as py
c.ContentsManager.preferred_jupytext_formats_save = "py:percent"

mwouts · 2019-03-15T08:59:21Z

@stefanuddenberg, thanks for reporting this. Typically un-trusted notebooks are caused by a non-identical round trip conversion. Would you like to open an issue for this? We will also need a sample notebook that shows the issue.

stefanuddenberg · 2019-03-15T15:11:28Z

Sure; I'll open an issue for it later today with an example. Thanks!

mwouts self-assigned this Jul 10, 2018

mwouts added a commit that referenced this issue Jul 10, 2018

Outputs taken from ipynb file when they match input #12

391863e

mwouts added a commit that referenced this issue Jul 12, 2018

Introducing default_nbrmd_formats #12

5e99a99

mwouts added a commit that referenced this issue Jul 12, 2018

Trust .Rmd notebook if .ipynb is trusted #12

d2ebd96

mwouts added a commit that referenced this issue Jul 13, 2018

Load cell inputs from nbrmd_sourceonly_format extension #12

3efaf67

mwouts added a commit that referenced this issue Jul 13, 2018

New version 0.2.6 #12

7c99cbb

mwouts added a commit that referenced this issue Jul 16, 2018

Introducing default_nbrmd_formats #12

8c1908a

mwouts added a commit that referenced this issue Jul 16, 2018

Trust .Rmd notebook if .ipynb is trusted #12

a10c45e

mwouts added a commit that referenced this issue Jul 16, 2018

Load cell inputs from nbrmd_sourceonly_format extension #12

dd2eb6b

mwouts added a commit that referenced this issue Jul 16, 2018

New version 0.2.6 #12

eb354a7

mwouts closed this as completed Jul 17, 2018

mwouts mentioned this issue Mar 19, 2019

Notebooks become 'not trusted' when code cells are modified in the paired file #203

Closed
