html dedent for markdown export #996

amniskin · 2019-04-24T03:00:45Z

nbconvert doesn't dedent 'text/html' outputs, which leads to errors down the line (specifically when converting notebooks that use plotly to Markdown).

To reproduce, convert any notebook that uses plotly to Markdown: the plotly JS code (which comes from an indented text/html cell output) is displayed verbatim rather than executed.
Given that HTML is dedent invariant (that was awkward to write), it seems like the reader should dedent by default?
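
To make the failure mode concrete, here is a minimal sketch using only the standard library. The HTML snippet is a made-up stand-in for what an indented text/html output might look like, and `starts_indented_code` is a hypothetical helper, not part of nbconvert. It checks for the four-space indentation that Markdown treats as the start of an indented code block, which is why the markup ends up shown verbatim.

```python
import textwrap

# Hypothetical text/html output, indented the way a plotting library might emit it.
indented_html = (
    '        <div id="plot"></div>\n'
    "        <script>\n"
    '            console.log("init");\n'
    "        </script>\n"
)

# Strip the whitespace common to all lines; relative indentation is kept.
dedented_html = textwrap.dedent(indented_html)


def starts_indented_code(text):
    """Return True if the first line has 4+ leading spaces.

    Markdown treats such a line (after a blank line) as an indented code
    block, so the HTML would be shown as text instead of being rendered.
    """
    first_line = text.splitlines()[0]
    return len(first_line) - len(first_line.lstrip(" ")) >= 4


print(starts_indented_code(indented_html))
print(starts_indented_code(dedented_html))
print(dedented_html)
```

One caveat on "dedent invariant": whitespace inside elements such as `<pre>` is significant, so blanket dedenting is a simplification that holds for typical generated markup rather than for every HTML document.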

I started digging into the code, but wasn't sure where the best place to add this would be. Is this something you all would be into? I can write a pull request; I just need to know a sane place for it.

MSeal · 2019-04-24T06:28:34Z

Would you mind posting images and/or an example notebook, along with the command used, to make sure the problem is clear for everyone?

The code which performs the translation is here and the template it uses is here. The pandoc library is used for the actual template implementation, which flows through here. How these modules interact is unfortunately not super simple, but this is a starting point to explore. The template appears to naively print the text/html without any other processing, so a filter or function that improves this behavior would likely be welcome.

amniskin · 2019-05-12T22:59:23Z

As for a minimal example:

And the associated markdown output:

It seems like this is a problem with the text/html cell parser itself though, no? By that I mean, since HTML is white-space invariant, any leading whitespace can be removed without changing the interpretation of the HTML block. That way if later some other format is sensitive to whitespace, we won't have to remember this there too. It would also make it easier to export a relatively human readable HTML file (with proper indentation).
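
The idea of normalizing the HTML once, before any exporter sees it, could look roughly like the sketch below. It works on a plain dict shaped like notebook JSON so that it stays self-contained. `dedent_html_outputs` is a hypothetical name, and this is not the change that was eventually merged, only an illustration of the approach.

```python
import textwrap


def dedent_html_outputs(notebook):
    """Dedent every text/html payload in a notebook-shaped dict, in place.

    Assumes the nbformat v4 layout: cells -> outputs -> data -> mime type.
    """
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            data = output.get("data", {})
            html = data.get("text/html")
            if html is None:
                continue
            # Notebook JSON may store a multi-line string as a list of lines.
            if isinstance(html, list):
                html = "".join(html)
            data["text/html"] = textwrap.dedent(html)
    return notebook


# Made-up notebook fragment with one indented HTML output.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {
                        "text/html": ["    <div>\n", "        <b>hi</b>\n", "    </div>\n"]
                    },
                }
            ],
        },
        {"cell_type": "markdown", "source": "no outputs here"},
    ]
}

dedent_html_outputs(nb)
print(nb["cells"][0]["outputs"][0]["data"]["text/html"])
```

Doing this at the output level means every downstream format gets the same normalized HTML, at the cost of touching payloads where leading whitespace might matter (for example inside `<pre>`).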

P.S. Sorry it took so long to respond, I broke the python install on my personal computer and hadn't spent the time to fix it until now.

MSeal · 2019-05-12T23:57:48Z

Seems like a reasonable request. Thanks for gathering the info and adding images. I haven't read through the code-paths involved in a while but your reasoning on where to implement sounds right.

amniskin · 2019-05-13T01:47:55Z

I noticed you tagged this with "enhancement" but it's really a bug. The Markdown generated isn't equivalent to the HTML generated, which it should be. The preserved indentation causes some HTML to be displayed as verbatim text rather than being processed as HTML and added to the DOM.

This is particularly annoying if you use plotly, because plotly inserts indented code into the HTML cells. Your plotly JavaScript initialization call then gets inserted verbatim and never gets called, so none of your plots show up properly.

MSeal added enhancement help wanted labels May 12, 2019

amniskin mentioned this issue May 13, 2019

dedenting html in ExtractOutputPreprocessor #1023

Merged

MSeal added bug and removed enhancement labels May 14, 2019

MSeal closed this as completed May 14, 2019
