html dedent for markdown export #996

Closed
amniskin opened this issue Apr 24, 2019 · 4 comments

Comments

@amniskin
Contributor

amniskin commented Apr 24, 2019

nbconvert doesn't dedent 'text/html' outputs, which leads to errors down the line (specifically when converting notebooks that use plotly to Markdown).

To reproduce the issue, convert any notebook that uses plotly to Markdown. The plotly JS code (which came from a text/html cell output, but was indented) is displayed verbatim rather than executed.
Given that HTML is dedent invariant (that was awkward to write) it seems like the reader should dedent by default?
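For illustration, here is a minimal sketch of the kind of dedent being proposed, using only the standard library. The function name `dedent_html` and the sample fragment are hypothetical and not part of nbconvert; where such a step would be hooked in is exactly the open question in this issue.

```python
import textwrap


def dedent_html(html: str) -> str:
    """Remove the common leading whitespace from an HTML fragment.

    Hypothetical helper for illustration only; not an nbconvert API.
    """
    # textwrap.dedent strips the indentation shared by all non-blank lines,
    # so nested structure inside the fragment is preserved.
    return textwrap.dedent(html).strip("\n")


# An indented fragment, similar in shape to what an HTML output might contain.
indented = """
    <div id="plot">
        <script>console.log("init");</script>
    </div>
"""

cleaned = dedent_html(indented)
print(cleaned)
```

After dedenting, the opening `<div>` starts at column zero, so a Markdown renderer is more likely to treat the fragment as raw HTML than as an indented code block.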

I started digging into the code, but wasn't sure where would be the best place to add this. Is this something you all would be into? I can write a pull request, I just need to know where would be a sane place for it?

@MSeal
Contributor

MSeal commented Apr 24, 2019

Would you mind posting images and/or an example notebook, along with the command used, to make sure the problem is clear for everyone?

The code which performs the translation is here and the template it uses is here. The pandoc library is used for the actual template implementation, which flows through here. How these modules interact is unfortunately not super simple, but this is a starting point to explore. The template appears to naively print the text/html without any other processing, so likely an improvement to use a filter or function that improves this behavior would be welcome.

@amniskin
Contributor Author

As for a minimal example:

[Screenshot: 2019-05-12-154235_958x1031_scrot]
And the associated markdown output:
[Screenshot: 2019-05-12-154408_958x529_scrot]

It seems like this is a problem with the text/html cell parser itself though, no? By that I mean, since HTML is white-space invariant, any leading whitespace can be removed without changing the interpretation of the HTML block. That way if later some other format is sensitive to whitespace, we won't have to remember this there too. It would also make it easier to export a relatively human readable HTML file (with proper indentation).
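As a rough sketch of what handling this at the output level (rather than in each exporter) could look like, the function below dedents the `text/html` payload of a single notebook output. It assumes the nbformat v4 shape for a `display_data` output (`output_type`, `data`, `metadata`); the function name `dedent_html_output` is hypothetical and not an existing nbconvert hook.

```python
import textwrap
from typing import Any, Dict


def dedent_html_output(output: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a notebook output with its text/html payload dedented.

    Hypothetical helper for illustration only; not an nbconvert API.
    """
    data = output.get("data", {})
    html = data.get("text/html")
    if html is None:
        # Nothing to do for outputs without an HTML representation.
        return output
    # Multi-line strings may be stored as a list of lines; join them first.
    if isinstance(html, list):
        html = "".join(html)
    new_data = dict(data)
    new_data["text/html"] = textwrap.dedent(html)
    new_output = dict(output)
    new_output["data"] = new_data
    return new_output


example = {
    "output_type": "display_data",
    "data": {"text/html": "    <div>\n        <script>init();</script>\n    </div>\n"},
    "metadata": {},
}
result = dedent_html_output(example)
print(result["data"]["text/html"])
```

Doing this once on the output means every downstream format sees the same normalized HTML. One caveat: content inside elements such as `<pre>` is whitespace-sensitive, although removing only the indentation common to all lines, as `textwrap.dedent` does, keeps relative indentation intact.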

P.S. Sorry it took so long to respond, I broke the python install on my personal computer and hadn't spent the time to fix it until now.

@MSeal
Contributor

MSeal commented May 12, 2019

Seems like a reasonable request. Thanks for gathering the info and adding images. I haven't read through the code-paths involved in a while but your reasoning on where to implement sounds right.

@amniskin
Contributor Author

I noticed you tagged this with "enhancement" but it's really a bug. The Markdown generated isn't equivalent to the HTML generated, which it should be. The preserved indentation causes some HTML to be displayed as verbatim text rather than HTML to be processed and added to the DOM.

This is particularly annoying if you use plotly, because plotly inserts indented code into the HTML cells. Your plotly JavaScript initialization call then gets inserted as verbatim text and never gets called, so none of your plots show up properly.
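To make the failure mode concrete: in CommonMark-style Markdown, a line indented by four or more spaces that follows a blank line starts an indented code block, which is why indented HTML can end up as verbatim text. The sketch below is a simplified heuristic for spotting such lines, not a real Markdown parser; the function name `lines_at_risk` and the sample input are made up for illustration.

```python
from typing import List


def lines_at_risk(text: str) -> List[int]:
    """Return 1-based numbers of lines likely to be read as indented code.

    Simplified heuristic: a line with 4+ leading spaces is flagged when it
    comes at the start of the text, after a blank line, or directly after
    another flagged line. Indented lines that continue a non-blank,
    non-indented line are treated as part of that block.
    """
    risky = []
    can_start_code = True  # start of text behaves like "after a blank line"
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            can_start_code = True
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent >= 4 and can_start_code:
            risky.append(number)  # code block starts or continues
        else:
            can_start_code = False  # inside a paragraph or HTML block
    return risky


sample = "<div>\n    <p>ok</p>\n\n    <script>init();</script>\n</div>"
print(lines_at_risk(sample))
```

In this sample only the `<script>` line is flagged, because the blank line before it ends the HTML block that `<div>` opened.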

@MSeal MSeal added bug and removed enhancement labels May 14, 2019
@MSeal MSeal closed this as completed May 14, 2019