Convert notebook to Static HTML -- Markdown cells with html image references not viewable #328

Closed
soamaven opened this issue Jun 27, 2016 · 58 comments

@soamaven

soamaven commented Jun 27, 2016

I am going a little crazy trying to figure out how to convert a relatively simple notebook to html or pdf with the images included. I'm catching flak from my prof for using Jupyter instead of Powerpoint because my images didn't show up on another computer. 😳

nbconvert --to html my_notebook.ipynb works great on my machine, but the file doesn't transfer to others nicely. I can't find any straightforward guide as to why this is.

I have markdown cells with the images in question and they are referenced as follows:

<img src="./Images/My_image.png" width="800" height="800" alt="Alt_name" title="Mytitle" align="center" />

and they don't show up when converting to html or pdf once the file leaves the original dir. When using markdown syntax, images show up, but I have much less control over the format of the images, which is why I am using HTML. Any ideas? My bet is that this is a solved issue, just there is not easily find-able documentation of it.

Edit:
Using
nbconvert==4.2
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0

@takluyver
Member

If you reference external files, you'll need to copy those files around with the notebook or HTML export. Images should get embedded into PDF, but I'm not sure if the conversion via pandoc handles HTML image tags like the ones you're using.

It would be possible to read in the images when converting to HTML and embed them as data urls, but we don't currently have any code to do this.
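
As a rough illustration of the data URL idea mentioned above, here is a minimal sketch using only the Python standard library. The helper name image_to_data_url is hypothetical and is not part of nbconvert; it only shows what "read in the image and embed it" could look like.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Return a data: URL holding the bytes of the image at `path`.

    Hypothetical helper for illustration; not an nbconvert function.
    """
    path = Path(path)
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    # Encode the raw bytes with the standard base64 alphabet.
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"
```

The returned string can be used directly as the src attribute of an img tag, so the HTML file no longer depends on the ./Images directory travelling with it.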

@soamaven
Author

soamaven commented Jun 28, 2016

Hmm. I guess this was an unexpected answer, but thank you for the response!

Seeing as Jupyter is being advertised to the data/scientific community, students like myself will want to use it to share presentations, notebooks, data, etc., both statically and dynamically (achievable via servers, etc.). I include a lot of .png files in my notebooks for communicating previous work, in addition to the figures that I whip up via something like matplotlib.

I guess I don't know what would be a good way forward on this. To me it seems like a self-contained file would be ideal for sending via email. To others, I would imagine sending a zip with figures is just as easy, and still others, serving the notebook would be the solution.

All seem valid. I guess I'd advocate for some sort of self-contained, offline solution. This would be solved via .pdf if pandoc could handle the HTML tags, or if Jupyter's markdown syntax were a bit more versatile with image manipulation.

When you say

It would be possible to read in the images when converting to HTML and embed them as data urls, but we don't currently have any code to do this.

How easy would that be for an amateur to contribute? Is this a pre/post processor job? I've looked into the templates functionality offered by nbconvert, but it feels a little overwhelming, as I know only as much HTML as I've shown 😥

@takluyver
Member

I agree that it would be good to have an option to export to HTML with embedded images.

The code itself should be simple enough to write, with a bit of learning about the HTML parser interface and data urls. See the citations filter for an example that parses HTML in Markdown cells to replace bits.

The trickier thing is working out exactly where to plug it into the nbconvert API. It could be done as any of a preprocessor, custom exporter, or postprocessor. I'll keep thinking about that...
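
To make the "parse HTML in Markdown cells to replace bits" step concrete, here is a minimal sketch. It swaps in a regular expression instead of the HTML parser interface that the citations filter uses, and it only handles double-quoted src attributes. The function name embed_local_images and its base_dir parameter are assumptions for illustration, not nbconvert API; where such a function would be plugged in (preprocessor, exporter, or postprocessor) is left open, as in the comment above.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches the src="..." part of an <img> tag (double-quoted values only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def embed_local_images(source, base_dir="."):
    """Replace local image references in `source` with base64 data URLs.

    Hypothetical helper: remote URLs, existing data URLs and missing
    files are left untouched.
    """

    def replace(match):
        src = match.group(2)
        if src.startswith(("data:", "http://", "https://")):
            return match.group(0)
        path = Path(base_dir) / src
        if not path.is_file():
            return match.group(0)
        mime_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime_type};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(replace, source)
```

Other attributes such as width, height, and alt are preserved, since only the src value is rewritten.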

@soamaven
Author

soamaven commented Jun 29, 2016

So I took citations.py and modified it so that it takes a source string and replaces the file name in an image tag with the base64 encoding of that file. I figure the gist here is a good first step. I'd like to extend the img tag parser to do something similar to what @randy3k did here to control image size, so that everyone can have size control when converting to .tex and subsequently .pdf if this eventually gets pulled into nbconvert.

The gist outputs the correct base64 string (checked it by copy and paste into this online converter), but when I was checking it in a Jupyter Markdown cell with the HTML tag, it can't show the image, nor any other base64 image. Can anyone confirm that Jupyter can display data URIs encoded in base64? I can't get it to.

EDIT: Corrected links

@takluyver
Member

I think we strip some possible HTML when rendering markdown cells. Maybe data urls are something that get stripped. @minrk, is that plausible? If so, it shouldn't be a problem when nbconverting to HTML.

I think the gist links you gave are the URL for embedding the gist - here's the link for the nice view of it: https://gist.github.com/soamaven/4de1727f76790b574342bd6231402843

@minrk
Member

minrk commented Jun 30, 2016

Data URIs are allowed in markdown cells, but I think they were not for some early versions after we added sanitization. What version of the notebook are you using, @soamaven (Help > About in the notebook)?

@soamaven
Author

soamaven commented Jun 30, 2016

@takluyver Thanks for letting me know. I updated the links.

Notebook version ==4.2.1

The data URIs don't show up as images either in notebook view or after converting to static html. I created a repo in case anyone wanted to see what was going on. My money is that I am doing something wrong.... 😐 but the base64 encoding seems correct when manually copying it to a decoder as mentioned before.

@soamaven
Author

soamaven commented Jun 30, 2016

I won the bet that I was doing something wrong.
Apparently using base64.b64encode() gives the correct output while base64.urlsafe_b64encode() does not. I would not have guessed this based on what the Python docs describe, but perhaps I would have known if I knew more about HTML. The other odd thing is that the gist has this correct.

So Markdown cells won't display the base64 image in the notebook when the MD cell is rendered, but the images are properly displayed when using jupyter nbconvert --to html mynotebook.ipynb, and also within the notebook when using the IPython function IPython.display.display(Markdown(...)). This behavior should be reflected in the repo.

EDIT Accidentally closed the issue, reopened

@soamaven soamaven reopened this Jun 30, 2016
@takluyver
Member

Aha! Yes, I think data URLs do not need the url-safe version of base64, confusingly.
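
A small sketch of the difference between the two encoders, using two bytes chosen so that the alphabets diverge. The helper to_standard_alphabet is hypothetical and only shows how a URL-safe string maps back to the standard alphabet that data URLs expect.

```python
import base64

# Two bytes whose encoding uses the last two characters of the alphabet.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw).decode("ascii")          # '+/8='
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # '-_8='


def to_standard_alphabet(url_safe_text):
    """Map URL-safe base64 text back to the standard alphabet."""
    return url_safe_text.replace("-", "+").replace("_", "/")
```

The two functions differ only in the characters used for values 62 and 63 ('+' and '/' versus '-' and '_'), so short test strings can look identical while a real image, which almost certainly contains those values, ends up corrupted.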

Thanks for taking the time to put it into a repo.

@Carreau, you're good at helping people make extensions. Do you think this would make sense as a preprocessor? Should we develop some hooks for plugging in external preprocessors like we can have external exporters?

@Carreau
Member

Carreau commented Jul 1, 2016

Probably. @michaelpacer is starting halftime today, that might be one of the first tasks we can try to investigate.

@takluyver
Member

Welcome @michaelpacer :-)

@mpacer
Member

mpacer commented Jul 5, 2016

@Carreau @takluyver Is it more conventional to make notes about the source repo as comments here (in this issue on nbconvert) or as issues on @soamaven's repo? Or does it depend on context?

e.g., I just ran into the point that in the .ipynb file it was expecting a py35 rather than Python 2 or Python 3 as I have it set up on my system. This is just a naming issue I know, and my instinct was to ask that question as an issue on the repo, not in comments here, but I wanted to check in about what the common policy is (given that I sense that the nbconvert_data_uri repo is intended to be a mechanism for developing a test case for whether this feature works within the nbconvert repo rather than a repo intended to be distributed and developed independently).

@mpacer
Member

mpacer commented Jul 17, 2016

So, I just tried running this locally and discovered that I cannot reproduce the original bug, see:

image

@soamaven Was this something that was pushed to the notebook since this issue was raised? I don't trust my ability to make the extension and know that I've covered your use case if I cannot reproduce the original error you were trying to address.

@soamaven
Author

soamaven commented Jul 18, 2016

@michaelpacer this is odd. Is this behavior from the .html produced by executing jupyter nbconvert --to html img2base64.ipynb or from the .ipynb? If .ipynb, the image shouldn't show up, as mentioned before, due to sanitization. This behavior is consistent on my machine, shown below. (Here I changed the image alignment to the left so that it would not be hidden by the horizontal scroll if it appears; however, it doesn't appear, as expected. Also, I have the auto-sectioning extension enabled, which explains the numbering discrepancy.)
screenshot from 2016-07-18 15-01-42

However, after jupyter nbconvert --to html img2base64.ipynb the data uri shows properly as below (aligned-left)
screenshot from 2016-07-18 15-05-02

I think we could still put together an extension for embedding images into converted notebooks. I am not sure why I cannot view the fourth markdown cell's image but you can however.

I am using:
Chrome Version 51.0.2704.106 (64-bit) on Fedora 23 to view the notebook.
jupyter-client==4.3.0
jupyter-core==4.1.0
notebook==4.2.1

EDIT: I misread the comment referenced. Apparently data URIs should be showing in my notebook upon executing the MD cell, but are not, even though I am using a newer version of notebook. I have tried on both Firefox v47.0 and Chrome with the same results on my machine.

@mpacer
Member

mpacer commented Jul 28, 2016

That was the behaviour of the notebook itself (which I thought was your concern).

So I think I was working on
jupyter-core==4.1.0
notebook==5.0.0dev

And I think that jupyter-client is what you get when you run jupyter kernelspec --version…I can't figure out where to get that otherwise, but if that is the case, mine is also 4.3.0.

Does it work on your system if you upgrade to the dev version of the notebook? (You have to build it from GitHub; instructions for a dev build can be found here: https://github.com/jupyter/notebook/blob/master/CONTRIBUTING.rst)

@mpacer
Member

mpacer commented Jul 28, 2016

And sorry about the unclear comment, it was poorly worded.

I meant, "I'm running on the most recent version of the notebook (i.e., the one that hasn't been released). Has something been pushed to the 5.0.0dev version that happens to fix this problem?"

@soamaven
Author

soamaven commented Jul 29, 2016

I installed notebook from source, but the version is 4.2.1, not 5.0.0dev... what repo are you pulling from?
Edit: If you are referring to ipython, that is now version 5.0.0 for me

@soamaven
Author

Also, while I appreciate your help @michaelpacer in resolving this issue, it is separate from the one that I opened #328 for, which was help creating an extension for embedding images in .html converted notebooks. Should we open a separate issue for the markdown cells not showing data URLs over at notebook?

@juhasch
Contributor

juhasch commented Jul 29, 2016

You also can take a look at the nbconvert postprocessor for embedding images in HTML over at ipython-contrib:
https://github.com/ipython-contrib/jupyter_contrib_nbextensions/blob/master/src/jupyter_contrib_nbextensions/nbconvert_support/post_embedhtml.py

It only recognizes <img> tags in markdown for now.

@soamaven
Author

soamaven commented Jul 29, 2016

@michaelpacer I have some more information. Apparently I was silently prepending my python2.7.12 environment to my path, and so it looked there every time I used 'jupyter notebook'. I have now fixed this.

I can see the dataurl images in my python3.5 environment now 😀
I cannot see them still when running my python2.7.12 environment, however. Is this expected behavior?
EDIT: Clarity

@soamaven
Author

@juhasch Thanks for the link! It looks like what I was trying to write, but better put together. This was a bit too hard to find though, I would vote it gets included into nbconvert as a cmd line template? Or referenced in either the documents of nbconvert or the extensions docs/wiki? It seems that post processors are pretty powerful, it would be nice to have some more information about what is available.

Also, I have reread the nbconvert docs trying to figure out exactly how to use such a post processor... sorry, I am admittedly an amateur, but it's pretty cryptic.

Do we think we can add support/a postprocessor for image resizing when converting to latex/pdf, à la "what @randy3k did here to control image size"? I'll look into this...

@mpacer
Member

mpacer commented Aug 3, 2016

Yes, I think this is a different issue. I'll make one now, though it may belong in the notebook repo, not the nbconvert repo.

@mpacer
Member

mpacer commented Aug 3, 2016

Ok — I've managed to reproduce that error in python 2 in the notebook v 4.2.1. It displays in both python 2 and python 3 in notebook v 5.0.0dev. I won't create a new issue on the notebook repo, but that's where I would have done it.

Now that the display thing is dealt with, on to the next steps :)

@mpacer
Member

mpacer commented Aug 4, 2016

@soamaven Sidenote: If you build from the master branch of jupyter notebook it should have version 5.0.0dev.

@mpacer
Member

mpacer commented Aug 4, 2016

@minrk @takluyver @Carreau Is this something that would be included in 5.0, or, since it's an extension, would it be better to add it afterward?

@soamaven
Author

soamaven commented Nov 4, 2016

@juhasch your solution is amazing, this is exactly what I had wanted to do and was not able to figure out entrypoints. Thank you!

Any idea how one can use this in conjunction with --to slides for a stand alone reveal.js presentation? I'll close this issue if so.

@teoguso

teoguso commented Nov 27, 2016

Hi everyone,

this issue is a bit long and confusing. I think I have a problem related to this, but I'm not 100% sure, so bear with me (and please tell me if I should just open a new issue).
I believe the issue here is not (or not only) about image embedding, but more about HTML image tag parsing within markdown cells. The summary above by @Carreau gets close to it, I think.

Let me explain what my problem is: I am preparing slides using a notebook and want to include various images. To be able to control size, I use HTML syntax directly, i.e. <img src=URL>.
When I try to convert the notebook with
jupyter-nbconvert --to slides slides.ipynb --reveal-prefix=reveal.js
I obtain a functioning HTML document, but some images do not show up and the HTML code is shown instead. Inspecting the HTML file, it's easy to spot the problematic piece of code, as what in the notebook was
<img src="./graphics/githubSF.png" alt="test" width=600>
becomes
&lt;img src="./graphics/githubSF.png" alt="test" width=600&gt;.
To give a bit more context with images, the notebook cell
text_raw_cell
is rendered as follows in the HTML file:
html_out_cell

Some additional info:

  • this happens for both URLs and local files
  • there is one instance where the HTML tag is actually correctly rendered to an image, who knows why.
  • if you want to try and reproduce it, the repository is at https://github.com/teoguso/sol_1116
  • jupyter installation (it's python 3, tried with python 2 but nothing changes):
jupyter==1.0.0
jupyter-client==4.3.0
jupyter-console==4.1.1
jupyter-contrib-core==0.3.0
jupyter-core==4.1.0
jupyter-nbextensions-configurator==0.2.2
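
The symptom described above (an img tag coming out as &lt;img ...&gt; text) can be checked for mechanically. The sketch below scans an exported HTML string for escaped img tags; the function name find_escaped_img_tags is hypothetical and the check is only a diagnostic aid, not a fix.

```python
import html
import re

# An <img ...> tag that was HTML-escaped instead of being passed through.
ESCAPED_IMG = re.compile(r"&lt;img\b.*?&gt;", re.IGNORECASE | re.DOTALL)


def find_escaped_img_tags(exported_html):
    """Return the img tags that ended up as literal text in the export."""
    return [html.unescape(hit) for hit in ESCAPED_IMG.findall(exported_html)]
```

Running it over the contents of the generated slides file lists the tags that the Markdown renderer failed to recognise as HTML, which makes it easier to see what they have in common.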

@teoguso

teoguso commented Nov 27, 2016

A quick follow-up: my problem is ascribable to mistune. After downgrading to mistune version 0.7.2, the problem disappears.

EDIT: Looks like it's related to this issue on mistune.

@gcbgit

gcbgit commented Feb 10, 2017

I was having the same problem of my HTML image tags showing up as text in my export when selecting "save as HTML with toc" from the File menu, and with every other HTML export method (including nbconvert --to html).

Downgrading mistune fixed it for me too.

@teoguso

teoguso commented Feb 10, 2017

Hello everyone,

I confirm that the problem is caused by a change in how mistune parses HTML code and attributes, as specified here. The workaround for version 0.7.3 of mistune is to put quotes around all HTML attributes, e.g. <img src="http://this.com/that.jpg" width="300">.
This is now fixed on the mistune master.

@takluyver
Member

Thanks @teoguso

@danzimmerman

With mistune 0.7.4 and nbconvert 5.1.1, I am still having a similar problem with tag parsing using nbconvert to HTML.

After some testing I've found it's failing with spaces around the equals sign in HTML attributes:

image

image

I mentioned this on lepture/mistune#81 as well.
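
Both workarounds reported in this thread (quote every attribute value, and avoid spaces around the equals sign) can be applied to Markdown source before conversion. The following is a minimal sketch under those assumptions; normalize_img_tags is a hypothetical helper, and it assumes attribute values do not themselves contain a '>' character.

```python
import re

# A whole <img ...> tag (assumes no '>' inside attribute values).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# name = value, where value is double-quoted, single-quoted or bare.
ATTR = re.compile(
    r"""([\w-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+?(?=\s|/?>))"""
)


def _fix_attr(match):
    name, value = match.group(1), match.group(2)
    # Quote bare values; leave already-quoted values as they are.
    if value[0] not in "\"'":
        value = f'"{value}"'
    # Re-join without spaces around the equals sign.
    return f"{name}={value}"


def normalize_img_tags(markdown_source):
    """Quote attribute values and tighten '=' inside every img tag."""
    return IMG_TAG.sub(lambda m: ATTR.sub(_fix_attr, m.group(0)), markdown_source)
```

For example, the tag from the earlier report, with width=600 unquoted, comes back with width="600", and a tag written as src = "a.png" comes back as src="a.png".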

@damianavila
Member

Given that we are one year past when the issue was originally raised, and having tested that nbconverted HTML files with referenced images work in master (unless you hit the upstream issue described by danzimmerman), @mpacer, what do you think about closing this one?

@damianavila
Member

OK, I will close this one. Feel free to re-open if you disagree.

@leblancfg

Issue is closed, so I'm sure it's a PEBKAC, but I am having the exact issues OP is describing running:

  • Jupyter version: 4.2.1
  • nbconvert version: 5.2.1
  • Python: 3.6.1
  • Distribution: Anaconda
  • Platform: Windows

I wonder if someone could shed some light on what steps to undertake to fix this in the discussion on StackOverflow? Many thanks.

@leblancfg

Solution found on SO -- needed doublequotes around all HTML attributes, even width. Thanks @mpacer, you're a goddamn hero!

mpacer added a commit to mpacer/nbconvert that referenced this issue Jul 25, 2017
takluyver added a commit that referenced this issue Jul 25, 2017
Add recent mistune release to avoid other people hitting #328
@gabyx

gabyx commented Jul 28, 2017

How can we make a button or add a menu entry Download -> "HTML (Embedded)"?
I made a button extension (similar to the hide_all extension) which does something like that,
but it does not work because html_embed is not recognized:

var load_ipython_extension = function() {
    Jupyter.toolbar.add_buttons_group([{
        id : 'export_embedded',
        label : 'Embedded HTML Export',
        icon : '+',
        callback : function() {
            Jupyter.menubar._nbconvert('html_embed', true);
        }
    }]);
    if (Jupyter.notebook !== undefined && Jupyter.notebook._fully_loaded) {
        // notebook_loaded.Notebook event has already happened
        initialize();
    }
    events.on('notebook_loaded.Notebook', initialize);
};

[W 19:29:41.330 NotebookApp] 404 GET /nbconvert/html_embed/EmbedImages/Test.ipynb?download=true (::1): No exporter for format: html_embed

@gabyx

gabyx commented Jul 28, 2017

I think that thing should be done somehow in
notebook/notebook/nbconvert/handlers.py

@mpacer
Member

mpacer commented Jul 28, 2017

@gabyx I think your comments should probably be an issue in notebook rather than nbconvert, given that the code you're running into is there (not in nbconvert). My guess is that what will need to change is the JS object, not the Python endpoints (if the issue has to do with Jupyter.menubar._nbconvert having an improper target).

@gabyx

gabyx commented Jul 29, 2017

Thanks I will open an issue =)

@gabyx

gabyx commented Jul 29, 2017

jupyter/notebook#2706

@YohanObadia

Just identified a hack that might be of help.
If you generate the image through Python code like below, it works:

import matplotlib.pyplot as plt
img = plt.imread('<your_image_path>')
plt.imshow(img)

@washiloo

Just identified a hack that might be of help.
If you generate the image through Python code like below, it works:

import matplotlib.pyplot as plt
img = plt.imread('<your_image_path>')
plt.imshow(img)

This is excellent! Just add plt.axis('off') to hide the axes and that should do the trick.
