-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed pdf export path and naming failures #986
Conversation
nbconvert/exporters/pdf.py
Outdated
rc = self.run_bib(tex_file) | ||
if rc: | ||
# This can fail for non-conversion blocking reasons | ||
self.run_bib(tex_file) | ||
rc = self.run_latex(tex_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This second latex should only be run if the bibtex command is successful. Otherwise, the conversion will take almost twice as long as necessary. Maybe nest an if
statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will change that.
nbconvert/exporters/pdf.py
Outdated
|
||
if not rc: | ||
raise RuntimeError('Failed to run "{}" commands'.format( | ||
[cmd.format(filename=tex_file) for cmd in self.latex_command])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does emit a critical log that goes to stderr by default. I didn't put the full output in the error message because it can be 100's or 1000's or lines long when I was testing. I'll look at truncating the output and putting a piece of it here so there's some visibility if log messages aren't being streamed to the client for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't see the fact that you moved the latex failure into run_command
. I like that. It also leaves a nice way to give better bibtex errors in the future.
Only test with an issue still is the |
Fixed the relative pathing (and spaces in paths) issues for latex/pdf extraction, I did a lot of testing locally to try different combinations of pathing issues and tried to update or add tests to capture those cases most concisely.
|
@@ -178,7 +178,7 @@ | |||
"metadata": {}, | |||
"source": [ | |||
"Make sure markdown parser doesn't crash with empty Latex formulas blocks\n", | |||
"$$$$\n", | |||
"$$ $$\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the file changes were just rerunning the notebook -- this was the only real change
@@ -20,6 +20,8 @@ This template does not define a docclass, the inheriting class must define this. | |||
% Basic figure setup, for now with no caption control since it's done | |||
% automatically by Pandoc (which extracts ![](path) syntax from Markdown). | |||
\usepackage{graphicx} | |||
% protect against spaces in asset names | |||
\usepackage[space]{grffile} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only sort of works, but it keeps things from hard crashing and just introduces rendering idiosyncrasies for when any code paths might leave a space in a rendered asset path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's redundant with the addition of grffile below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grffile only protects against spaces in names when using pdflatex, but not XeLaTeX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the example I provided at https://github.com/mpacer/tex_ex/ grffile
protects against spaces in names when using XeLaTeX. But, it requires adding the fix from https://github.com/ho-tex/oberdiek/issues/31#issuecomment-441094438 to get it to work.
@@ -35,7 +35,7 @@ | |||
|
|||
% Display markdown | |||
((* block data_markdown -*)) | |||
((( output.data['text/markdown'] | citation2latex | strip_files_prefix | convert_pandoc('markdown+tex_math_double_backslash', 'latex')))) | |||
((( output.data['text/markdown'] | citation2latex | strip_files_prefix | convert_pandoc('markdown+tex_math_double_backslash', 'latex', extra_args=[], relative_path_replacement=resources.working_directory, build_path_replacement=resources.output_files_dir)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These values are set in _fake_output_files_dir
in exporters/pdf.py
. This was the easiest way for the wrapping process to inform the conversion that we're operating in a different directory than the file, and to provide a latex-safe path for conversions to take place within.
@@ -444,7 +444,7 @@ def test_linked_images(self): | |||
""" | |||
Generate PDFs with an image linked in a markdown cell | |||
""" | |||
with self.create_temp_cwd(['latex-linked-image.ipynb', 'testimage.png']): | |||
with self.create_temp_cwd(['latex-linked-image.ipynb', 'test image.png']): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes the test cover more edges by having a space in the asset.
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/request-for-help-for-a-new-nbconvert-release/811/1 |
nbconvert/utils/pandoc.py
Outdated
# However the will still render but will double render some path | ||
# characters alongside the image. To avoid this also use build_path_replacement. | ||
|
||
# => (!\[(?:[iI]mage|[aA]lt [tT]ext|)\]\()((?![^:\.]+:\/\/)[^\/][^)\"]*\.[^)\"]*[^ ])( )?(\"[^)\"]+\")?(\)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a link to regex 101 here
@onlyif_cmds_exist('pandoc') | ||
def test_markdown2latex_path_reference(self): | ||
"""markdown2latex replace_relative_path against referenced path test""" | ||
s = 'Some ![image](reference)\n[reference]: ref.png' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo try with 2 newlines and see if it gets fixed
self.run_latex(tex_file) | ||
# This can fail for non-conversion blocking reasons | ||
if self.run_bib(tex_file): | ||
self.run_latex(tex_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically — if we're running bibTeX, we should be running LaTeX one more time. That is it should look like
xelatex file.tex
bibtex file.tex
xelatex file.tex
xelatex file.tex
For those who want to know why — https://tex.stackexchange.com/questions/53235/why-does-latex-bibtex-need-three-passes-to-clear-up-all-warnings explains it. But basically because LaTeX was designed in a way that it kept little in memory, it needs to separately create references (LaTeX run 1) , create a file to house all the bibliography references (bibTeX), add the reference file contents to the page (LaTeX run 2), resolve all references (LaTeX run 3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the run_latex
command actually runs latex 3 times by default, so it should be fine.
…to-latex into separate function.
… better image tag matching
return convert_pandoc(source, markup, 'latex', extra_args=extra_args) | ||
return convert_pandoc(source, markup, 'latex', | ||
extra_args=extra_args, | ||
relative_path_replacement=relative_path_replacement, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not add these keyword arguments to convert_pandoc
. I think it would be cleaner to do it in either:
- a jinja filter that is run prior to calling
convert_pandoc
- a preprocessor that changes these links before jinja templating is applied
going into the details of this would be too long for this little comment, but I'll add more details about the implications of both later.
The same point should be made for build_path_replacement=build_path_replacement
below.
@@ -33,7 +33,8 @@ def markdown2html_mistune(source): | |||
] | |||
|
|||
|
|||
def markdown2latex(source, markup='markdown', extra_args=None): | |||
def markdown2latex(source, markup='markdown', extra_args=None, | |||
relative_path_replacement=None, build_path_replacement=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wouldn't add these keyword arguments here.
@@ -1,7 +1,8 @@ | |||
from nbconvert.utils.pandoc import pandoc | |||
|
|||
|
|||
def convert_pandoc(source, from_format, to_format, extra_args=None): | |||
def convert_pandoc(source, from_format, to_format, extra_args=None, | |||
relative_path_replacement=None, build_path_replacement=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment about these flags.
@@ -18,9 +19,21 @@ def convert_pandoc(source, from_format, to_format, extra_args=None): | |||
to_format : string | |||
Pandoc format for output. | |||
|
|||
extra_args : list (optional) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for adding this!
A high level point — this is doing a lot of cool things, but how much of the issues are fixed by adding the following to the preamble?
I got that from https://github.com/ho-tex/oberdiek/issues/31#issuecomment-441094438 which is a patch while maintainers of latex core push this fix out more broadly. If you want to see a working example of this in a directory structure you can clone https://github.com/mpacer/tex_ex/ and run |
@@ -161,7 +163,7 @@ This template does not define a docclass, the inheriting class must define this. | |||
|
|||
((* block commands *)) | |||
% Prevent overflowing lines due to hard-to-break entities | |||
\sloppy | |||
\sloppy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would see if you can turn off your auto-formatter for .tplx and .tpl files. There are situations in which having spaces can be semantically meaningful in LaTeX.
So @MSeal and I were working through this today. My major suggestion is to keep the pandoc APIs unchanged. Instead, switch the current code into its own jinja filter (in the Fortunately, the model you're already using here nbconvert/nbconvert/utils/pandoc.py Lines 79 to 81 in 7cf79f8
convert_pandoc call in the template here
What might be even cleaner would be to have two separate filters rather than one: so then the code would look something like
If you wanted to make a change that you couldn't make from manipulating the markdown itself but that you needed to solve before it was written out into the LaTeX, you can do that by manipulating the a JSON representation of the pandoc AST that pandoc knows how to produce and consume. This is what However, given that you haven't needed to do this for your fix, we can probably just get away with creating two new filters. If you'd like, I could push these changes on top of your PR. Or we could merge this and then I could make a new one with these changes in place — your call. |
with TemporaryWorkingDirectory(): | ||
if not resources: | ||
resources = {} | ||
with self._fake_output_files_dir(resources): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying the path locations should occur at the level of the latex exporter. This set-up removes the ability to use the latex exporter + manually compiling with latex as a debugging tool, since the intermediate latex will be different.
@@ -53,8 +55,8 @@ This template does not define a docclass, the inheriting class must define this. | |||
\usepackage[mathletters]{ucs} % Extended unicode (utf-8) support | |||
\usepackage[utf8x]{inputenc} % Allow utf-8 characters in the tex document | |||
\usepackage{fancyvrb} % verbatim replacement that allows latex | |||
\usepackage{grffile} % extends the file name processing of package graphics | |||
% to support a larger range | |||
\usepackage{grffile} % extends the file name processing of package graphics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being added above in addition to existing here.
@@ -6,23 +6,25 @@ | |||
import subprocess | |||
import os | |||
import sys | |||
import shutil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this import is being used.
@MSeal Going to bump this to 6.0 |
@t-makaro I was going to take a stab at updating this PR for the 6.0 milestone, if you had thoughts about it since we last were working on this. |
@MSeal I'm not sure that this is needed anymore. My understanding of this PR was that it created a temp directory for output images that were extracted by the extractoutputpreprocessor. The extractoutputpreprocessor was sometimes creating file names that LaTeX could not handle. Due to improvements in LaTeX's packages, and my fix in #1193 (which was a fix to my earlier fix), I think that this should work fine. In fact, the LaTeX package fixes should also improve file path problems for files referenced in markdown. |
Yeah I think you're right! I tested out a few of the failing patterns and we're at a working place for the ones I was targeting with this PR. I'll close this out :) |
The fix involves isolating all file paths to a temp directory with safe characters. I tried doing something less intrusive but the underlying latext command can't handle a variety of cases and I had to wrap just about all pathing for the exporter in a temp_dir wrapper to get all cases covered.
Fixes #965 and #633
Might address #974 given it sounded link complication by path characters.
Also I unintentionally fixed #947 in digging into the original problem I believe.