UTF-8 problem during converting to PDF #786

theasder · 2018-03-21T19:25:07Z

Hey there,

I have Jupyter Notebook from Anaconda on Ubuntu 16.04 with XeTeX installed. I tried to convert a notebook to PDF. All English words and formulas came through correctly, but the non-ASCII UTF-8 characters were silently dropped. Some logs here:

[W 18:37:28.242 NotebookApp] Notebook ЮраМолодец.ipynb is not trusted
[I 18:37:32.967 NotebookApp] Starting buffering for 005c099d-5a37-410e-812a-9b062b5744fc:e6a21be8811347f892517ca798014ee1
[I 18:37:33.290 NotebookApp] Kernel restarted: 005c099d-5a37-410e-812a-9b062b5744fc
[I 18:37:35.346 NotebookApp] Adapting to protocol v5.1 for kernel 005c099d-5a37-410e-812a-9b062b5744fc
[I 18:37:35.346 NotebookApp] Restoring connection for 005c099d-5a37-410e-812a-9b062b5744fc:e6a21be8811347f892517ca798014ee1
[I 18:37:35.347 NotebookApp] Replaying 6 buffered messages
[W 18:37:42.269 NotebookApp] Notebook ЮраМолодец.ipynb is not trusted
[I 18:37:43.054 NotebookApp] Support files will be in 
[I 18:37:43.054 NotebookApp] Making directory /root
[I 18:37:43.055 NotebookApp] Making directory /root
[I 18:37:43.055 NotebookApp] Making directory /root
[I 18:37:43.055 NotebookApp] Making directory /root
[I 18:37:43.056 NotebookApp] Writing 30590 bytes to /root/notebook.tex
[I 18:37:43.056 NotebookApp] Building PDF
[I 18:37:43.056 NotebookApp] Running xelatex 3 times: ['xelatex', '/root/notebook.tex']
[I 18:37:47.278 NotebookApp] Running bibtex 1 time: ['bibtex', '/root/notebook']
[W 18:37:47.302 NotebookApp] bibtex had problems, most likely because there were no citations
[I 18:37:47.303 NotebookApp] PDF successfully created

It generated a valid .tex file, but there is no multi-language support in it.


hycakir · 2018-03-30T23:16:02Z

See my answer here: https://stackoverflow.com/a/49582428/2372611

The problem is that Jupyter uses the xelatex command to compile the LaTeX (to support Unicode, I think). But xelatex is not needed for the generated file: it can be compiled directly with latex or pdflatex with Unicode support. I think the generated file does not have the configuration xelatex needs to render Unicode characters.

QGB · 2018-08-28T08:43:46Z

jupyter unicode convert pdf

t-makaro · 2018-08-28T16:58:33Z

Can someone please produce a minimal example notebook.ipynb and/or provide a copy of the LaTeX output from:

jupyter nbconvert --to latex notebook.ipynb

If I have a file to work with, then I can investigate this.

t-makaro · 2018-08-28T17:18:22Z

I believe this is relevant. The April 2018 release of LaTeX defaults to utf-8 encoding.

Also relevant.

If I can get a file and replicate the issue, I may be able to solve this.

frederik-elwert · 2018-10-17T08:35:47Z

I investigated the problem a bit, and the main issue does not seem to be that the UTF-8 input is misread. The actual problem is that the main font does not have the corresponding glyphs.

Jupyter uses the mathpazo package to load URW Palladio. But that font does not cover many scripts. Using DejaVu Sans instead, which covers a wide range of unicode scripts, fixed the problem for me (still not covering cases like RTL languages, but that’s another problem).

The problem is that DejaVu Sans is not exactly pretty, and this would affect all documents, even those that don’t use non-Latin scripts.
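For anyone who wants to try the font swap on a standalone file, here is a minimal sketch. It assumes DejaVu Sans is installed as a system font and that the file is compiled with xelatex. It does not reproduce the nbconvert template (which loads mathpazo); it only shows the fontspec call, using sample text taken from this thread.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
% Assumption: DejaVu Sans is available as a system font.
\setmainfont{DejaVu Sans}
\begin{document}
Latin text, Cyrillic: Юра Молодец, Greek: θα.
\end{document}
```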

A possible solution seems to be the ucharclasses package. It allows defining separate fonts for different Unicode blocks. That way, the main (Latin) font could be left as it is, with fallback fonts specified only for other scripts.

The Noto fonts might be a viable set of fonts for non-latin blocks.

t-makaro · 2019-03-26T20:30:19Z

I just spent some time exploring ucharclasses, and I believe that I can make this work.

If I added the following:

\usepackage[Latin,Greek]{ucharclasses}
\usepackage{fontspec}   
        
\newfontfamily{\mynormal}{Latin Modern Roman}
\setDefaultTransitions{\mynormal}{}
\newfontfamily{\mygreek}{Courier New} 
\setTransitionsForGreek{\mygreek}{}

to the bottom of the preamble (It messes with section titles if I put \usepackage{fontspec} any earlier), then symbols like θα work properly. I see no reason why this wouldn't work for other Unicode blocks. We just need to agree on fonts for the different blocks. I would also like to figure out how to store the default font instead of overriding it.

This will also only work in XeLaTeX, so it would be smart to wrap this in something like

\ifdefined\XeLaTeXonlycommand
...
\fi

This way it is still possible to compile the latex file using pdflatex.
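One way such a guard could look is sketched below. It is an illustration, not the nbconvert template: it tests for \XeTeXrevision, a primitive that is defined only when running under XeTeX, and the font choice (DejaVu Sans) is an assumption. The iftex package's \ifXeTeX would be another option.

```latex
\documentclass{article}
% \XeTeXrevision exists only under XeTeX, so \ifdefined picks the
% first branch with xelatex and the second with pdflatex.
\ifdefined\XeTeXrevision
  \usepackage{fontspec}
  \setmainfont{DejaVu Sans} % assumption: installed system font
\else
  \usepackage[T1]{fontenc}
\fi
\begin{document}
This file compiles with both xelatex and pdflatex.
\end{document}
```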

CC @mpacer

t-makaro · 2019-03-26T22:08:50Z

I just noticed an issue with this solution. \setDefaultTransitions{\mynormal}{} will switch to a non-monospaced font for all Latin characters, including those inside cell inputs/outputs. This could be changed to \setDefaultTransitions{\ifcell\somemonofont\else\mynormal\fi}{}, but then every single verbatim environment needs to be wrapped with \celltrue … \cellfalse, where \ifcell is defined by \newif\ifcell.
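A rough sketch of that \ifcell idea follows, for experimentation only. The package order matches the snippet earlier in this thread; \mymono stands in for \somemonofont, and the font names (Latin Modern Mono, DejaVu Sans) are assumptions about what is installed. How the transitions actually behave inside verbatim has not been verified here.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage[Latin,Greek]{ucharclasses}
\usepackage{fontspec}

\newfontfamily{\mynormal}{Latin Modern Roman}
\newfontfamily{\mymono}{Latin Modern Mono} % hypothetical stand-in for \somemonofont
\newfontfamily{\mygreek}{DejaVu Sans}      % assumption: installed system font

% \ifcell is false by default; \celltrue and \cellfalse toggle it.
\newif\ifcell
\setDefaultTransitions{\ifcell\mymono\else\mynormal\fi}{}
\setTransitionsForGreek{\mygreek}{}

\begin{document}
Prose with Greek: θα.

% Each verbatim block has to be wrapped by hand.
\celltrue
\begin{verbatim}
x = "θα"
\end{verbatim}
\cellfalse
\end{document}
```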

jpgoldberg · 2022-07-22T16:48:18Z

When using xelatex (or lualatex) the preamble correctly loads the unicode-math package. But that is the only font setting it has. The OpenType fonts loaded by unicode-math are good for the math part, but are limited in other respects. In particular Latin Modern Mono does not include Greek or Cyrillic.

If we replace \usepackage{unicode-math} with \usepackage[default]{fontsetup} we get all the goodness of unicode-math (because fontsetup loads unicode-math), but we get the full cm-unicode fonts for all of the text (including monospaced) which includes Greek and Cyrillic.
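As a standalone check of this replacement, a minimal sketch might look like the following. It assumes the fontsetup package and the fonts it relies on are installed (for example through a full TeX Live), and the sample text is taken from this thread.

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
% fontsetup loads unicode-math itself, so there is no separate
% \usepackage{unicode-math} line.
\usepackage[default]{fontsetup}
\begin{document}
Text: Юра Молодец, θα. Monospaced: \texttt{Юра θα}.
\[ \alpha^2 + \beta^2 = \gamma^2 \]
\end{document}
```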

See my StackExchange answer for more detail.

imtsuki mentioned this issue May 30, 2019

Fix LaTeX exporting '?' for non-ascii title #1039

Merged

t-makaro added the format:LaTeX pertains to exporting to the LaTeX format label Aug 2, 2019
