UTF-8 problem during converting to PDF #786

Open
theasder opened this issue Mar 21, 2018 · 8 comments
Labels
format:LaTeX pertains to exporting to the LaTeX format

Comments


theasder commented Mar 21, 2018

Hey there,

I have Jupyter Notebook from Anaconda on Ubuntu 16.04 with XeTeX installed. I tried to convert a notebook to PDF. All English words and formulas came through fine, but the non-ASCII UTF-8 characters were silently dropped. Some logs here:

[W 18:37:28.242 NotebookApp] Notebook ЮраМолодец.ipynb is not trusted
[I 18:37:32.967 NotebookApp] Starting buffering for 005c099d-5a37-410e-812a-9b062b5744fc:e6a21be8811347f892517ca798014ee1
[I 18:37:33.290 NotebookApp] Kernel restarted: 005c099d-5a37-410e-812a-9b062b5744fc
[I 18:37:35.346 NotebookApp] Adapting to protocol v5.1 for kernel 005c099d-5a37-410e-812a-9b062b5744fc
[I 18:37:35.346 NotebookApp] Restoring connection for 005c099d-5a37-410e-812a-9b062b5744fc:e6a21be8811347f892517ca798014ee1
[I 18:37:35.347 NotebookApp] Replaying 6 buffered messages
[W 18:37:42.269 NotebookApp] Notebook ЮраМолодец.ipynb is not trusted
[I 18:37:43.054 NotebookApp] Support files will be in 
[I 18:37:43.054 NotebookApp] Making directory /root
[I 18:37:43.055 NotebookApp] Making directory /root
[I 18:37:43.055 NotebookApp] Making directory /root
[I 18:37:43.055 NotebookApp] Making directory /root
[I 18:37:43.056 NotebookApp] Writing 30590 bytes to /root/notebook.tex
[I 18:37:43.056 NotebookApp] Building PDF
[I 18:37:43.056 NotebookApp] Running xelatex 3 times: ['xelatex', '/root/notebook.tex']
[I 18:37:47.278 NotebookApp] Running bibtex 1 time: ['bibtex', '/root/notebook']
[W 18:37:47.302 NotebookApp] bibtex had problems, most likely because there were no citations
[I 18:37:47.303 NotebookApp] PDF successfully created

It generated a valid .tex file, but there is no multilanguage support in it.


hycakir commented Mar 30, 2018

See my answer here: https://stackoverflow.com/a/49582428/2372611

The problem is that Jupyter uses the xelatex command to compile the LaTeX (to support Unicode, I think). But the generated file does not need xelatex: it can be compiled directly with latex or pdflatex with Unicode support. I think the generated file lacks the configuration xelatex needs to render Unicode characters.
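
For illustration, a minimal sketch of what a pdflatex route typically needs for Cyrillic text. This is not the nbconvert template; the choice of T2A encoding and the babel language list are assumptions made for the example, and it assumes Cyrillic Type 1 fonts (for example cm-super) are installed.

```latex
% Compile with: pdflatex example.tex
% Assumption: Cyrillic fonts for the T2A encoding (e.g. cm-super) are installed.
\documentclass{article}
\usepackage[T2A,T1]{fontenc}        % T2A covers Cyrillic, T1 covers Latin
\usepackage[utf8]{inputenc}         % redundant on LaTeX 2018-04 and later, harmless
\usepackage[russian,english]{babel} % last option (english) is the main language
\begin{document}
English text, then \foreignlanguage{russian}{Юра молодец}.
\end{document}
```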


QGB commented Aug 28, 2018

[image: jupyter unicode convert pdf]

@t-makaro
Contributor

Can someone please produce a minimal example notebook.ipynb and/or provide a copy of the LaTeX output from:

jupyter nbconvert --to latex notebook.ipynb

If I have a file to work with, then I can investigate this.

Contributor

t-makaro commented Aug 28, 2018

I believe this is relevant. The April 2018 release of LaTeX defaults to utf-8 encoding.

Also relevant.

If I can get a file and replicate the issue, I may be able to solve this.

@frederik-elwert

I investigated the problem a bit, and the main issue does not seem to be that UTF-8 is not correctly recognized. The actual problem is that the main font does not have the corresponding glyphs.

Jupyter uses the mathpazo package to load URW Palladio. But that font does not cover many scripts. Using DejaVu Sans instead, which covers a wide range of Unicode scripts, fixed the problem for me (it still does not cover cases like RTL languages, but that’s another problem).

The problem is that DejaVu Sans is not exactly pretty, and this would affect all documents, even those that don’t use non-Latin scripts.

A possible solution seems to be the ucharclasses package. It allows defining separate fonts for different Unicode blocks. That way, the main (Latin) font could be left as it is, with fallback fonts specified only for other scripts.

The Noto fonts might be a viable set of fonts for non-Latin blocks.
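
The DejaVu Sans workaround described above can be sketched as a small standalone document. This is an illustration only, not the nbconvert template; it assumes the DejaVu fonts are installed on the system and that the file is compiled with xelatex. The monospaced font line is an addition for the example.

```latex
% Compile with: xelatex example.tex
% Assumption: the DejaVu font family is installed system-wide.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{DejaVu Sans}        % main text font with broad Unicode coverage
\setmonofont{DejaVu Sans Mono}   % assumption: also swap the monospaced font
\begin{document}
English, Ελληνικά, Русский.

\texttt{print("Юра")}
\end{document}
```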

Contributor

t-makaro commented Mar 26, 2019

I just spent some time exploring ucharclasses, and I believe that I can make this work.

If I added the following:

\usepackage[Latin,Greek]{ucharclasses}
\usepackage{fontspec}   
        
\newfontfamily{\mynormal}{Latin Modern Roman}
\setDefaultTransitions{\mynormal}{}
\newfontfamily{\mygreek}{Courier New} 
\setTransitionsForGreek{\mygreek}{}

to the bottom of the preamble (it messes with section titles if I put \usepackage{fontspec} any earlier), then symbols like θα work properly. I see no reason why this wouldn't work for other Unicode blocks. We just need to agree on fonts for the different blocks. I would also like to figure out how to store the default font instead of overriding it.

This will also only work in XeLaTeX, so it would be smart to wrap this in something like

\ifdefined\XeLaTeXonlycommand
...
\fi

This way it is still possible to compile the latex file using pdflatex.
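
One way to write that guard is with the iftex package, which provides an \ifXeTeX conditional. The sketch below is only an illustration of the wrapping idea: the font names are the ones from the snippet above, and the use of iftex is an assumption, not something the nbconvert template is known to do. Testing \ifdefined\XeTeXversion would be a package-free alternative.

```latex
% Goes at the bottom of the preamble.
\usepackage{iftex}   % assumption: iftex is available; provides \ifXeTeX
\ifXeTeX
  % Only XeLaTeX sees the per-block font switching.
  \usepackage[Latin,Greek]{ucharclasses}
  \usepackage{fontspec}
  \newfontfamily{\mynormal}{Latin Modern Roman}
  \setDefaultTransitions{\mynormal}{}
  \newfontfamily{\mygreek}{Courier New}
  \setTransitionsForGreek{\mygreek}{}
\fi
% Under pdflatex the block above is skipped entirely.
```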

CC @mpacer

Contributor

t-makaro commented Mar 26, 2019

I just noticed an issue with this solution. \setDefaultTransitions{\mynormal}{} will switch to a non-monospaced font for any Latin characters, including those inside cell inputs/outputs. This could be changed to \setDefaultTransitions{\ifcell\somemonofont\else\mynormal\fi}{}, but then every single verbatim environment needs to be wrapped with \celltrue … \cellfalse, where the cell switch is defined by \newif\ifcell.
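
As a rough, untested sketch of that idea: \mymono and its font are hypothetical stand-ins for \somemonofont, and the fragment assumes fontspec and ucharclasses are already loaded as in the earlier snippet.

```latex
% Preamble fragment (XeLaTeX only; fontspec and ucharclasses already loaded).
\newif\ifcell   % defines \celltrue and \cellfalse; the switch starts out false
\newfontfamily{\mynormal}{Latin Modern Roman}
\newfontfamily{\mymono}{Latin Modern Mono}  % hypothetical monospaced choice
% The conditional is expanded each time a transition fires, not here.
\setDefaultTransitions{\ifcell\mymono\else\mynormal\fi}{}

% Document body: every cell would need this wrapping.
\celltrue
\begin{verbatim}
print("hello")
\end{verbatim}
\cellfalse
```

The cost noted in the comment is visible here: the template would have to emit \celltrue and \cellfalse around each cell input and output.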

t-makaro added the format:LaTeX label Aug 2, 2019

jpgoldberg commented Jul 22, 2022

When using xelatex (or lualatex) the preamble correctly loads the unicode-math package. But that is the only font setting it has. The OpenType fonts loaded by unicode-math are good for the math part, but are limited in other respects. In particular Latin Modern Mono does not include Greek or Cyrillic.

If we replace \usepackage{unicode-math} with \usepackage[default]{fontsetup} we get all the goodness of unicode-math (because fontsetup loads unicode-math), but we get the full cm-unicode fonts for all of the text (including monospaced) which includes Greek and Cyrillic.

See my StackExchange answer for more detail.
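
A minimal standalone document for trying the fontsetup swap might look like the following. It is a sketch, not the nbconvert template; it assumes the fontsetup package and the fonts it loads are installed, and that the file is compiled with xelatex or lualatex.

```latex
% Compile with: xelatex example.tex   (or lualatex)
% Assumption: fontsetup and its default fonts are installed.
\documentclass{article}
\usepackage[default]{fontsetup}  % loads unicode-math itself
\begin{document}
Text: Ελληνικά, Русский.

\texttt{Mono: θα, Юра}

Math: $\theta + \alpha$
\end{document}
```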
