DOC: build of numpy-user.pdf reports Missing Chinese characters with font FreeSerif #22930

Closed
jfbu opened this issue Jan 4, 2023 · 7 comments · Fixed by #23172

Comments

@jfbu
Contributor

jfbu commented Jan 4, 2023

Issue with current documentation:

I wanted to test the PDF build of documentation for 1.24, and the build ended with this report:

Latexmk: ====List of undefined refs and citations:
  Missing character: There is no 琴 (U+7434) in font [FreeSerif.otf]/OT:script=latn;language=dflt;mapping=tex-text;!
  Missing character: There is no 春 (U+6625) in font [FreeSerif.otf]/OT:script=latn;language=dflt;mapping=tex-text;!
  Missing character: There is no 鈴 (U+9234) in font [FreeSerif.otf]/OT:script=latn;language=dflt;mapping=tex-text;!
  Missing character: There is no 猫 (U+732B) in font [FreeSerif.otf]/OT:script=latn;language=dflt;mapping=tex-text;!
  Missing character: There is no 傅 (U+5085) in font [FreeSerif.otf]/OT:script=latn;language=dflt;mapping=tex-text;!
  Missing character: There is no 立 (U+7ACB) in font [FreeSerif.otf]/OT:script=latn;language=dflt;mapping=tex-text;!
  Missing character: There is no 业 (U+4E1A) in font [FreeSerif.otf]/OT:script=latn;language=dflt;mapping=tex-text;!
 And 2 more --- see log file.

I then checked the log file, which gives the page numbers: 339 and 363 in my build. (My virtual environment was lacking some components, so warnings were reported about doxygen and scipy, and my build may be missing pages.) These pages map to the Release notes for 1.23.3 and 1.21.4.

Indeed, here is a screenshot of the 1.21.4 release notes page in the PDF for the 1.23 release:

[Screenshot: Capture d’écran 2023-01-05 à 00 00 04]

By the way, according to Google Translate, 傅立业, reported as missing in this screenshot (page 363 in my build of the 1.24.1 documentation), means Fourier...

Idea or request for content:

Unfortunately I am not very familiar with xelatex, but I think it should be possible to configure it to switch to another font for CJK ideograms. I cannot help with this at the moment, but surely others can.

By the way, with a recent LaTeX (2021 or later) one can set \tracinglostchars=3 in the preamble so that such missing characters cause a LaTeX build error, if one really wishes this.
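
As a rough sketch of how that could be wired into a Sphinx project: the fragment below assumes the setting goes into the `preamble` key of `latex_elements` in `conf.py`. That placement is an assumption for illustration and was not tested in this thread; a real `conf.py` would append to its existing preamble rather than replace it.

```python
# conf.py fragment (sketch): make missing glyphs fatal in the PDF build.
# Assumes a LaTeX release from 2021 or later, as noted above.
latex_engine = 'xelatex'

latex_elements = {
    'preamble': r'''
% Turn "Missing character" log entries into build errors.
\tracinglostchars=3
''',
}
```

With this in place, a missing glyph would be expected to stop the build instead of only showing up in the latexmk summary.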

Completely unrelated note: I wanted to test how the PDF looks with the Sphinx development tip (future 6.1.0). It looks fine (code-blocks with rounded corners and a gray background, tables with rows in alternating grayish colors). I had to install sphinx_design and then reinstall Sphinx, as sphinx_design pins it to <6.

@charris
Member

charris commented Jan 5, 2023

TBF, we have given up trying to support the PDF documentation. It is buggy, and we don't have the resources to deal with it. If you or someone else wants to work on it, go for it.

@jfbu
Contributor Author

jfbu commented Jan 5, 2023

@charris I am interested in any problems you might have with the PDF documentation, to the extent that they relate to core Sphinx.

Yesterday the builds of numpy-ref.pdf and numpy-user.pdf for 1.24.1 (at a28f4f2) looked fine locally with Sphinx 6 (actually its current dev branch, but 6.0.0 would have behaved the same). I noticed a few multiply-defined labels and missing references reported for numpy-ref.pdf (which may be due to a partial install on my side). There are perhaps some table-related issues (I see some tabulary complaints in the LaTeX log), which I have to check. And there is the Chinese glyphs issue in numpy-user.pdf reported above.

> Unfortunately I personally am very little familiar with xelatex but I think it should be possible to configure it to switch to another font for CJK ideograms, I can not help currently though, but surely others will.

I will have a quick look at it, despite (as I somewhat theatrically said) lacking familiarity with xelatex + polyglossia + CJK.

@jfbu
Contributor Author

jfbu commented Jan 5, 2023

Documenting for future reference the (few) other problems I see at this time. I am working with numpy commit a28f4f2, tagged v1.24.1, but my environment may not have all the components needed for a complete doc build. As mentioned above, I am using the current tip of Sphinx master.

  • At the end of the numpy.ndarray.view() PDF documentation there is a strange one-row table containing the word dot. It is also visible at https://numpy.org/doc/1.23/numpy-ref.pdf on page 35. Checking the source file numpy/core/_add_newdocs.py, I saw nothing special matching this near line 4707. Here is a screenshot from page 35 of that PDF:
    [Screenshot: Capture d’écran 2023-01-05 à 10 32 00]
    With Sphinx 6 it looks like this:
    [Screenshot: Capture d’écran 2023-01-05 à 10 36 05]
    (the two PDFs were not at the same scale when I took the screenshots)
  • The same phenomenon occurs at the end of the documentation of numpy.matrix.view(), on page 206 of https://numpy.org/doc/1.23/numpy-ref.pdf. This is in fact the exact same docstring as for numpy.ndarray.view().

I did a build of the HTML docs locally (my environment is perhaps incomplete; also, sphinx_design pinned Sphinx to <6 and I forcibly reinstalled Sphinx 6). I then checked the HTML for this docstring to make sure the dot problem in the PDF was not an artefact of my build environment. Indeed, the local HTML build looks fine.

Strangely, the generated TeX file does contain the markup for such an extra one-row table. It occurs right before an \end{fulllineitems} whose matching \begin is (in the case of numpy.ndarray.view()) the one at the very start of the numpy.ndarray section (a \subsubsection).

Could it have to do with some dot command from graphviz? (very wild guess).

EDIT: I do see a

add_newdoc('numpy.core.multiarray', 'ndarray', ('dot'))

line in _add_newdocs.py. I wonder whether the stray table is related to it.

Only other problems I see:

Multiply defined references reported during build of numpy-ref.pdf:

LaTeX Warning: Label `reference/maskedarray.generic:maskedarray-generic' multip
ly defined.


LaTeX Warning: Label `reference/routines.fft:routines-fft' multiply defined.


LaTeX Warning: Label `reference/routines.linalg:routines-linalg' multiply defin
ed.


LaTeX Warning: Label `reference/random/index:numpyrandom' multiply defined.


LaTeX Warning: Label `reference/typing:typing' multiply defined.

Undefined references for numpy-ref.pdf:

LaTeX Warning: Hyper reference `reference/c-api/array:sec-array-iterator' on pa
ge 2010 undefined on input line 192975.

EDIT: after installing breathe and scipy 1.10.0 in the build environment (details skipped, I am on an old system), the make latex build reported that the build succeeded with no warnings or errors. The warning above changed to

LaTeX Warning: Hyper reference `reference/c-api/array:sec-array-iterator' on pa
ge 1886 undefined on input line 191075.

for some reason (don't know why the page count decreased...). Also there are now

Package hyperref Warning: Difference (2) between bookmark levels is greater 
(hyperref)                than one, level fixed on input line 89870.

type warnings.


Build of numpy-user.pdf has no reported multiply-defined labels or missing ones.

I do not see any other obvious problem to be identified from the LaTeX logs.
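
For anyone wanting to collect these warnings automatically, here is a minimal sketch. It assumes the log is hard-wrapped at 79 columns (which matches the excerpts above, where "multiply defined" is split as "multip" / "ly defined."); `unwrap_log` and `multiply_defined_labels` are hypothetical helper names, not part of Sphinx or latexmk.

```python
import re

# Assumption: the TeX log is hard-wrapped at 79 columns, so a long warning
# can be split in the middle of a word.
LOG_WIDTH = 79

LABEL_RE = re.compile(r"LaTeX Warning: Label `([^']+)' multiply defined\.")


def unwrap_log(log_text, width=LOG_WIDTH):
    """Rejoin lines that look hard-wrapped at `width` columns (heuristic)."""
    lines = []
    pending = ""
    for line in log_text.splitlines():
        pending += line
        # A line shorter than the wrap width ends the logical line.
        if len(line) < width:
            lines.append(pending)
            pending = ""
    if pending:
        lines.append(pending)
    return lines


def multiply_defined_labels(log_text):
    """Return the labels reported as multiply defined, in log order."""
    return [match.group(1)
            for line in unwrap_log(log_text)
            for match in LABEL_RE.finditer(line)]
```

The unwrapping is only a heuristic: an unwrapped line that happens to be exactly 79 characters long would be joined with the following one. The function takes the log contents as a string, for example the text read from the .log file that latexmk leaves in the LaTeX build directory.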

@jfbu
Contributor Author

jfbu commented Jan 5, 2023

diff --git a/doc/source/conf.py b/doc/source/conf.py
index 9546db5f2..42a943661 100644
--- a/doc/source/conf.py
+++ b/doc/source/conf.py
@@ -248,11 +248,30 @@ def setup(app):
 #latex_use_parts = False
 
 latex_elements = {
-    'fontenc': r'\usepackage[LGR,T1]{fontenc}'
+# Sphinx documentation says
+#    Do not use this key for a latex_engine other than 'pdflatex'.
+# So commenting out this setting (FreeSerif which is currently
+# the default font used by Sphinx with xelatex has Greek support)
+#    'fontenc': r'\usepackage[LGR,T1]{fontenc}'
 }
 
 # Additional stuff for the LaTeX preamble.
 latex_elements['preamble'] = r'''
+% Fix for some missing characters (arise in author names in Release Notes)
+% TeXLive provides Harano Aji Mincho font but unfortunately it is missing
+% the 业 (U+4E1A).  So using SimSun which is among my system fonts.
+% Successful build of numpy-user.pdf checked with numpy 1.24.1
+\newfontfamily\ChineseFont{SimSun}
+\catcode`琴\active\protected\def琴{{\ChineseFont\string琴}}
+\catcode`春\active\protected\def春{{\ChineseFont\string春}}
+\catcode`鈴\active\protected\def鈴{{\ChineseFont\string鈴}}
+\catcode`猫\active\protected\def猫{{\ChineseFont\string猫}}
+\catcode`傅\active\protected\def傅{{\ChineseFont\string傅}}
+\catcode`立\active\protected\def立{{\ChineseFont\string立}}
+\catcode`业\active\protected\def业{{\ChineseFont\string业}}
+\catcode`（\active\protected\def（{{\ChineseFont\string（}}
+\catcode`）\active\protected\def）{{\ChineseFont\string）}}
+
 % In the parameters section, place a newline after the Parameters
 % header
 \usepackage{xcolor}

This fixes the reported problem. Unfortunately, as indicated in the comment in the patch, my LaTeX installation has only one font with Chinese support, and it lacks one of the ideograms. Perhaps I did not install a complete TeXLive. So I used a system font that ships with my macOS, and I have no idea whether this will work on your build system.

I also tried using xeCJK and \setCJKmainfont, but I ran into the polyglossia-xeCJK incompatibility discussed at jgm/pandoc#7509 and at (an old thread, but apparently still accurate) https://tex.stackexchange.com/questions/36878/xecjk-messes-with-punctuation: the curly quotes were misinterpreted because of the xeCJK package, which introduced extra horizontal whitespace.

For the time being I will only make a PR for the 'fontenc' key part. That key should not be used when the LaTeX engine is set to xelatex. Or perhaps I am missing something that would justify its presence.
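
If the key ever needs to be kept for pdflatex builds, one possible alternative to removing it is to guard it on the engine. The fragment below is only a sketch of that idea, not what the patch above does (the patch simply comments the key out):

```python
# conf.py fragment (sketch): only set 'fontenc' for pdflatex builds.
latex_engine = 'xelatex'

latex_elements = {}

# The Sphinx documentation says not to use the 'fontenc' key with an
# engine other than 'pdflatex', so guard it explicitly.
if latex_engine == 'pdflatex':
    latex_elements['fontenc'] = r'\usepackage[LGR,T1]{fontenc}'
```

With latex_engine set to 'xelatex', as here, the key is never added and fontspec stays in charge of font selection.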

Here is how it now renders for the 1.21.4 release notes:
[Screenshot: Capture d’écran 2023-01-05 à 13 19 33]

@jfbu
Contributor Author

jfbu commented Jan 5, 2023

All the duplicate-label complaints from the PDF build of numpy-ref.pdf are instances of sphinx-doc/sphinx#11093:

doc/source/reference/maskedarray.generic.rst:

.. currentmodule:: numpy.ma

.. _maskedarray.generic:

.. module:: numpy.ma

doc/source/reference/routines.fft.rst:

.. _routines.fft:
.. automodule:: numpy.fft

doc/source/reference/routines.linalg.rst:

.. _routines.linalg:

.. module:: numpy.linalg

doc/source/reference/random/index.rst:

.. _numpyrandom:

.. py:module:: numpy.random

doc/source/reference/typing.rst:

.. _typing:
.. automodule:: numpy.typing

(These complaints are only LaTeX warnings: they end up in the console output and in the log, and latexmk summarizes them at the end, but they do not prevent latexmk from doing the needed number of runs.)

@jfbu
Contributor Author

jfbu commented Feb 7, 2023

> Unfortunately as indicated in commented part, my LaTeX installation has only one Chinese supporting font and it is lacking one of the ideograms. Perhaps I did not install a complete TeXLive.

Indeed, I was testing on a partial TeXLive. On a complete TeXLive 2022 I found fonts supporting all characters:

$ albatross -b0 -d -t 琴 春 鈴 猫 傅 立 业 （ ） | grep texlive
.../texlive/2022/texmf-dist/fonts/opentype/public/fandol/FandolFang-Regular.otf
.../texlive/2022/texmf-dist/fonts/opentype/public/fandol/FandolHei-Regular.otf
.../texlive/2022/texmf-dist/fonts/opentype/public/fandol/FandolHei-Bold.otf
.../texlive/2022/texmf-dist/fonts/opentype/public/fandol/FandolKai-Regular.otf
.../texlive/2022/texmf-dist/fonts/opentype/public/fandol/FandolSong-Bold.otf
.../texlive/2022/texmf-dist/fonts/opentype/public/fandol/FandolSong-Regular.otf
(output has been edited to keep only relevant part of paths)

The above was done with albatross 0.5.0 and confirmed with a test build using xelatex, with \setmainfont{FandolSong-Regular}[Extension=.otf] and a LaTeX file containing only those characters.

So one can adapt the patch in #22930 (comment) to use one of these fonts. Each character could be assigned its own font, but since some fonts support all the problematic characters, and potentially further ones that pop up in future, it is simpler to pick one.
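
To make that adaptation less error-prone when further characters show up, the preamble lines could be generated rather than typed by hand. This is a minimal sketch under some assumptions: `FALLBACK_CHARS` and `fallback_preamble` are hypothetical names, the two parentheses are taken to be the fullwidth forms U+FF08 and U+FF09, and the `[Extension=.otf]` option mirrors the \setmainfont call used in the test build above.

```python
# Sketch: generate the per-character fallback lines for the LaTeX preamble.
# FALLBACK_CHARS and fallback_preamble() are hypothetical names.
# Assumption: the two parentheses are the fullwidth forms U+FF08 / U+FF09.
FALLBACK_CHARS = "琴春鈴猫傅立业（）"


def fallback_preamble(chars=FALLBACK_CHARS, font="FandolSong-Regular",
                      macro="ChineseFont"):
    """Return preamble lines that typeset each character with `font`."""
    # Declare the fallback font family once.
    lines = ["\\newfontfamily\\" + macro + "{" + font + "}[Extension=.otf]"]
    for ch in chars:
        # Make the character active and have it switch to the fallback
        # font locally, same pattern as in the patch quoted earlier.
        lines.append(
            "\\catcode`" + ch + "\\active\\protected\\def" + ch
            + "{{\\" + macro + "\\string" + ch + "}}"
        )
    return "\n".join(lines) + "\n"
```

The returned string could then be prepended to the existing value of latex_elements['preamble'] in conf.py; adding a newly reported character would only mean extending FALLBACK_CHARS.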

@charris
Member

charris commented Feb 7, 2023

If you have fixes that should be added, go ahead and make a PR.

jfbu added a commit to jfbu/numpy that referenced this issue Feb 7, 2023
Unfortunately, there is no mechanism provided by (Xe)LaTeX to
automatically fall-back on some rescue font when the main document font
does not support a character.  But one can configure each problematic
character to use a specific font, and the FandolSong font from
TeXLive-based LaTeX distributions provides for all currently needed such
problematic characters.

An un-needed and even possibly detrimental usage of the latex_elements
'fontenc' key is removed in passing.

Close: numpy#22930