Font fallback #637

Closed
CY-Qiu opened this issue Dec 22, 2022 · 7 comments


@CY-Qiu

CY-Qiu commented Dec 22, 2022

Please explain your intent
Both Windows MFC and HTML/CSS have a font fallback algorithm. This is very important for CJK fonts. I wonder whether fpdf2 could offer this feature.

Example: we want to add the character "𡉃" to the output file.

In HTML/CSS, we can set a style: fontFamily = "'Noto Serif CJK SC', 'SimSun-ExtB'". ('Noto Serif CJK SC' is developed by Google and published at https://github.com/googlefonts/noto-cjk. 'SimSun-ExtB' is installed by default with the Simplified Chinese version of Windows.)

This character only exists in the 'SimSun-ExtB' font; 'Noto Serif CJK SC' does not have it. When the web browser finds that 'Noto Serif CJK SC' lacks the character, it automatically falls back to 'SimSun-ExtB', because that font does contain it.

However, fpdf2 does not have this feature. If we use the 'Noto Serif CJK SC' font, "𡉃" is not displayed. The 'SimSun-ExtB' font is very old and lacks many Unicode characters, so we cannot simply use 'SimSun-ExtB' on its own.

Describe the solution you'd like
This can be researched.

@Lucas-C
Member

Lucas-C commented Dec 22, 2022

Hi @CY-Qiu

Thank you for opening a discussion about this feature idea.

For reference:

Regarding your suggestion, I see two main approaches to implement this "font fallback" mechanism:

  • at the PDF level: I checked the 1.7 PDF spec and this does not seem possible. Fonts are embedded in PDF files, and the PDF spec defines in a unique way how a given text character should be displayed by PDF viewers, with a specific font.

  • at the Python lib level: fpdf2 could offer a fallback mechanism that could work like this:

    1. For every text character rendered by the lib, check if it exists in the selected font
    2. If it does not, switch to a "fallback" font to render it, by inserting in the PDF page content stream the appropriate operators (Tf)
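The two steps above can be sketched in plain Python. This is only an illustration, not fpdf2 code: the coverage sets, the resource names `F1` and `F2`, and both helper functions are hypothetical. The text is written as UTF-16BE hex strings for simplicity, whereas a real PDF writer has to emit the character codes that the embedded font actually expects.

```python
from itertools import groupby

# Hypothetical coverage data: the code points each font has a glyph for.
# In practice this would come from the font's cmap table.
COVERAGE = {
    "F1": {ord(c) for c in "abcdefghijklmnopqrstuvwxyz "},  # selected font
    "F2": {0x2603},                                          # fallback font
}

def font_for(ch, main="F1", fallback="F2"):
    """Step 1: keep the selected font when it has the glyph, else fall back.

    The fallback font is not checked here; a real implementation would
    also need to handle characters that neither font covers.
    """
    return main if ord(ch) in COVERAGE[main] else fallback

def content_stream(text, size=12):
    """Step 2: emit a Tf operator each time the font changes between runs."""
    ops = ["BT"]
    for font, run in groupby(text, key=font_for):
        hex_text = "".join(run).encode("utf-16-be").hex().upper()
        ops.append(f"/{font} {size} Tf <{hex_text}> Tj")
    ops.append("ET")
    return "\n".join(ops)

print(content_stream("snow \u2603 man"))
```

Grouping consecutive characters by font keeps the number of `Tf` switches to one per run rather than one per character.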

This is certainly doable, but would require some refactoring in FPDF._render_styled_text_line().

A starting point would be to implement step 1 first and detect when characters are not "present" in the selected font.
I would welcome a PR that introduces warnings when this happens.
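As a rough idea of what such a detection step could look like outside fpdf2, here is a minimal sketch. The function names are hypothetical and this is not how fpdf2 does it internally. `load_cmap` assumes the third-party fontTools package is installed and that the file holds a single font (not a `.ttc` collection); the other two functions only need a set of code points.

```python
import warnings

def load_cmap(font_path):
    """Return the set of code points a font file maps to glyphs (needs fontTools)."""
    from fontTools.ttLib import TTFont  # third-party, imported lazily
    best = TTFont(font_path).getBestCmap() or {}
    return set(best)

def missing_chars(text, cmap):
    """Return the characters of `text` with no glyph, in order of first appearance."""
    return [ch for ch in dict.fromkeys(text) if ord(ch) not in cmap]

def warn_missing(text, cmap, font_name):
    """Emit one warning listing every character the font cannot render."""
    missing = missing_chars(text, cmap)
    if missing:
        listed = ", ".join(f"U+{ord(ch):04X}" for ch in missing)
        warnings.warn(f"Font {font_name!r} has no glyph for: {listed}")
    return missing
```

A call such as `warn_missing(text, load_cmap("DejaVuSans.ttf"), "DejaVuSans")` would then report the uncovered characters once per string instead of once per occurrence.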

Finally, another alternative to this "fallback" mechanism can simply be to craft your own font files combining characters from several fonts.

@gmischler
Collaborator

This is certainly doable, but would require some refactoring in FPDF._render_styled_text_line()

I think any such check/substitution would have to happen much earlier. FPDF._render_styled_text_line() already has too many different tasks to solve, many of which should be delegated elsewhere rather than adding new ones.

The natural place to do something like that is when the text fragments are created.

But there are quite a number of questions that we'd need to answer first:

  • Which font are we going to use?
    For Latin-1 text, one of the built-in ones would work.
    For any other characters, we'd need a separate Unicode font with the largest possible coverage.
    Note that e.g. Firefox allows configuring separate default substitution fonts for 29 (!) different writing systems, so an actually complete solution to the problem is clearly not trivial at all.

  • Do we want to include a full Unicode font with fpdf2, or would we just offer the user a possibility to define a default font?
    A Unicode font with maximal character coverage will be dozens of MBs in size.

  • Alternatively, a lot of software just uses some placeholder shape (eg. a rectangle) for unknown glyphs.
    This may be a valid strategy, and possibly better than not showing anything at all or throwing an error.
    Note that web browsers are forced to use font substitution, because a HTML author has no obligation to specify a font. This is very different from our situation. Since PDF is at its core a print layout format, its goal is to produce an exact representation of the graphical page, which is why specifying a font is mandatory. Do we really want to encourage the use of bad/outdated/incomplete fonts by producing something "legible but ugly" by means of automatic substitution? Since our target audience are not end users but developers, I think we should be a bit more strict than that.

@Lucas-C
Member

Lucas-C commented Dec 22, 2022

Since PDF is at its core a print layout format, its goal is to produce an exact representation of the graphical page, which is why specifying a font is mandatory. Do we really want to encourage the use of bad/outdated/incomplete fonts by producing something "legible but ugly" by means of automatic substitution? Since our target audience are not end users but developers, I think we should be a bit more strict than that.

I fully agree.

I think we could just support fallback Latin-1 characters with a builtin font: https://github.com/PyFPDF/fpdf2/blob/2.6.0/fpdf/fpdf.py#L285

This seems relatively simple, could be useful for fpdf2 users, mainly as a helpful "debug" mechanism,
but also as a small-cost workaround in some cases,
while preserving some overall "strict" approach to font selection.

Also, this "fallback to render unsupported chars with a standard font" mechanism would have to be enabled explicitly.

What do you think about this @gmischler & @CY-Qiu?
As usual, no promise made here, this feature will be supported only if some programmer is willing to take the time to submit a PR implementing it.

@Lucas-C
Member

Lucas-C commented Jan 2, 2023

Hi @CY-Qiu

Unless you have more to add to this feature suggestion, I'm probably going to close this discussion.

Thank you anyway for opening this exchange 😃

@andersonhc
Collaborator

Font fallback is also an issue for a project I'm working on, where users can choose their name, and I have plenty of names like 𝕥𝕙𝕚𝕤, 🆃🅷🅸🆂 and emojis that can't be rendered on the report.

I explored the option of putting all the needed glyphs into a single font, but fonts can't grow indefinitely: a TrueType/OpenType font is limited to 65,535 glyphs.

An ideal scenario for me would be allowing the user to pass an array of fonts to set_font(). fpdf2 would use the first font and, if a glyph is not present in it, look for the glyph in the other fonts of the array, in order.

Something like:
pdf.add_font(fname="NotoSans.ttf")     # Latin characters
pdf.add_font(fname="NotoSansCJK.ttc")  # Chinese, Japanese and Korean
pdf.add_font(fname="twemoji.ttf")      # Twitter emoji font
pdf.set_font(["NotoSans", "NotoSansCJK", "twemoji"])  # proposed API: fonts tried in order

I believe it could be processed in fragments, similarly to how markdown does.
If you're OK with that approach, let me know. It's something I'm interested in working on.
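To make the fragment idea concrete, here is a minimal sketch of how a string could be split across an ordered list of fonts. It is independent of fpdf2: `choose_font`, `split_fragments`, and the coverage sets are hypothetical stand-ins for real font files, and characters that no font covers simply stay on the first font.

```python
from itertools import groupby

def choose_font(ch, fonts):
    """Return the name of the first font in `fonts` that covers `ch`.

    `fonts` is an ordered mapping {font name: set of covered code points}.
    Characters that no font covers stay on the first font.
    """
    for name, coverage in fonts.items():
        if ord(ch) in coverage:
            return name
    return next(iter(fonts))

def split_fragments(text, fonts):
    """Group consecutive characters that resolve to the same font."""
    return [
        (name, "".join(chars))
        for name, chars in groupby(text, key=lambda ch: choose_font(ch, fonts))
    ]

# Hypothetical coverage sets standing in for real font files.
fonts = {
    "NotoSans": {ord(c) for c in "Hi "},
    "NotoSansCJK": {ord("你"), ord("好")},
    "twemoji": {ord("😀")},
}
print(split_fragments("Hi 你好 😀", fonts))
```

Each `(font, text)` pair could then be rendered like any other styled fragment. Note that whitespace resolves to the first font, so a space between two fallback runs produces its own short fragment.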

@Lucas-C
Member

Lucas-C commented Jan 5, 2023

Font fallback is also an issue for a project I'm working on, where users can choose their name, and I have plenty of names like 𝕥𝕙𝕚𝕤, 🆃🅷🅸🆂 and emojis that can't be rendered on the report.

Note that you can easily convert a fancy character like 𝕥 to a simple "t":

import unicodedata
unicodedata.normalize('NFKD', '𝕥')  # returns 't'; ref: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization

But I see how one would like to preserve the original format of the names that users provided as input.

After experimenting a bit, I found some fonts that are able to render 𝕥𝕙𝕚𝕤 and 🆃🅷🅸🆂:

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.add_font(fname="DejaVuSans.ttf")
pdf.add_font(fname="NotoSansSymbols.ttf")
pdf.add_font(fname="Segoe-UI-Symbol.ttf")
pdf.set_font('DejaVuSans')
pdf.cell(txt="𝕥𝕙𝕚𝕤")
pdf.set_font('NotoSansSymbols')
pdf.cell(txt="🆃🅷🅸🆂")
pdf.set_font('Segoe-UI-Symbol')
pdf.cell(txt="🆃🅷🅸🆂")
pdf.output("fonts.pdf")

Produces: fonts.pdf

In this scenario, I see how not having to specify what font to use to render every character could be useful.

I think that if we want to support something like what you suggested,
as I mentioned previously, fpdf2 will need to:

For every text character rendered by the lib, check if it exists in the selected font

A PR implementing this kind of detection mechanism would be welcome as a first step towards a font fallback mechanism.

@Lucas-C
Member

Lucas-C commented Mar 3, 2023

Font fallback has been implemented by @andersonhc in #669

It can currently be tested by installing fpdf2 from git:

pip install git+https://github.com/PyFPDF/fpdf2.git@master

A new release of fpdf2, including this feature, will be made soon.
