full unicode display and copy/paste support #257

Closed
mintty opened this issue Jul 12, 2021 · 6 comments

@mintty

mintty commented Jul 12, 2021

As an attempt to fix foliojs/pdfkit#1251, I came up with the test program below.
It produces PDF output which looks like the second section below.
Selecting all text in the PDF and copying and pasting it into a text file yields the result in the third section below.
The problems are:

  • The program has to switch fonts itself according to each font's glyph coverage. I'd hope for some automatic font choice/fallback mechanism to cover all characters as needed.
  • Because there is no such fallback, missing glyphs are displayed as a hollow box replacement symbol ▯. Worse, when they are selected and transferred with copy/paste, they are not reproduced as intended but all appear as U+100000. A transparent copy/paste round trip should work whether the glyph can be displayed or not.
  • Note the "Hællœ" pasted as "Hælloe": the œ ligature is pasted as two separate characters oe, unlike the æ ligature. This happened when I ran the program on Windows; the same program on Linux pastes œ back correctly. Both used pdfkit 0.12.1.

const PDFDocument = require('pdfkit')
const fs = require('fs')

let doc = new PDFDocument
doc.pipe(fs.createWriteStream('pdfkit.pdf'))
doc.registerFont('normal', './NotoSans-Regular.ttf')
doc.registerFont('emojis', './NotoEmoji-Regular.ttf')
// this one does not work:
doc.registerFont('NotoColorEmoji', './NotoColorEmoji_WindowsCompatible.ttf')

doc.font('normal')
doc.text('Hællœ 1€')
doc.text('Greek, Cyrillic: αγΩЭ')
doc.text('CJK: 啕')
doc.text('4 BMP emojis:')

doc.font('emojis')
doc.text('⛔⛱⛲✅')

doc.font('normal')
doc.text('5 non-BMP characters:')
doc.text('𐌸𐐀𑁍𝄞𝔸')
doc.text('3 non-BMP emojis:')

doc.font('emojis')
doc.text('🌛🍅😀')

doc.end()


Hællœ 1€
Greek, Cyrillic: αγΩЭ
CJK: ▯
4 BMP emojis:
⛔▯⛲✅
5 non-BMP characters:
▯▯▯▯▯
3 non-BMP emojis:
🌛🍅😀


Hælloe 1€
Greek, Cyrillic: αγΩЭ
CJK: 􀀀
4 BMP emojis:
⛔􀀀⛲✅
5 non-BMP characters:
􀀀􀀀􀀀􀀀􀀀
3 non-BMP emojis:
🌛🍅😀
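
The manual font switching in the test program above could in principle be automated by splitting each string into runs by glyph coverage. The following is only a sketch under stated assumptions, not something pdfkit provides: `splitIntoFontRuns`, the `covers` callback, and the `demoFonts` coverage ranges are hypothetical names and stand-in data made up for illustration. Real coverage would have to come from the font files, for example through a per-code-point lookup in the font library.

```javascript
// Split text into runs, each tagged with the first font that covers the code point.
// `fonts` is an ordered list of { name, covers(codePoint) }; both fields are hypothetical.
function splitIntoFontRuns(text, fonts) {
  if (fonts.length === 0) throw new Error('at least one font is required');
  const runs = [];
  // for...of iterates by code point, so surrogate pairs of non-BMP characters stay intact.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const match = fonts.find((f) => f.covers(cp));
    // If no font covers the character, fall back to the first font in the list.
    const name = match ? match.name : fonts[0].name;
    const last = runs[runs.length - 1];
    if (last && last.font === name) {
      last.text += ch; // extend the current run
    } else {
      runs.push({ font: name, text: ch }); // start a new run
    }
  }
  return runs;
}

// Stand-in coverage tables for illustration only (assumption, not real font data).
const demoFonts = [
  { name: 'normal', covers: (cp) => cp < 0x2600 },
  { name: 'emojis', covers: (cp) => cp >= 0x2600 },
];

const runs = splitIntoFontRuns('4 BMP emojis: ⛔✅', demoFonts);
console.log(runs);
```

Each run could then be written with the matching registered font, for instance by calling `doc.font(run.font)` before `doc.text(run.text)`. This does not address the copy/paste problem for characters that no registered font covers.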

@blikblum
Member

blikblum commented Jul 12, 2021

> The program needs to care about switching font according to different glyph coverage. I'd hope for some automatic font choice/fallback mechanism to cover all characters as needed.

There's a feature request: foliojs/pdfkit#201

@mined

mined commented Jul 15, 2021

The "Hællœ" pasted as "Hælloe" issue seems to depend on the PDF viewer, so forget about this one here.
However, transparent pasting of all characters, whether displayable or not, is essential for certain applications.

@devongovett
Member

This sounds like a pdfkit problem not a fontkit one

@mintty
Author

mintty commented Jul 15, 2021

The fontkit issue was closed, so we're back here...
Problem: Glyphs not available in the font are not displayed (▯ appears instead), and they cannot be copied and pasted back transparently either, which is an important feature in certain applications.
The generated PDF contains the following in the affected cases:

[<000d00160017001100050000> 0] TJ
The last character output is 0000

1 beginbfrange
<0000> <0006> [<0000> <26d4> <26f2> <2705> <d83c df1b> <d83c df45> <d83d de00>]
endbfrange

The 0000 is mapped to <0000> for copy/paste.
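
To make the quoted PDF fragments easier to read, here is a small sketch that decodes them. It assumes the hex string in the `TJ` operator consists of 2-byte codes, as the output above suggests, and that the `bfrange` destination strings are UTF-16BE, which matches the surrogate pairs shown. The helper names `codesFromHex` and `utf16Hex` are made up for illustration and are not part of pdfkit or fontkit.

```javascript
// Split a hex string such as the TJ operand into 16-bit codes (assumes 2-byte codes).
function codesFromHex(hex) {
  const codes = [];
  for (let i = 0; i + 4 <= hex.length; i += 4) {
    codes.push(parseInt(hex.slice(i, i + 4), 16));
  }
  return codes;
}

// Encode one Unicode code point as UTF-16BE hex, one 4-digit group per code unit.
function utf16Hex(codePoint) {
  const s = String.fromCodePoint(codePoint);
  const units = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i).toString(16).padStart(4, '0'));
  }
  return units.join(' ');
}

const codes = codesFromHex('000d00160017001100050000');
console.log(codes);             // the last entry is 0, the code that gets mapped to <0000>
console.log(utf16Hex(0x26d4));  // a BMP character needs one code unit
console.log(utf16Hex(0x1f31b)); // a non-BMP character needs a surrogate pair
```

Read this way, the entries `<d83c df1b>`, `<d83c df45>` and `<d83d de00>` are the surrogate pairs for 🌛, 🍅 and 😀, while code 0000 carries no information about which character was originally requested.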

@Pomax
Contributor

Pomax commented Jul 15, 2021

Okay, but that's still a pdfkit issue, no? Fontkit has nothing to do with whether or not you can copy text, or how it's presented. It just shapes Unicode sequences. If the error is "fontkit isn't rendering .notdef for unknown glyphs" then that's a good issue for here, but otherwise this has nothing to do with fontkit itself?

@mintty
Author

mintty commented Jul 16, 2021

Actually my comment should have gone to the pdfkit issue, sorry. Fixed that.
