Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perfectly kosher bug :) #255

Closed
stribor opened this issue Jun 7, 2024 · 13 comments · Fixed by #257
Closed

Perfectly kosher bug :) #255

stribor opened this issue Jun 7, 2024 · 13 comments · Fixed by #257

Comments

@stribor
Copy link

stribor commented Jun 7, 2024

I stumbled on a really odd one, on macOS Helvetica Neue ttc font has glyph with ID 65535 (not sure how/why and if other fonts have this problem) and then following line overflows unsigned short:

mSubsetFontGlyphsCount = subsetGlyphIDs.back() + 1;

Something like:
mSubsetFontGlyphsCount = subsetGlyphIDs.back() == 65535 ? 65535 : subsetGlyphIDs.back() + 1;
fixes problem, but maybe there is a better solution?

@galkahana
Copy link
Owner

Thank you. If 65535 is an eligible id then i should probably not be using ushort, but rather a larger unsigned. Ill check. Thanks.

@galkahana
Copy link
Owner

galkahana commented Jun 7, 2024

And in any case check the number and not cause an overflow.

@galkahana
Copy link
Owner

im adding checks against glyphid 0xffff here - #256.
maxp numglyphs is defined as uint16, meaning the max values is 0xffff, and so max glyph index is oxffff-1.

so now...let's see how you got a glyph id with 65535. checked on my own mac each of the different 13 fonts in macOS Helvetica Neue ttc and im not seeing glyph id 65535. all have much lower counts (around 2K glyphs).

i wonder if you may possible provide your own copy...or even better...and example script to show the problem. especially cause the correction im proposing is aiming to block such a case rather than resolve it.

@stribor
Copy link
Author

stribor commented Jun 10, 2024

Maybe this isn't an issue anymore and was fixed. I was updating our app to the latest version of PDF Writer.
And I was going through changes, had years ago it and seen that in git commit from 2021. Unfortunately, I don't have the original bug report anymore, and I can't reproduce the problem anymore.
I tried to look for documents I had from around the time.

But there is something odd with embedding (maybe it should go to different bug report). While trying to reproduce glyph id with 65535 and playing with Helvetica Neue.ttc (the one that comes with macOS in /Library/Fonts/) I noticed lots of square boxes instead of text when using PageContentContext::Tj(const GlyphUnicodeMappingList& inText)

I had to resurrect my hack/workaround from 2017:

bool WrittenFontTrueType::AddToANSIRepresentation(	const GlyphUnicodeMappingList& inGlyphsList,
													UShortList& outEncodedCharacters)
{
#if 1
	// Always disable ANSI representation so it will fallback to CID so ligatures won't crash it and some text would be wrongly displayed (I guess pray part doesn't work in all cases:) )
	return false;
#else
...

@galkahana
Copy link
Owner

galkahana commented Jun 11, 2024

perhaps. with source font file and script to recreate the problem, a-la one of the tests so i can recreate, figure out what's wrong and correct. That would be lovely

@stribor
Copy link
Author

stribor commented Jun 11, 2024

Font file comes with macOS:
/System/Library/Fonts/HelveticaNeue.ttc
Version is 17.0d2e1

I copied it along other fonts to PDFWriterTesting/Materials/fonts/

and modified CIDSetWritingCFF test:

Index: PDFWriterTesting/CIDSetWritingCFF.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/PDFWriterTesting/CIDSetWritingCFF.cpp b/PDFWriterTesting/CIDSetWritingCFF.cpp
--- a/PDFWriterTesting/CIDSetWritingCFF.cpp	(revision 4b59b8240bd5f7ea75af68e77ef9a53c050bfae5)
+++ b/PDFWriterTesting/CIDSetWritingCFF.cpp	(date 1718110943000)
@@ -52,12 +52,25 @@
 		AbstractContentContext::TextOptions textOptions(pdfWriter.GetFontForFile(
 																			BuildRelativeInputPath(
                                                                             argv,
-                                                                            "fonts/KozGoPro-Regular.otf")),
+                                                                            "fonts/HelveticaNeue.ttc")),
 																			14,
 																			AbstractContentContext::eGray,
 																			0);
 
-		cxt->WriteText(75,600,"hello \xe6\x9c\x9d",textOptions);
+		cxt->WriteText(75,600,"Helvetica Neue",textOptions);
+
+		GlyphUnicodeMappingList guml;
+		guml.push_back(GlyphUnicodeMapping(47, 72));
+		guml.push_back(GlyphUnicodeMapping(76, 101));
+		guml.push_back(GlyphUnicodeMapping(83, 108));
+		guml.push_back(GlyphUnicodeMapping(93, 118));
+		guml.push_back(GlyphUnicodeMapping(76, 101));
+		cxt->BT();
+		cxt->k(0,0,0,1);
+		cxt->Tm(1,0,0,1,10,600);
+		cxt->Tf(textOptions.font, 14);
+		cxt->Tj(guml);
+		cxt->ET();
 
 		status = pdfWriter.EndPageContentContext(cxt);
 		if(status != eSuccess)

With this I get CIDSetWritingCFF_broken.pdf with no text visible, and if I do:

Index: PDFWriter/WrittenFontTrueType.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/PDFWriter/WrittenFontTrueType.cpp b/PDFWriter/WrittenFontTrueType.cpp
--- a/PDFWriter/WrittenFontTrueType.cpp	(revision 4b59b8240bd5f7ea75af68e77ef9a53c050bfae5)
+++ b/PDFWriter/WrittenFontTrueType.cpp	(date 1718111392000)
@@ -54,6 +54,7 @@
 	// i'll need to make something different out of the text.
 	// as you can see this has little to do with glyphs (mainly cause i can't use FreeType to map the glyphs
 	// back to the rleevant unicode values...but no need anyways...that's why i carry the text).
+	return false;
 	UShortList candidates;
 	BoolAndByte encodingResult(true,0);
 	WinAnsiEncoding winAnsiEncoding;

I get CIDSetWritingCFF_readable.pdf

CIDSetWritingCFF_readable.pdf
CIDSetWritingCFF_broken.pdf

HelveticaNeue.ttc.zip

@galkahana
Copy link
Owner

awesome. thanks man. i'll try to look at it soon. maybe i @#$@#$ up the ansi path from glyphs. it's neat to see all these 14yrs old bugs coming up haha. thanks!

@galkahana
Copy link
Owner

oh my. read the pdf specs again and some of ttf. bottom line i need to verify that the font has particular cmaps defined in it (ms unicode or Mac Roman) otherwise i cant use winansi encoding and have to revert to cid rep. or yknow i could just not bother with attempting a winansi implementation at all. which is what your workaround does. i reckon i'll try to stick with winansi but verify the existance of those cmaps and only then do ansi. otherwise cid. this means that in neue case, which doesn't have this cmap, it'll go to CID, while in arial.ttf thats in the samples materials (which has both) will get to keep ansi.

ok...i'll need a couple of more days to get to write this code, but shouldn't be long.

@galkahana
Copy link
Owner

closed automatically cause i reffed from PR. anyways. i think it should be fine now

@galkahana
Copy link
Owner

Gonna close this. If theres anything else we can reopen

@stribor
Copy link
Author

stribor commented Jul 12, 2024

I got bug report from user about same broken behavior with https://fonts.google.com/specimen/Vollkorn font, once downloaded I used ttf's from static folder. (Side question, any way of supporting variable fonts?).

PDF gets generated fine when FontHasCmapsForWinAnsiEncoding returns false

@galkahana
Copy link
Owner

hmm. i wasn't able to recreate the problem. tried Vollkorn-reguler.ttf with simple WriteText("Vollkorn regular") and it came out fine. did verify im getting an ansi writing in the result pdf. maybe there's something more specific with a particular glyphs usage? if you can find out more details we can try to figure this one out.

RE variable fonts. in don't know much about them...so i don't really know if they can be used as is or whether there's a simple workaround to make the lib read them as plain ttf/otf/type1/dfont.

@stribor
Copy link
Author

stribor commented Jul 14, 2024

I tried to narrow it down to be able to reproduce it and what triggers it seems specifying glyphs

i.e. above example for Helvetica, with Vollkorn font:

		AbstractContentContext::TextOptions textOptions(pdfWriter.GetFontForFile(
										BuildRelativeInputPath(
                                                                            argv,
                                                                            "fonts/Vollkorn-Regular.ttf")),
																			14,
																			AbstractContentContext::eGray,
																			0);

		cxt->WriteText(75,600,"Rüc Res",textOptions);

		GlyphUnicodeMappingList guml;
		guml.push_back(GlyphUnicodeMapping(256, 82));
		guml.push_back(GlyphUnicodeMapping(505, 252));
		guml.push_back(GlyphUnicodeMapping(321, 99));
		cxt->BT();
		cxt->k(0,0,0,1);
		cxt->Tm(1,0,0,1,10,600);
		cxt->Tf(textOptions.font, 14);
		cxt->Tj(guml);
		cxt->ET();

		status = pdfWriter.EndPageContentContext(cxt);

breaks output if FontHasCmapsForWinAnsiEncoding(FT_Face font) returns true;

also note that: cxt->Tj(guml); affect also output of WriteText.

I mostly use AbstractContentContext::TJ and AbstractContentContext::Tj for text output and once problem is triggered I get whole text even following pages wrongly output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants