Perfectly kosher bug :) #255

stribor · 2024-06-07T13:04:10Z

I stumbled on a really odd one, on macOS Helvetica Neue ttc font has glyph with ID 65535 (not sure how/why and if other fonts have this problem) and then following line overflows unsigned short:

PDF-Writer/PDFWriter/TrueTypeEmbeddedFontWriter.cpp

Line 157 in 4b59b82

mSubsetFontGlyphsCount = subsetGlyphIDs.back() + 1;

Something like:
mSubsetFontGlyphsCount = subsetGlyphIDs.back() == 65535 ? 65535 : subsetGlyphIDs.back() + 1;
fixes problem, but maybe there is a better solution?

The text was updated successfully, but these errors were encountered:

galkahana · 2024-06-07T13:42:26Z

Thank you. If 65535 is an eligible id then i should probably not be using ushort, but rather a larger unsigned. Ill check. Thanks.

galkahana · 2024-06-07T13:42:48Z

And in any case check the number and not cause an overflow.

galkahana · 2024-06-08T13:00:06Z

im adding checks against glyphid 0xffff here - #256.
maxp numglyphs is defined as uint16, meaning the max values is 0xffff, and so max glyph index is oxffff-1.

so now...let's see how you got a glyph id with 65535. checked on my own mac each of the different 13 fonts in macOS Helvetica Neue ttc and im not seeing glyph id 65535. all have much lower counts (around 2K glyphs).

i wonder if you may possible provide your own copy...or even better...and example script to show the problem. especially cause the correction im proposing is aiming to block such a case rather than resolve it.

stribor · 2024-06-10T20:58:29Z

Maybe this isn't an issue anymore and was fixed. I was updating our app to the latest version of PDF Writer.
And I was going through changes, had years ago it and seen that in git commit from 2021. Unfortunately, I don't have the original bug report anymore, and I can't reproduce the problem anymore.
I tried to look for documents I had from around the time.

But there is something odd with embedding (maybe it should go to different bug report). While trying to reproduce glyph id with 65535 and playing with Helvetica Neue.ttc (the one that comes with macOS in /Library/Fonts/) I noticed lots of square boxes instead of text when using PageContentContext::Tj(const GlyphUnicodeMappingList& inText)

I had to resurrect my hack/workaround from 2017:

bool WrittenFontTrueType::AddToANSIRepresentation(	const GlyphUnicodeMappingList& inGlyphsList,
													UShortList& outEncodedCharacters)
{
#if 1
	// Always disable ANSI representation so it will fallback to CID so ligatures won't crash it and some text would be wrongly displayed (I guess pray part doesn't work in all cases:) )
	return false;
#else
...

galkahana · 2024-06-11T03:11:10Z

perhaps. with source font file and script to recreate the problem, a-la one of the tests so i can recreate, figure out what's wrong and correct. That would be lovely

stribor · 2024-06-11T13:15:35Z

Font file comes with macOS:
/System/Library/Fonts/HelveticaNeue.ttc
Version is 17.0d2e1

I copied it along other fonts to PDFWriterTesting/Materials/fonts/

and modified CIDSetWritingCFF test:

Index: PDFWriterTesting/CIDSetWritingCFF.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/PDFWriterTesting/CIDSetWritingCFF.cpp b/PDFWriterTesting/CIDSetWritingCFF.cpp
--- a/PDFWriterTesting/CIDSetWritingCFF.cpp	(revision 4b59b8240bd5f7ea75af68e77ef9a53c050bfae5)
+++ b/PDFWriterTesting/CIDSetWritingCFF.cpp	(date 1718110943000)
@@ -52,12 +52,25 @@
 		AbstractContentContext::TextOptions textOptions(pdfWriter.GetFontForFile(
 																			BuildRelativeInputPath(
                                                                             argv,
-                                                                            "fonts/KozGoPro-Regular.otf")),
+                                                                            "fonts/HelveticaNeue.ttc")),
 																			14,
 																			AbstractContentContext::eGray,
 																			0);
 
-		cxt->WriteText(75,600,"hello \xe6\x9c\x9d",textOptions);
+		cxt->WriteText(75,600,"Helvetica Neue",textOptions);
+
+		GlyphUnicodeMappingList guml;
+		guml.push_back(GlyphUnicodeMapping(47, 72));
+		guml.push_back(GlyphUnicodeMapping(76, 101));
+		guml.push_back(GlyphUnicodeMapping(83, 108));
+		guml.push_back(GlyphUnicodeMapping(93, 118));
+		guml.push_back(GlyphUnicodeMapping(76, 101));
+		cxt->BT();
+		cxt->k(0,0,0,1);
+		cxt->Tm(1,0,0,1,10,600);
+		cxt->Tf(textOptions.font, 14);
+		cxt->Tj(guml);
+		cxt->ET();
 
 		status = pdfWriter.EndPageContentContext(cxt);
 		if(status != eSuccess)

With this I get CIDSetWritingCFF_broken.pdf with no text visible, and if I do:

Index: PDFWriter/WrittenFontTrueType.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/PDFWriter/WrittenFontTrueType.cpp b/PDFWriter/WrittenFontTrueType.cpp
--- a/PDFWriter/WrittenFontTrueType.cpp	(revision 4b59b8240bd5f7ea75af68e77ef9a53c050bfae5)
+++ b/PDFWriter/WrittenFontTrueType.cpp	(date 1718111392000)
@@ -54,6 +54,7 @@
 	// i'll need to make something different out of the text.
 	// as you can see this has little to do with glyphs (mainly cause i can't use FreeType to map the glyphs
 	// back to the rleevant unicode values...but no need anyways...that's why i carry the text).
+	return false;
 	UShortList candidates;
 	BoolAndByte encodingResult(true,0);
 	WinAnsiEncoding winAnsiEncoding;

I get CIDSetWritingCFF_readable.pdf

CIDSetWritingCFF_readable.pdf
CIDSetWritingCFF_broken.pdf

HelveticaNeue.ttc.zip

galkahana · 2024-06-11T13:32:27Z

awesome. thanks man. i'll try to look at it soon. maybe i @#$@#$ up the ansi path from glyphs. it's neat to see all these 14yrs old bugs coming up haha. thanks!

galkahana · 2024-06-12T19:08:02Z

oh my. read the pdf specs again and some of ttf. bottom line i need to verify that the font has particular cmaps defined in it (ms unicode or Mac Roman) otherwise i cant use winansi encoding and have to revert to cid rep. or yknow i could just not bother with attempting a winansi implementation at all. which is what your workaround does. i reckon i'll try to stick with winansi but verify the existance of those cmaps and only then do ansi. otherwise cid. this means that in neue case, which doesn't have this cmap, it'll go to CID, while in arial.ttf thats in the samples materials (which has both) will get to keep ansi.

ok...i'll need a couple of more days to get to write this code, but shouldn't be long.

galkahana · 2024-06-14T08:47:27Z

closed automatically cause i reffed from PR. anyways. i think it should be fine now

galkahana · 2024-06-17T05:46:14Z

Gonna close this. If theres anything else we can reopen

stribor · 2024-07-12T09:23:25Z

I got bug report from user about same broken behavior with https://fonts.google.com/specimen/Vollkorn font, once downloaded I used ttf's from static folder. (Side question, any way of supporting variable fonts?).

PDF gets generated fine when FontHasCmapsForWinAnsiEncoding returns false

galkahana · 2024-07-13T09:23:55Z

hmm. i wasn't able to recreate the problem. tried Vollkorn-reguler.ttf with simple WriteText("Vollkorn regular") and it came out fine. did verify im getting an ansi writing in the result pdf. maybe there's something more specific with a particular glyphs usage? if you can find out more details we can try to figure this one out.

RE variable fonts. in don't know much about them...so i don't really know if they can be used as is or whether there's a simple workaround to make the lib read them as plain ttf/otf/type1/dfont.

stribor · 2024-07-14T18:32:08Z

I tried to narrow it down to be able to reproduce it and what triggers it seems specifying glyphs

i.e. above example for Helvetica, with Vollkorn font:

		AbstractContentContext::TextOptions textOptions(pdfWriter.GetFontForFile(
										BuildRelativeInputPath(
                                                                            argv,
                                                                            "fonts/Vollkorn-Regular.ttf")),
																			14,
																			AbstractContentContext::eGray,
																			0);

		cxt->WriteText(75,600,"Rüc Res",textOptions);

		GlyphUnicodeMappingList guml;
		guml.push_back(GlyphUnicodeMapping(256, 82));
		guml.push_back(GlyphUnicodeMapping(505, 252));
		guml.push_back(GlyphUnicodeMapping(321, 99));
		cxt->BT();
		cxt->k(0,0,0,1);
		cxt->Tm(1,0,0,1,10,600);
		cxt->Tf(textOptions.font, 14);
		cxt->Tj(guml);
		cxt->ET();

		status = pdfWriter.EndPageContentContext(cxt);

breaks output if FontHasCmapsForWinAnsiEncoding(FT_Face font) returns true;

also note that: cxt->Tj(guml); affect also output of WriteText.

I mostly use AbstractContentContext::TJ and AbstractContentContext::Tj for text output and once problem is triggered I get whole text even following pages wrongly output.

galkahana mentioned this issue Jun 8, 2024

adding safeguards so input of oxffff glyph id isn't problematic #256

Merged

galkahana mentioned this issue Jun 14, 2024

bug: not all true types can use winansiencoding. need to check cmaps #257

Merged

galkahana closed this as completed in #257 Jun 14, 2024

galkahana reopened this Jun 14, 2024

galkahana closed this as completed Jun 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perfectly kosher bug :) #255

Perfectly kosher bug :) #255

stribor commented Jun 7, 2024

galkahana commented Jun 7, 2024

galkahana commented Jun 7, 2024 •

edited

Loading

galkahana commented Jun 8, 2024

stribor commented Jun 10, 2024

galkahana commented Jun 11, 2024 •

edited

Loading

stribor commented Jun 11, 2024

galkahana commented Jun 11, 2024

galkahana commented Jun 12, 2024

galkahana commented Jun 14, 2024

galkahana commented Jun 17, 2024

stribor commented Jul 12, 2024

galkahana commented Jul 13, 2024

stribor commented Jul 14, 2024

Perfectly kosher bug :) #255

Perfectly kosher bug :) #255

Comments

stribor commented Jun 7, 2024

galkahana commented Jun 7, 2024

galkahana commented Jun 7, 2024 • edited Loading

galkahana commented Jun 8, 2024

stribor commented Jun 10, 2024

galkahana commented Jun 11, 2024 • edited Loading

stribor commented Jun 11, 2024

galkahana commented Jun 11, 2024

galkahana commented Jun 12, 2024

galkahana commented Jun 14, 2024

galkahana commented Jun 17, 2024

stribor commented Jul 12, 2024

galkahana commented Jul 13, 2024

stribor commented Jul 14, 2024

galkahana commented Jun 7, 2024 •

edited

Loading

galkahana commented Jun 11, 2024 •

edited

Loading