Showing squares with hex values instead of text in some PDFs #15289
Also, it seems that some but not all of the preview PDFs in the current Essential Classic Fantasy RPG Collection bundle, which is not from O'Reilly, have the problem. An interesting example is the preview PDF for Let's Get Kraken, where only the header text, which is supposed to read "PART ONE: ADVENTURE OVERVIEW", is rendered as squares with hex values; the rest of the text is readable.
Attaching a preview here, which is hopefully OK, since this isn't really easily actionable otherwise: issue15289.pdf
All of the affected fonts are Type1/Type1C, i.e. CFF fonts.
@Snuffleupagus I can confirm that your uploaded example reproduces the issue for me. Please note that I said some but not all of the examples have errors in the console. Both the preview PDFs for Robust Python in the O'Reilly bundle I first linked, and Let's Get Kraken from the RPG bundle I linked in my first comment, have hex squares but no console errors. Also, I looked more at the current Packt bundle here and, contrary to my initial description where I said the Packt books were okay, the preview for the book Machine Learning with PyTorch and Scikit-Learn does have hex squares (and no console errors).
About the sanitizer issue, it's very likely because of: Line 1870 in 40f9f7e
The specs say:
and a bug was fixed in the sanitizer 13 years ago: The fix for OTS is likely to replace That said, I have no idea about the other issues.
On macOS, the rendering of page 1 is OK except for the italic font.
I can confirm the hex squares become text with I'm not sure what
I filed a bug for Firefox:
Thanks for the quick turnaround! For what it's worth however, though I can reproduce the hex square ("tofu", I guess) with the plop.html and plop.ttf attached to the Firefox bug, changing
@calixteman I'm not really sure of the status of this issue. You submitted a PR (thanks!) to this repository about two weeks ago and closed this issue. The related Firefox bug is still open. That bug is marked as "Version: Firefox 105". The current stable version of Firefox is v103. Does that mean this pdf.js bug should definitely be fixed in Firefox v105?
The fix for a subset of this issue (#15290) landed in Firefox Nightly in https://bugzilla.mozilla.org/show_bug.cgi?id=1784537.
@marco-c I don't know any of the details of how pdf.js works or how it plugs into Firefox. As an end user the issue I reported is that when I open a subset of PDFs from a specific publisher (Humble Bundle), some of the text in those PDFs is unreadable. Looking at all these issues and bugs what I see as an end user is:
To be clear, I'm not trying to rush anybody. If this were at a point of "yes, we have all the info we need, we'll get to it when we get to it" then that's fine. At the moment, though, there are still some open questions and uncertainty on my side about whether the correct problem is being tracked, so I'd like to resolve those before leaving things alone. In fact, it feels a little like the various devs here have been sidetracked on different problems while the core problem of "I can't read these PDFs" has gotten lost.
@jeremyn there were actually different root issues affecting the PDFs you shared with us. One class of issues has been fixed as part of #15290 (which closed this issue). The rest of the issues are unrelated to pdf.js itself and are due to Firefox's internal graphics engine; they are tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1783740.
Until https://bugzilla.mozilla.org/show_bug.cgi?id=1783740 is fixed, you will still be able to reproduce some (if not all) of the issues you mentioned initially.
Thanks, I'll point Jonathan to this issue. @calixteman is away, or he would have answered him.
@marco-c Thanks. Do you have thoughts on the workaround of setting Is PDF.js/Firefox treating this as a specific problem for a few fonts or as a systemic problem? As my earlier comments say, this is widespread across Humble Bundle PDFs from a variety of publishers. It would be unfortunate for me if this issue took a long time to resolve only to find out it was some hyper-specific fix for the one sample PDF uploaded here. Also, about Humble Bundle: in an earlier comment, #15289 (comment), I asked here if there is some useful request I can make to their support group. If they are generating PDFs in some bad way then I can just ask them to stop. Do you have any advice about that?
@jeremyn it seems to be related to these PDFs and not a widespread problem. It could be useful to ask them to answer all of @jfkthame's questions from https://bugzilla.mozilla.org/show_bug.cgi?id=1783740 (and maybe he has more after reading this thread).
@marco-c I created an issue with Humble Bundle support and directed them to the Bugzilla issue. I can't say what the escalation process is between their support people and whoever deals with this sort of problem on their side. I want to get out of the middle here, so as far as that all goes, this is not my issue anymore. Do you have any info about
@jeremyn I'm not familiar with that option, but if it isn't the default it must mean that it has downsides that exceed the improvements, so I would keep it to
Chris from Humble Bundle here. It looks like our process for making PDF previews is to take the full PDF and run it through Ghostscript to truncate it. The full book PDFs do render just fine in Firefox. It's clear something in this process is triggering the bug in Firefox, but I'm not sure what. If I get some free time, I can try some alternate arguments for Ghostscript to work around this issue going forward. Maybe something useful in the stripped-out pages is getting lost? I also looked at Bugzilla, and they've got a pretty minimal test case to demonstrate the issue as well.
Given that the "full book PDFs do render just fine in Firefox", it appears that Ghostscript is damaging the font in some way, perhaps during the process of subsetting to include only the characters present in the selected pages. I don't think this is really a Firefox bug as such; note that https://bugzilla.mozilla.org/show_bug.cgi?id=1783740 indicates that the font similarly fails to load in Edge. Maybe the I suppose if you can share a "full" PDF that works, along with a truncated preview (created from the same document) where the font fails, we can try to extract the corresponding font resources from each and compare them, though CFF is a fearsomely complex format and it may be hard to identify exactly what is triggering the failure.
My issue was linked to this one, so I tried to review all the comments and links here, but I'm not sure that it's the same. @Snuffleupagus pointed me to this comment in this thread, but my issue is not Firefox-specific. Also, even in the latest version of Firefox the issue is still reproducible.
@sergei-harbour As I understand it, this issue is only partially fixed, with the rest moved to the Bugzilla tracker. See #15289 (comment). Also, your test file is broken for me in Firefox 105.0.2 with
True, it works, but after some research it looks like it stops working in cases where a PDF uses a non-standard font that is not embedded in the doc. It doesn't fall back to the system font. Maybe some retry logic can be a workaround here, something like:
The thing is that I deal with tons of PDF docs in my system and I can't control how they are created. Maybe I need to preprocess the docs somehow and replace/embed the fonts that don't play nice with pdf.js.
After digging in the font I finally found that it's because of
Oh, interesting! Congratulations on tracking this down. Neither the Adobe CFF specification nor the OpenType spec for CFF2 seems to give any clue what this means; they just mention a default value of 0.06, but not a word about what other values would be valid or what effect it's supposed to have. (shrug) Removing it in pdf.js should fix the immediate issue with rendering PDFs that contain such fonts, but we could also consider removing it in OTS, so that if a "bad" font is used as a webfont (independently of PDF embedding) it would also resolve that case. Though maybe such fonts only arise as a result of some (faulty?) PDF-generating workflows.
Ah - looks like this is inherited from the old Type 1 spec. See page 45 in https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf for information. @calixteman I'm just wondering, if you reset the value to 0.06 (the default) instead of removing it, does that also resolve the failure? If so, maybe that would be the lowest-risk approach, just in case any rendering engine expects the entry to be present.
I updated my PR to set the property to 0.06.
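For readers following along, the "reset to the default instead of removing" idea can be sketched roughly as below. This is only an illustration in Python, not the code from the PR: the function name `normalize_expansion_factor` and the plain-dict representation of a font's Private DICT are hypothetical, and the thread does not spell out exactly what counts as a "null" value, so the sketch assumes that `None`, non-numeric, and non-finite values are the unusable ones. The only value taken from the thread is the default of 0.06.

```python
import math

# Default value mentioned in the thread for ExpansionFactor.
DEFAULT_EXPANSION_FACTOR = 0.06


def normalize_expansion_factor(private_dict):
    """Return a copy of a (hypothetical) Private DICT mapping in which an
    unusable ExpansionFactor is reset to the default rather than removed."""
    fixed = dict(private_dict)  # do not mutate the caller's mapping
    if "ExpansionFactor" not in fixed:
        # Entry absent: leave it absent, so a reader applies its own default.
        return fixed
    value = fixed["ExpansionFactor"]
    # Assumption: "unusable" means None, not a real number, or NaN/infinity.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not math.isfinite(value):
        fixed["ExpansionFactor"] = DEFAULT_EXPANSION_FACTOR
    return fixed


if __name__ == "__main__":
    print(normalize_expansion_factor({"ExpansionFactor": None}))
    print(normalize_expansion_factor({"ExpansionFactor": 0.1}))
```

Keeping the key present with the default, rather than deleting it, follows the reasoning above: a rendering engine that expects the entry to exist still finds one.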
Avoid null ExpansionFactor in type1 fonts (follow-up of #15289)
What went wrong? (add screenshot)
Warning: Failed to load font 'g_d0_f3': 'SyntaxError: An invalid or illegal string was specified'. pdf.js:446:13
downloadable font: CFF : Failed to parse Global Subrs INDEX (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: CFF : Failed to parse table (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: rejected by sanitizer (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: font load failed (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)