Handling of missing newlines before endstream marker
#2523
Comments
@stefan6419846 Yes, the xobject has a length:
Fixes py-pdf#2523. Situation met:
* the length field is not correct
* the xref may contain unordered stream data
* the xref contains some free entries (i.e. entries that do not contain a stream offset)
After analysing the file: the length value is not valid, and the xref table contained free entries, possibly mixed up with the in-use ones, so the software did not properly calculate the window from which to extract the stream. I've built a PR to fix it.
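To illustrate the window problem described above, here is a minimal sketch of how an upper bound for a stream could be derived from an xref table that is unordered and contains free entries. The function name `next_object_offset` and the `(offset, in_use)` tuple layout are assumptions made for this example; this is not the code from the PR.

```python
from typing import Iterable, Optional, Tuple


def next_object_offset(
    xref_entries: Iterable[Tuple[int, bool]],
    stream_start: int,
) -> Optional[int]:
    """Return the smallest in-use offset strictly after stream_start.

    xref_entries is a hypothetical list of (byte_offset, in_use) pairs.
    Free entries carry no usable offset, so they are skipped, and the
    entries are not assumed to be sorted.
    """
    candidates = [
        offset
        for offset, in_use in xref_entries
        if in_use and offset > stream_start
    ]
    # None means no later object is known; the caller would have to
    # fall back to the end of the file as the upper bound.
    return min(candidates) if candidates else None
```

Taking the minimum over the filtered offsets avoids relying on the order of the table, which is the assumption that breaks for files like the one discussed here.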
I just stumbled upon some odd PDF files (generated by Microsoft Word for Microsoft 365 in 2022) where the newline before the endstream marker is missing. It seems that neither Ghostscript nor Poppler likes this, while pdf.js does handle it. For this reason, I am not sure whether this is something we want to, or should, fix on our side.
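As a rough illustration of what tolerant handling could look like, the sketch below trusts the declared length only if the endstream marker actually follows it, with or without an end-of-line in between, and otherwise falls back to searching for the marker. The function name `extract_stream_data` and its parameters are hypothetical and do not reflect pypdf's internals.

```python
ENDSTREAM = b"endstream"


def extract_stream_data(buffer: bytes, data_start: int, declared_length: int) -> bytes:
    """Return the stream payload that begins at data_start in buffer."""
    end = data_start + declared_length
    if declared_length >= 0 and end <= len(buffer):
        # Look at the bytes right after the declared payload: at most
        # two EOL bytes followed by the marker itself.
        tail = buffer[end:end + 2 + len(ENDSTREAM)]
        if tail.lstrip(b"\r\n").startswith(ENDSTREAM):
            # The length is plausible, even if the newline is missing.
            return buffer[data_start:end]

    # The length is unusable: search for the marker instead.
    marker = buffer.find(ENDSTREAM, data_start)
    if marker == -1:
        raise ValueError("no endstream marker found")
    data = buffer[data_start:marker]
    # Drop a single trailing EOL if present; tolerate its absence.
    for eol in (b"\r\n", b"\n", b"\r"):
        if data.endswith(eol):
            return data[: -len(eol)]
    return data
```

The fallback search is a heuristic: binary stream data can itself contain the bytes `endstream`, and a trailing newline may belong to the payload, so a real implementation would need additional checks.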
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
I have no public reproducer for this, but in theory I would consider this rather easy to reproduce with any crafted PDF file which uses a snippet like this:
Traceback
This is the complete traceback I see (lines might be slightly off due to debugging purposes):