-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Throw PdfReadError if Trailer can't be read #1298
Conversation
… Added myself to contributor list.
Codecov Report
@@ Coverage Diff @@
## main #1298 +/- ##
=======================================
Coverage 95.02% 95.02%
=======================================
Files 30 30
Lines 4986 4988 +2
Branches 1025 1026 +1
=======================================
+ Hits 4738 4740 +2
Misses 141 141
Partials 107 107
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Can you please add a test that covers this? You mentioned that https://github.com/p-sampson-II/SEEDLab-TAs-2022/blob/378e6f583ed9de274a8bda9240dfa7aa6872a733/Alex%20Curtis/Datasheets/Rotary%20Potentiometer.pdf shows the issue. You can use the file via: url = "https://github.com/p-sampson-II/SEEDLab-TAs-2022/blob/378e6f583ed9de274a8bda9240dfa7aa6872a733/Alex%20Curtis/Datasheets/Rotary%20Potentiometer.pdf"
name = "github-Rotary-Potentiometer.pdf"
reader = PdfReader(BytesIO(get_pdf_from_url(url, name=name))) |
Removed extraneous is True statement on line 329 of _reader.py Co-authored-by: Martin Thoma <info@martin-thoma.de>
…f to test new condition
Thank you for your work on this one! I've just noticed that you've added Could you please add the file there? (I can also do it + use the "Contributed-by" feature of Github to attribute it to you) |
1 similar comment
Thank you for your work on this one! I've just noticed that you've added Could you please add the file there? (I can also do it + use the "Contributed-by" feature of Github to attribute it to you) |
…ensure accurate test results in future
@MartinThoma I've added a pull request to get that file into the correct repository and updated the file location in test_reader.py The commit may fail some tests in this repository before the sample files are merged, but I've tested it on my own rig here with pytest and it should work perfectly after the merge. |
I was just blocked at work and added the updated submodule to PyPDF2 :-) |
Could you please do |
Looks like the the reader causes some issues if the metadata is unreadable. Is it the right move to throw a PdfReadError for this situation? |
Sorry, my mistake. I'll fix the test tomorrow 🙈 |
Version 2.10.5, 2022-09-04 -------------------------- New Features (ENH): - Process XRefStm (#1297) - Auto-detect RTL for text extraction (#1309) Bug Fixes (BUG): - Avoid scaling cropbox twice (#1314) Robustness (ROB): - Fix offset correction in revised PDF (#1318) - Crop data of /U and /O in encryption dictionary to 48 bytes (#1317) - MultiLine bfrange in cmap (#1299) - Cope with 2 digit codes in bfchar (#1310) - Accept '/annn' charset as ASCII code (#1316) - Log errors during Float / NumberObject initialization (#1315) - Cope with corrupted entries in xref table (#1300) Documentation (DOC): - Migration guide (PyPDF2 1.x \xe2\x9e\x94 2.x) (#1324) - Creating a coverage report (#1319) - Fix AnnotationBuilder.free_text example (#1311) - Fix usage of page.scale by replacing it with page.scale_by (#1313) Developer Experience (DEV): - Only run coverage for PyPDF2 Maintenance (MAINT): - PdfReaderProtocol (#1303) - Throw PdfReadError if Trailer can't be read (#1298) - Remove catching OverflowException (#1302) Full Changelog: 2.10.4...2.10.5
… Added myself to contributor list.
In reference to #1279 I've modified _reader.py to throw a PdfReadError if the trailer doesn't exist, can't be read, or does not point to the object containing the metadata for the document.
Three pdf files were uploaded in relation to aforementioned issue, pca_var.pdf contained no trailer, and the other two had corrupted declarations that PyPDF2 could not read. I'll include them here for those interested.
From Rotary Potentiometer.pdf:
trailer^M <</Size 267/Prev 2815238/Root 176 0 R/Info 174 0 R/ID[<8CC42D96833D2A438E62CB60D00F610F><5C480BCE388CB6408C83C951FDD0DA22>]>>^M startxref^M 0^M %%EOF^M
From Status_v1_Reviewers-Guide.pdf:
trailer
/Size
447
/Root
4
0
R
/Info
5
0
R
startxref
6128518
(sorry for using multiple code blocks, but I think it's important to show that there are line breaks for each of these lines in my terminal)
We should probably figure out why these trailers are displayed like this. If I had to guess it might be because of a different encoding for line breaks. If we can update the way that we read the trailer to account for this we might be able to read useful information out of more files.
All three files from the issue should now throw an error when the test code is run. This should close #1279