-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Support R6 decrypting #1015
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1015 +/- ##
==========================================
+ Coverage 89.50% 89.60% +0.09%
==========================================
Files 24 24
Lines 4432 4425 -7
Branches 919 914 -5
==========================================
- Hits 3967 3965 -2
+ Misses 318 314 -4
+ Partials 147 146 -1
Continue to review full report at Codecov.
|
You're on fire, @exiledkingcc 👍 🚀 The tests pass - is this ready from your perspective, @exiledkingcc ? |
thanks to qpdf, althoght it's comment shows the algorithm is NOT EXACTLY described in specifiction.
anyway i think you can merge this, it will not make things worse, at least, we can decrypt PDFs encypted by qpdf. |
I was just checking the PDF files. While the r2 / r3 / r4 files ask for passwords, the r5 and r6 files don't. I can directly see the content. Are you sure that they are encrypted? |
yes, they are encrypted, but not password-protected. |
this is how qpdf do about passwords, maybe we should keep the same way: |
@exiledkingcc Do you think this is ready to be merged? If yes, I would review it again this evening :-) I'm excited about it :-) |
i think yes. 😊 |
PyPDF2/_merger.py
Outdated
@@ -137,7 +137,7 @@ def merge( | |||
reader = PdfReader(stream, strict=self.strict) # type: ignore[arg-type] | |||
self.inputs.append((stream, reader, my_file)) | |||
if encryption_obj is not None: | |||
reader._encryption = encryption_obj | |||
reader.encryption = encryption_obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every attribute that does not start with an underscore is part of the public interface. That means we cannot change the behavior / name without deprecation warnings.
Is there a reason why a user would want to access the encryption
attribute directly?
PyPDF2/_reader.py
Outdated
@@ -254,8 +254,26 @@ def __init__( | |||
self.stream = stream | |||
|
|||
self._override_encryption = False | |||
if password is not None and self.decrypt(password) == 0: | |||
raise PdfReadError("Wrong password") | |||
self.encryption: Optional[Encryption] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistently with my question in _merger, I would prefer if this was the private attribute _encryption
instead of the public attribute `encryption. Except, of course, if there is a reason why users would want to access it.
PyPDF2/_reader.py
Outdated
encryptEntry = cast(DictionaryObject, self.trailer[TK.ENCRYPT].get_object()) | ||
self.encryption = Encryption.read(encryptEntry, id1_entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encryptEntry = cast(DictionaryObject, self.trailer[TK.ENCRYPT].get_object()) | |
self.encryption = Encryption.read(encryptEntry, id1_entry) | |
encrypt_entry = cast(DictionaryObject, self.trailer[TK.ENCRYPT].get_object()) | |
self.encryption = Encryption.read(encrypt_entry, id1_entry) |
PyPDF2/_encryption.py
Outdated
self._user_keys: Dict = {} | ||
self._owner_keys: Dict = {} | ||
|
||
def verified(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess verified means that the file was decrypted? Would is_decrypted
be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think this should rather be _verified
if the user does not need it.
I've tried to cross-check the decryption, but...
I guess we are head of our time 😄 |
@exiledkingcc Overall, it looks good to me. Good work 👏 👍 🎉 There are a couple of things I want to be changed before I release it. I could do them myself after merging this PR or you could do them now - whatever you prefer. Just let me know :-) I've also noticed that if either the user password or the owner password is set to the empty password, typical viewers will directly show the contents. For this reason I would add the following test PDF:
|
@MartinThoma all your review comments are great, i will do the update tonight. |
I just saw https://github.com/pdfminer/pdfminer.six/tree/master/samples and especially:
I think I'll run PyPDF2 over those examples some time today :-) |
updated. |
try: | ||
pwd = password.encode("latin-1") | ||
except Exception: # noqa | ||
pwd = password.encode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you choose latin-1 as the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be honest, i don't think too much about it, just copy it from previous code. 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, ok. Thanks for the honesty ❤️
@exiledkingcc Very nice work! Thank you 🤗 I'll release it today (just have to eat something now 😄 ) |
New Features (ENH): - Support R6 decrypting (#1015) - Add PdfReader.pdf_header (#1013) Performance Improvements (PI): - Remove ord_ calls (#1014) Bug Fixes (BUG): - Fix missing page for bookmark (#1016) Robustness (ROB): - Deal with invalid Destinations (#1028) Documentation (DOC): - get_form_text_fields does not extract dropdown data (#1029) - Adjust PdfWriter.add_uri docstring - Mention crypto extra_requires for installation (#1017) Developer Experience (DEV): - Use /n line endings everywhere (#1027) - Adjust string formatting to be able to use mutmut (#1020) - Update Bug report template Full Changelog: 2.3.1...2.4.0
New Features (ENH): - Add PageObject._get_fonts (#1083) - Add support for indexed color spaces / BitsPerComponent for decoding PNGs (#1067) Performance Improvements (PI): - Use iterative DFS in PdfWriter._sweep_indirect_references (#1072) Bug Fixes (BUG): - Let Page.scale also scale the crop-/trim-/bleed-/artbox (#1066) - Column default for CCITTFaxDecode (#1079) Robustness (ROB): - Guard against None-value in _get_outlines (#1060) Documentation (DOC): - Stamps and watermarks (#1082) - OCR vs PDF text extraction (#1081) - Python Version support - Formatting of CHANGELOG Developer Experience (DEV): - Cache downloaded files (#1070) - Speed-up for CI (#1069) Maintenance (MAINT): - Set page.rotate(angle: int) (#1092) - Issue #416 was fixed by #1015 (#1078) Testing (TST): - Image extraction (#1080) - Image extraction (#1077) Code Style (STY): - Apply black - Typo in Changelog Full Changelog: 2.4.2...2.4.3
Fixes #416