generic/_utils.py: function create_string_object not working with the bytearray type #2434
Comments
Thanks for the report. While this patch might work, this will break
Hello Stefan (I'm glad to meet another Stefan),
Thank you for your responsiveness. The error seems to happen with objects of type TextStringObject (from mypy currently accepts the bytearray-to-bytes promotion, but there is an experiment to remove it. Sources:
If the function
diff --git a/pypdf/generic/_base.py b/pypdf/generic/_base.py
index 5a27572..813b1df 100644
--- a/pypdf/generic/_base.py
+++ b/pypdf/generic/_base.py
@@ -650,4 +650,4 @@ def encode_pdfdocencoding(unicode_string: str) -> bytes:
             raise UnicodeEncodeError(
                 "pdfdocencoding", c, -1, -1, "does not exist in translation table"
             )
-    return retval
+    return bytes(retval)
Detailed explanation:
I am still not completely sure why
Do you want to submit a corresponding PR?
Because of the method get_original_bytes(self) -> bytes of the class TextStringObject:
    def get_original_bytes(self) -> bytes:
        # We're a text string object, but the library is trying to get our raw
        # bytes. This can happen if we auto-detected this string as text, but
        # we were wrong. It's pretty common. Return the original bytes that
        # would have been used to create this object, based upon the autodetect
        # method.
        if self.autodetect_utf16:
            return codecs.BOM_UTF16_BE + self.encode("utf-16be")  # <-- returns bytes
        elif self.autodetect_pdfdocencoding:
            return encode_pdfdocencoding(self)  # <-- returns bytearray
        else:
            raise Exception("no information about original bytes")
I opened PR #2440, which casts the return value of encode_pdfdocencoding to bytes.
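To illustrate the mismatch described above, here is a minimal standalone sketch. It does not use pypdf: `encode_sketch` is a hypothetical stand-in that, like encode_pdfdocencoding, is annotated as returning bytes but builds its result in a bytearray. The Latin-1 encoding is only an assumption chosen to keep the example short.

```python
def encode_sketch(text: str) -> bytes:
    # Hypothetical stand-in (not pypdf code): accumulates into a bytearray
    # and returns it as-is, even though the annotation promises bytes.
    retval = bytearray()
    for ch in text:
        retval += ch.encode("latin-1")
    return retval  # type checkers may accept this via bytearray-to-bytes promotion


raw = encode_sketch("abc")

# At runtime, bytearray is not a subclass of bytes, so a strict check fails.
is_bytes_before = isinstance(raw, bytes)

# Wrapping in bytes(), as in the proposed patch, yields a real bytes object.
fixed = bytes(raw)
is_bytes_after = isinstance(fixed, bytes)
```

The content is unchanged by the conversion; only the runtime type differs, which is what a caller checking `isinstance(..., bytes)` cares about.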
Hello pypdf team,
While trying to get the fields of my PDF with the function PdfReader.get_fields(), my code received an exception from the function create_string_object (in pypdf/generic/_utils.py, line 113) because it received a bytearray instead of a str or bytes.
According to the traceback, the error occurs when the function def decrypt_object(self, obj: PdfObject) -> PdfObject detects that the object to decrypt is either of type ByteStringObject or TextStringObject, before calling create_string_object.
The documentation about the bytearray type states:
source: https://docs.python.org/3/library/stdtypes.html#bytearray
So it seems like the function create_string_object could accept bytearray objects and treat them as bytes instead of raising an exception.
After applying this fix, I was able to read the fields of my PDF.
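The idea of accepting bytearray and treating it as bytes could look roughly like the sketch below. Note that `normalize_string_input` is a hypothetical helper written for illustration only; it is not the actual create_string_object and makes no claim about how pypdf implements it.

```python
from typing import Union


def normalize_string_input(value: Union[str, bytes, bytearray]) -> Union[str, bytes]:
    """Hypothetical helper (not part of pypdf): coerce bytearray to bytes."""
    if isinstance(value, bytearray):
        # Copy into an immutable bytes object so later isinstance checks pass.
        return bytes(value)
    if isinstance(value, (str, bytes)):
        return value
    raise TypeError(f"expected str, bytes or bytearray, got {type(value).__name__}")


converted = normalize_string_input(bytearray(b"field"))
```

Whether to be tolerant at the consumer (as here) or to fix the producer so it always returns bytes is a design choice; the thread's PR takes the second route.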
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
I can't provide my PDF file because it contains personal information.
Traceback
This is the complete traceback I see: