Missing root object raising: 'NoneType' object has no attribute 'get_object' (different from #1295 & #1689) #2806

BertrandBordage · 2024-08-21T23:55:34Z

As I was processing client PDFs with pypdf, one of them triggered a cryptic error (traceback below).
Ideally, pypdf should raise a PdfReadError (or another subclass of PyPdfError) if that file is really impossible to parse.

Environment

$ python -m platform
Linux-5.15.0-118-generic-x86_64-with-glibc2.35

$ python --version
Python 3.11.9

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=9.5.0

Code

This is a minimal code example that shows the issue:

from pypdf import PdfReader
reader = PdfReader('client file (might be broken).pdf')
list(reader.pages)

I cannot share the PDF as it might contain sensitive client data.

Two messages/warnings are displayed before the traceback, though:

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.11/site-packages/pypdf/_page.py", line 2227, in __len__
    return self.length_function()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/pypdf/_doc_common.py", line 353, in get_num_pages
    self._flatten()
  File "/lib/python3.11/site-packages/pypdf/_doc_common.py", line 1101, in _flatten
    catalog = self.root_object
              ^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/pypdf/_reader.py", line 191, in root_object
    return cast(DictionaryObject, self.trailer[TK.ROOT].get_object())
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get_object'

Additional info

The PDF might be corrupted, as I am unable to open it with Evince, which shows this error: Failed to read the document catalog.
xpdf is also showing various errors when reading the file:

Syntax Error: Couldn't find trailer dictionary
Syntax Error: Invalid XRef entry 493
Internal Error: xref num 493 not found but needed, try to reconstruct<0a>
Syntax Error: Invalid XRef entry 493
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Catalog object is wrong type (null)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Invalid XRef entry 493
Internal Error: xref num 493 not found but needed, try to reconstruct<0a>
Syntax Error: Invalid XRef entry 493
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Catalog object is wrong type (null)
Syntax Error: Couldn't read page catalog


stefan6419846 · 2024-08-22T03:40:02Z

Thanks for the report. Judging from the stacktrace and the third-party logs, this PDF file just appears to be broken, as apparently the (essential) root object cannot be found.

Feel free to open a PR to convert this into an appropriate exception.

stefan6419846 changed the title from "AttributeError: 'NoneType' object has no attribute 'get_object' (different from #1295 & #1689)" to "Missing root object raising: 'NoneType' object has no attribute 'get_object' (different from #1295 & #1689)" Aug 22, 2024

stefan6419846 added the PdfReader (The PdfReader component is affected) and is-uncaught-exception (Use this label only for issues caused by broken PDF documents that cannot be recovered.) labels Aug 22, 2024

BertrandBordage mentioned this issue Aug 22, 2024 in: ROB: Raise PdfReadError when missing /Root in trailer. #2808 (Merged)

stefan6419846 closed this as completed in #2808 Aug 23, 2024

stefan6419846 closed this as completed in 9f08cd0 Aug 23, 2024
