TypeError: 'NoneType' object is not iterable #1279
Comments
The file you've referenced is not readable. Please make sure you can open it, and if that fails, check its document information with Acrobat Reader to ensure it is not an issue with PyPDF2.
I plan to use PyPDF2 to analyze a large number (> 100,000) of PDF files automatically. My reason for reporting this is that the problem is not captured by PyPDF2 (by raising a [...] Can you recommend any automated method for checking a PDF for validity before loading it into PyPDF2, if this is a requirement?
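One cheap pre-filter for a large batch, sketched here using only the standard library, is to check that a file carries the basic PDF markers before handing it to a parser. The function name `looks_like_pdf` and the `tail_bytes` default are illustrative assumptions, not part of PyPDF2. This is a heuristic only: a file with a damaged xref table or a missing trailer can still pass it, so it does not replace error handling around the actual read.

```python
import os


def looks_like_pdf(path, tail_bytes=2048):
    """Cheap structural pre-check. Hypothetical helper, not a PyPDF2 API.

    Returns True if the file has a "%PDF-" header near the start and both
    "startxref" and "%%EOF" near the end. It does NOT prove the file is valid.
    """
    try:
        with open(path, "rb") as fh:
            # Lenient readers accept the header within the first 1024 bytes.
            head = fh.read(1024)
            fh.seek(0, os.SEEK_END)
            size = fh.tell()
            # Read only the tail, where the xref pointer and EOF marker live.
            fh.seek(max(0, size - tail_bytes))
            tail = fh.read()
    except OSError:
        # Unreadable or missing files are treated as "not a PDF".
        return False
    return b"%PDF-" in head and b"startxref" in tail and b"%%EOF" in tail
```

Files that fail this check can be skipped or logged early; files that pass still need to be opened inside a try/except block.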
@DL6ER
@pubpub-zz I already have such a [...] My question is: Do you know what the goal is for PyPDF2? I'll definitely open another issue ticket in a moment, as I found a small PDF (19 pages, < 1 MB) which does not throw an exception but rather keeps PyPDF2 spinning forever at 100% CPU.
I haven't completely made my mind up on this one. I'd be happy if people participated in the discussion here: #1210
I found two more PDF files triggering the exact same traceback. Sorry for the bad example above; these files are readable.
@DL6ER I'm trying to sort out what your test program should be returning for all three PDFs. I'm using the pdfinfo command that comes with xpdf for Linux. I'm seeing that your first file has an xref table that's corrupted and is missing its trailer. The second has a bunch of xref entries that I can't parse. The trailer is intact but contains nothing in the metadata that gets returned by the DocumentInformation() class. The third seems to be relatively intact apart from one xref entry, and has Google listed as the creator, which is the only thing that should be returned by the DocumentInformation() class. If that's all correct, here's what I'm thinking should happen when you run your test code for each file: File#1: [...] File#2: [...] File#3: [...] Is this what you had in mind when you wrote your test code?
fixes py-pdf#1279 / Status_v1_Reviewers-Guide.pdf
With Acrobat Reader, the only file I've been able to read is file#3. The PR has been adjusted to read the pages and the metadata.
@ediamondscience The best approach in [...]
Added PdfReadError in cases where the trailer is absent or can't be read. Closes #1279
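For batch processing, a defensive loop along the following lines keeps one bad file from stopping the run. This is a minimal sketch: `scan` and `read_metadata_with_pypdf2` are hypothetical helper names, and the loop catches `Exception` broadly on the assumption that some malformed files may still surface errors other than `PdfReadError` (such as the `TypeError` reported here). It does not protect against files that make the parser loop forever; that would need a timeout in a separate process.

```python
def read_metadata_with_pypdf2(path):
    """Read document metadata with PyPDF2 (imported lazily)."""
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return reader.metadata  # may be None if the file has no info dictionary


def scan(paths, read_one=read_metadata_with_pypdf2):
    """Apply read_one to every path, collecting results and failures.

    Returns (ok, failed): ok maps path -> result, failed maps
    path -> "ExceptionName: message".
    """
    ok, failed = {}, {}
    for path in paths:
        try:
            ok[path] = read_one(path)
        except Exception as exc:  # deliberately broad for batch robustness
            failed[path] = f"{type(exc).__name__}: {exc}"
    return ok, failed
```

Calling `scan(list_of_paths)` returns the metadata for readable files and a short error summary for the rest, which can then be logged or inspected by hand.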
The improvement found by @ediamondscience was just merged to [...]
See #1269 for further details.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.4.0-122-generic-x86_64-with-glibc2.29
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.3
Code + PDF
This is a minimal, complete example that shows the issue:
PDF used above: pca_var.pdf
Traceback
This is the complete Traceback I see: