Manipulated inline images can force PyPDF2 into an infinite loop #329

sekrause · 2017-02-17T11:55:45Z

When you try to get the content stream of the attached PDF, PyPDF2 ends up in an infinite loop. This is probably a security issue, because an attacker could use such a file to mount a denial-of-service attack against applications that use PyPDF2.

The cause is that the last while loop in ContentStream._readInlineImage terminates only when it finds the EI token and never checks whether the stream has already ended. Triggering it is as simple as adding a (broken) inline image that has no EI token at all, as in the attached PDF.
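To illustrate the kind of check that is missing, here is a minimal sketch of a read-until-EI loop that also stops when the stream runs out. It is not PyPDF2's actual code: the names `read_inline_image_data`, `InlineImageError`, and `WHITESPACE` are hypothetical, and the EI detection is simplified (it only requires whitespace after the token).

```python
import io

# Bytes treated as PDF whitespace in this simplified sketch.
WHITESPACE = b" \t\r\n\x00\x0c"


class InlineImageError(Exception):
    """Hypothetical error for inline image data that never reaches EI."""


def read_inline_image_data(stream):
    """Collect bytes up to an EI token that is followed by whitespace.

    Simplified illustration only, not PyPDF2's implementation.
    """
    data = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            # End of stream: without this check the loop never terminates
            # when the EI token is missing.
            raise InlineImageError("stream ended before EI token")
        if byte == b"E":
            lookahead = stream.read(2)
            if (
                len(lookahead) == 2
                and lookahead[0:1] == b"I"
                and lookahead[1:2] in WHITESPACE
            ):
                return bytes(data)
            # Not a terminator: rewind the lookahead and keep "E" as data.
            stream.seek(-len(lookahead), io.SEEK_CUR)
        data += byte
```

With this shape, a well-formed stream such as `io.BytesIO(b"abcEI Q")` yields the image bytes, while a stream without any EI token raises an error instead of looping forever.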

You can see the infinite loop by running this test script with the attached PDF:

import sys

from PyPDF2 import PdfFileReader
from PyPDF2.pdf import ContentStream

# Usage: python test.py <path to PDF>
with open(sys.argv[1], 'rb') as f:
    pdf = PdfFileReader(f, strict=False)
    for page in pdf.pages:
        # Parsing the content stream is where the loop never terminates.
        contentstream = ContentStream(page.getContents(), pdf)
        for operands, command in contentstream.operations:
            if command == b'INLINE IMAGE':
                data = operands['data']
                print(len(data))
I will soon prepare a pull request that fixes this issue.



sekrause mentioned this issue Feb 17, 2017

Improved performance and security for ContentStream_readInlineImage #331

Closed

sekrause mentioned this issue Jul 28, 2017

Endless Loop When Processing Certain Large PDF with PdfFileWriter #358

Closed

sekrause mentioned this issue Mar 3, 2018

Rebooting PyPDF2 Maintenance #385

Closed

MartinThoma added the is-bug, nf-performance, and nf-security labels Apr 8, 2022

sekrause mentioned this issue Apr 12, 2022

Improved performance and security for ContentStream_readInlineImage. #740

Merged

MartinThoma closed this as completed in #740 Apr 15, 2022

MartinThoma pushed a commit that referenced this issue Apr 15, 2022

SEC/PERF: ContentStream_readInlineImage (#740)

d71fb3e

Closes #329 - potential infinite loop (SEC)
Closes #330 - performance issue of ContentStream._readInlineImage (PERF)
