'not enough image data' exception from PIL #2343

brianpow · 2023-12-15T00:37:49Z

I am trying to extract images from PDF files, but for certain PDFs PIL occasionally raises a 'not enough image data' exception. The files render correctly in Atril Document Viewer, and extraction works with pdfimages from poppler-utils.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.5.0-kali3-amd64-x86_64-with-glibc2.37

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.2, crypt_provider=('cryptography', '38.0.4'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader
import sys

for filename in sys.argv[1:]:
    reader = PdfReader(filename)
    for i, page in enumerate(reader.pages):
        for j, image in enumerate(page.images):
            print("Writing %d-%d: %s (%d)..." % (i, j, image.name, len(image.data)))
            with open(image.name, "wb") as fp:
                fp.write(image.data)

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!

test2_P038-038.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/user/pypdf/pypdf_test.py", line 7, in <module>
    for j, image in enumerate(page.images):
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2727, in __iter__
    yield self[i]
          ~~~~^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2723, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 557, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/filters.py", line 785, in _xobj_to_image
    img, image_format, extension, _ = _handle_flate(
                                      ^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_xobj_image_helpers.py", line 172, in _handle_flate
    img = Image.frombytes(mode, size, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2952, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 805, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
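
The traceback ends in Image.frombytes(mode, size, data), and PIL raises "not enough image data" when the buffer is shorter than the given mode and size require. The sketch below is an illustration only, not pypdf's code; the helper names expected_raw_length and describe_length are hypothetical. It computes how many bytes an unfiltered PDF sample stream should hold from /Width, /Height, /BitsPerComponent and the number of color components, assuming each row starts on a byte boundary as the PDF specification requires for sampled images.

```python
def expected_raw_length(width: int, height: int, bits_per_component: int, components: int) -> int:
    """Bytes an unfiltered PDF image stream should hold (hypothetical helper)."""
    bits_per_row = width * bits_per_component * components
    bytes_per_row = (bits_per_row + 7) // 8  # rows are padded to a whole byte
    return bytes_per_row * height


def describe_length(data: bytes, width: int, height: int,
                    bits_per_component: int, components: int) -> str:
    """Compare a decoded buffer with the size implied by the image dictionary."""
    expected = expected_raw_length(width, height, bits_per_component, components)
    if len(data) < expected:
        # In this situation PIL's Image.frombytes reports "not enough image data".
        return f"{expected - len(data)} byte(s) short"
    if len(data) > expected:
        return f"{len(data) - expected} byte(s) too long"
    return "ok"
```

For example, an 8-bit, one-component image of 100x50 pixels needs 5000 bytes, while treating the same buffer as 3-component RGB would require 15000, which is the kind of mismatch that produces this error.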

pubpub-zz · 2024-04-07T21:09:58Z

The issue is with the first image.

pubpub-zz · 2024-04-08T17:33:30Z

Also ran some tests with image r.pages[8].images[9] (a small black image) from
https://github.com/py-pdf/pypdf/files/13946477/panda.pdf
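
To locate failing images such as r.pages[8].images[9] without the whole loop aborting, each image can be fetched by index inside its own try/except. The traceback shows that iterating page.images calls self[i] for each index, so decoding happens on access. The sketch below assumes only that the object passed in supports len() and integer indexing; the function name extract_images_safely is hypothetical and not part of pypdf.

```python
from typing import Any, List, Tuple


def extract_images_safely(images: Any) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, str]]]:
    """Fetch images one index at a time so one decode failure does not stop the rest.

    `images` only needs __len__ and integer __getitem__ (hypothetical helper).
    """
    extracted: List[Tuple[int, Any]] = []
    failures: List[Tuple[int, str]] = []
    for index in range(len(images)):
        try:
            extracted.append((index, images[index]))  # decoding happens on access
        except ValueError as exc:  # e.g. PIL's "not enough image data"
            failures.append((index, str(exc)))
    return extracted, failures
```

With pypdf, page.images for each page of a PdfReader can be passed in; the indexes collected in the failures list then point at the images that need investigating.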

stefan6419846 mentioned this issue Jan 16, 2024

Bug when getting images from a pdf #2124

Closed

pubpub-zz mentioned this issue Apr 8, 2024

ROB: Cope with some image extraction issues #2591

Merged

stefan6419846 closed this as completed in #2591 Apr 10, 2024

stefan6419846 pushed a commit that referenced this issue Apr 10, 2024

ROB: Cope with some image extraction issues (#2591)

ced67e1

Closes #2343. First case: images stored with 1 byte per pixel in a Separation color space. Second case: similar, plus a trailing \n at the end of the image data that must be ignored.
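
The two cases in the commit message can be illustrated with a hypothetical pre-processing step run before Image.frombytes. This is a sketch under stated assumptions, not the code merged in #2591: it assumes 8 bits per component, and the name normalize_samples is invented for illustration. It drops a final b"\n" only when the buffer is exactly one byte longer than a whole number of pixels, then reports how many bytes per pixel the data actually holds.

```python
from typing import Tuple


def normalize_samples(data: bytes, width: int, height: int, components: int) -> Tuple[bytes, int]:
    """Heuristically clean an 8-bit sample buffer before handing it to PIL (hypothetical helper).

    Returns the possibly trimmed data and the number of bytes per pixel it actually holds.
    """
    pixels = width * height
    if pixels == 0:
        return data, components
    # Case 2: a single stray newline appended after otherwise complete sample data.
    if len(data) % pixels == 1 and data.endswith(b"\n"):
        data = data[:-1]
    # Case 1: the buffer divides evenly into pixels, so report bytes per pixel
    # (1 for a one-component Separation image, even if `components` said otherwise).
    if len(data) % pixels == 0:
        return data, len(data) // pixels
    # Anything else is left alone; Image.frombytes will still reject it if it is too short.
    return data, components
```

A caller could then pick a PIL mode from the returned count (for example "L" for one byte per pixel). Rendering a Separation image faithfully would still require applying its tint transform or alternate color space, which this sketch does not attempt.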
