AttributeError: module 'PIL.Image' has no attribute 'ExifTags' #6881

Closed
albertvillanova opened this issue May 8, 2024 · 3 comments · Fixed by #6883
Labels: bug (Something isn't working)

@albertvillanova
Member

When trying to load an image dataset in an old Python environment (with Pillow-8.4.0), an error is raised:

AttributeError: module 'PIL.Image' has no attribute 'ExifTags'

The error traceback:

~/huggingface/datasets/src/datasets/iterable_dataset.py in __iter__(self)
   1391                 # `IterableDataset` automatically fills missing columns with None.
   1392                 # This is done with `_apply_feature_types_on_example`.
-> 1393                 example = _apply_feature_types_on_example(
   1394                     example, self.features, token_per_repo_id=self._token_per_repo_id
   1395                 )

~/huggingface/datasets/src/datasets/iterable_dataset.py in _apply_feature_types_on_example(example, features, token_per_repo_id)
   1080     encoded_example = features.encode_example(example)
   1081     # Decode example for Audio feature, e.g.
-> 1082     decoded_example = features.decode_example(encoded_example, token_per_repo_id=token_per_repo_id)
   1083     return decoded_example
   1084 

~/huggingface/datasets/src/datasets/features/features.py in decode_example(self, example, token_per_repo_id)
   1974 
-> 1975         return {
   1976             column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
   1977             if self._column_requires_decoding[column_name]

~/huggingface/datasets/src/datasets/features/features.py in <dictcomp>(.0)
   1974 
   1975         return {
-> 1976             column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
   1977             if self._column_requires_decoding[column_name]
   1978             else value

~/huggingface/datasets/src/datasets/features/features.py in decode_nested_example(schema, obj, token_per_repo_id)
   1339         # we pass the token to read and decode files from private repositories in streaming mode
   1340         if obj is not None and schema.decode:
-> 1341             return schema.decode_example(obj, token_per_repo_id=token_per_repo_id)
   1342     return obj
   1343 

~/huggingface/datasets/src/datasets/features/image.py in decode_example(self, value, token_per_repo_id)
    187             image = PIL.Image.open(BytesIO(bytes_))
    188         image.load()  # to avoid "Too many open files" errors
--> 189         if image.getexif().get(PIL.Image.ExifTags.Base.Orientation) is not None:
    190             image = PIL.ImageOps.exif_transpose(image)
    191         if self.mode and self.mode != image.mode:

~/huggingface/datasets/venv/lib/python3.9/site-packages/PIL/Image.py in __getattr__(name)
     75                 )
     76                 return categories[name]
---> 77         raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
     78 
     79 

AttributeError: module 'PIL.Image' has no attribute 'ExifTags'

Environment info

Since datasets 2.19.0
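
To check whether a given environment is affected, the failing attribute lookup from the traceback can be probed directly. This is a minimal sketch, not part of datasets: the helper name `pil_image_exposes_exif_tags` is hypothetical, and it only tests whether `PIL.Image.ExifTags.Base.Orientation` resolves in the installed Pillow.

```python
import importlib


def pil_image_exposes_exif_tags():
    """Return True if `PIL.Image.ExifTags.Base.Orientation` resolves.

    Hypothetical diagnostic helper; returns False when Pillow is missing
    or when the installed Pillow lacks this attribute chain.
    """
    try:
        pil_image = importlib.import_module("PIL.Image")
    except ImportError:
        return False  # Pillow is not installed at all

    # getattr with a default swallows the AttributeError seen in the traceback.
    exif_tags = getattr(pil_image, "ExifTags", None)
    base = getattr(exif_tags, "Base", None)
    return hasattr(base, "Orientation")


if __name__ == "__main__":
    print("PIL.Image.ExifTags.Base.Orientation available:",
          pil_image_exposes_exif_tags())
```

If this prints `False`, the `decode_example` line shown in the traceback would be expected to fail in that environment.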

@rwightman

rwightman commented Jul 18, 2024

@albertvillanova @lhoestq I just ran into this, and requiring a newer Pillow isn't a solution: it breaks Pillow-SIMD, which is quite a few versions behind Pillow but necessary for training with reasonable throughput.

A couple things here...

  1. This can be done with a method that works on any reasonably recent Pillow:
    image = ImageOps.exif_transpose(image)

  2. I'd rather this not be done for me automatically. Sometimes exif data is correct, sometimes it's not. Sometimes I might want to correct the orientation, sometimes I might not.

In any case, if I've preprocessed the images properly myself, I don't want to incur the overhead (possible further file-pointer seeks and parsing) of loading EXIF data that isn't loaded or parsed when you just open and decode the image.
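
The two points above can be sketched together as follows. This is an illustration under stated assumptions, not the fix that was merged: `maybe_exif_transpose` and its `apply_exif_transpose` flag are hypothetical names, and the numeric tag id `0x0112` (the EXIF Orientation tag) stands in for `PIL.Image.ExifTags.Base.Orientation` so that the lookup does not depend on that attribute existing.

```python
ORIENTATION_TAG = 0x0112  # numeric EXIF Orientation tag id (274 in decimal)


def maybe_exif_transpose(image, apply_exif_transpose=True):
    """Return `image` transposed per its EXIF orientation, or unchanged.

    `apply_exif_transpose` is a hypothetical opt-out flag for callers who
    have already normalized orientation and want to skip EXIF parsing.
    """
    if not apply_exif_transpose:
        return image  # EXIF data is never read on this path

    # The object returned by getexif() supports .get() like a mapping.
    if image.getexif().get(ORIENTATION_TAG) is None:
        return image  # no orientation recorded, nothing to do

    # Imported lazily so Pillow's ImageOps is only loaded when needed.
    from PIL import ImageOps

    return ImageOps.exif_transpose(image)
```

A caller would open and load the image as in the traceback (`PIL.Image.open(...)` followed by `image.load()`) and then pass it through `maybe_exif_transpose(image, apply_exif_transpose=False)` to skip the EXIF step entirely.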

@albertvillanova
Member Author

Hi @rwightman, thanks for your feedback.

First, as a side note, please be aware that you are depending on Pillow-SIMD, and that library no longer seems to be maintained:

Regarding your suggestions for the datasets library, the changes were introduced by this PR:

I agree that we should perhaps have offered an option to choose whether or not to perform this operation.

@rwightman

@albertvillanova

Huh, thought I'd just installed the current datasets when I ran into this, maybe it was behind...

I'm aware the support for SIMD is a problem, but it's up to 8x faster than non-SIMD Pillow and really necessary in many training situations; otherwise you have lots of idle GPUs. The current situation is unfortunate, but most changes since 9.0 aren't all that important for 'decoding jpegs and resizing'.
