
Extend image metadata #18951

Merged: 35 commits merged into galaxyproject:dev on Nov 25, 2024


@kostrykin (Contributor) commented Oct 8, 2024

This PR adds a series of basic metadata elements for image data, including axes, dtype, num_unique_values, depth, channels, and frames.

This makes it possible to define validators for image input data. Some examples of where this is useful:

  • Require that an image is a binary image by validating that num_unique_values is 1 or 2.
  • Validate dtype: Some tools support only integer image data and not floating-point data (or vice versa).
  • Validate that channels is 0 or 1: Restrict input data to single-channel images.
  • Validate axes, depth, channels, frames: Require that an image has one or more z-slices / channels / time steps.
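The validator examples above could look roughly like the following minimal sketch. It assumes the metadata is available as a plain dict keyed by the element names used in the list (num_unique_values, dtype, channels, depth, frames); the dict layout and all helper functions are hypothetical and are not Galaxy's actual validator API.

```python
"""Hypothetical validators over an image-metadata dict (illustration only)."""
from typing import Any, Dict, Optional

Metadata = Dict[str, Any]


def is_binary(meta: Metadata) -> bool:
    # A binary image has at most two distinct values (e.g. 0 and 1).
    return meta.get("num_unique_values") in (1, 2)


def is_integer_dtype(meta: Metadata) -> bool:
    # Accept integer data only, e.g. "uint8", "int16", "uint16".
    dtype = meta.get("dtype") or ""
    return dtype.startswith(("int", "uint"))


def is_single_channel(meta: Metadata) -> bool:
    # Assumption: channels == 0 means "no explicit channel axis".
    return meta.get("channels") in (0, 1)


def has_at_least_one(meta: Metadata, key: str) -> bool:
    # Require one or more z-slices / channels / time steps, depending on key.
    value: Optional[int] = meta.get(key)
    return value is not None and value >= 1
```

For example, with meta = {"num_unique_values": 2, "dtype": "uint8", "channels": 0, "depth": 0}, the first three checks pass, while has_at_least_one(meta, "depth") fails because the image has no z-slices.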

TIFF files are read using the tifffile library; for other image types, reading is attempted with Pillow. The new metadata elements are optional, because Pillow might not be installed, or Pillow might be unable to read an image (e.g., because it does not support the image format).
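A minimal sketch of this reading strategy, using a hypothetical read_image_metadata helper: TIFFs go through tifffile, everything else through Pillow, and a missing library or an unreadable file yields None, mirroring the idea that the metadata is optional. Choosing the reader by file extension and the returned keys are assumptions made for illustration; tifffile.TiffFile with its series attribute, PIL.Image.open, and Image.getbands are real APIs.

```python
"""Sketch: read basic image metadata, treating it as optional."""
from typing import Any, Dict, Optional


def read_image_metadata(path: str) -> Optional[Dict[str, Any]]:
    # Assumption: the reader is chosen by file extension.
    if path.lower().endswith((".tif", ".tiff")):
        try:
            import tifffile
        except ImportError:
            return None  # tifffile not installed: leave metadata unset
        try:
            with tifffile.TiffFile(path) as tif:
                series = tif.series[0]
                return {"axes": series.axes, "dtype": str(series.dtype), "shape": series.shape}
        except Exception:
            return None  # not a readable TIFF
    try:
        from PIL import Image
    except ImportError:
        return None  # Pillow not installed: leave metadata unset
    try:
        with Image.open(path) as im:
            width, height = im.size
            return {"width": width, "height": height, "channels": len(im.getbands())}
    except Exception:
        return None  # format not supported by Pillow, or corrupted file
```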

For multi-page TIFF files, the metadata is determined for each page individually, and the per-page values are then joined into a comma-separated string, formatted as a JSON-encoded list (in the same order as the pages in the series).
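The per-page joining can be illustrated with the standard json module. This is a hypothetical join_page_metadata helper; the input format (one dict per page, already in series order) is an assumption, and pages lacking a key contribute null at their position so the list stays aligned with the page order.

```python
"""Sketch: combine per-page metadata of a multi-page TIFF into JSON lists."""
import json
from typing import Any, Dict, List


def join_page_metadata(pages: List[Dict[str, Any]]) -> Dict[str, str]:
    keys = sorted({key for page in pages for key in page})
    # For each metadata element, collect the values in page order and
    # encode them as a single JSON list string.
    return {key: json.dumps([page.get(key) for page in pages]) for key in keys}


pages = [{"dtype": "uint8", "channels": 1}, {"dtype": "uint16", "channels": 3}]
print(join_page_metadata(pages))
# {'channels': '[1, 3]', 'dtype': '["uint8", "uint16"]'}
```

Storing a JSON list rather than a bare comma-joined string keeps each value's type (numbers vs. strings) recoverable with json.loads.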

How to test the changes?


  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.


@kostrykin marked this pull request as draft on October 8, 2024, 20:01
@bernt-matthias (Contributor) commented Oct 8, 2024

The problem with black will be solved in #18955.

@kostrykin marked this pull request as ready for review on October 8, 2024, 21:54
@kostrykin marked this pull request as draft on October 9, 2024, 05:49
@mvdbeek self-assigned this on Nov 12, 2024

@bernt-matthias (Contributor) commented

I just had the same problem; my solution was 887b1aa.

@kostrykin (Contributor, Author) commented

Thanks @bernt-matthias, this solved it.

@mvdbeek (Member) commented Nov 15, 2024

The integration test failures look real. Is that because of a missing library?

FAILED test/integration/test_datatype_upload.py::test_upload_datatype_auto[im_empty.tif] - AssertionError: assert 'binary' == 'tiff'
  - tiff
  + binary
FAILED test/integration/test_datatype_upload.py::test_upload_datatype_auto[im4_float.tif] - AssertionError: assert 'binary' == 'tiff'
  - tiff
  + binary
= 2 failed, 246 passed, 18 skipped, 2 xpassed, 1063 warnings in 4159.23s (1:09:19) =

@mvdbeek (Member) commented Nov 15, 2024

Also note that the files placed in lib/galaxy/datatypes/test must match their datatype, so if you place invalid files there, you have to change their extension.

@jdavcs modified the milestones: 24.2, 25.0 on Nov 20, 2024

@kostrykin (Contributor, Author) commented Nov 21, 2024

I think I found the issue. Image files currently use Pillow for sniffing the image type. However, Pillow fails to read these two images (im_empty.tif and im4_float.tif).


Edit: This is now fixed in 8432024. TIFFs are now sniffed using tifffile, which reads TIFF files more reliably than Pillow. In addition, im_corrupted.tif is removed in 9a8d944, because tifffile does not recognize it as a TIFF file anyway.
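The idea behind the fix, treating a file as TIFF only if tifffile can parse it, can be sketched as follows. The sniff_tiff name and the policy of returning False when tifffile is not installed are assumptions for illustration; this is not the actual sniffer code from 8432024.

```python
"""Sketch: sniff TIFF files with tifffile instead of Pillow."""


def sniff_tiff(path: str) -> bool:
    try:
        import tifffile
    except ImportError:
        return False  # assumption: without tifffile, do not claim TIFF
    try:
        # TiffFile raises if the file does not have a valid TIFF structure.
        with tifffile.TiffFile(path):
            return True
    except Exception:
        return False
```

Unlike a Pillow-based check, this accepts TIFF variants that tifffile supports but Pillow cannot decode (such as some floating-point images).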

@mvdbeek merged commit 71f87d8 into galaxyproject:dev on Nov 25, 2024
52 of 54 checks passed

This PR was merged without a "kind/" label; please correct.

@mvdbeek (Member) commented Nov 25, 2024

Thanks a lot @kostrykin!
