Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OME-Zarr and HCS writers #16

Merged
merged 87 commits into from
Jan 18, 2023
Merged

OME-Zarr and HCS writers #16

merged 87 commits into from
Jan 18, 2023

Conversation

ziw-liu
Copy link
Collaborator

@ziw-liu ziw-liu commented Dec 13, 2022

This PR aims to update the writer so that the output datasets are compliant with the latest OME-NGFF specification. Fixes #3.

Secondary goals include:

@talonchandler
Copy link
Contributor

Thanks for the fast fix! This is now working as I expect:

$ python hcs_ome_zarr_writer.py
$ python
>> from iohub.zarrfile import HCSReader
>> reader = HCSReader('./hcs.zarr')
>> reader.get_array(0)

but the ome.zarr example is still returning an empty array with:

>>> from iohub.zarrfile import OMEZarrReader
>>> reader = OMEZarrReader('./ome.zarr')
>>> reader.get_array(0)

@talonchandler
Copy link
Contributor

Minor: when I run napari --plugin napari-ome-zarr hcs.zarr/ my console is filled with errors like
21:41:44 ERROR Failed to load H/8/0/0
and their tracebacks. The reader is maybe expecting a regular grid of wells? Not a big deal.

@ziw-liu
Copy link
Collaborator Author

ziw-liu commented Jan 7, 2023

The reader is maybe expecting a regular grid of wells? Not a big deal.

Yes. This is a pitfall of ome_zarr.reader.Reader. It tries to load everything into a dask array. Another implication (or complication) from this behavior is that it does not support varying FOV sizes.

@ziw-liu
Copy link
Collaborator Author

ziw-liu commented Jan 7, 2023

but the ome.zarr example is still returning an empty array with:

This is (some what) intended. ReaderBase.get_array() takes position indices, which does not make sense for OMEZarrReader since there cannot be multiple positions according to the spec. Another problem with get_array is that it ignored the possibility of pyramids. Although in our group we have not been using multi-scale much, it is an important feature for viewing large stitched images in tissue/organism imaging, like light sheet imaging of zebra fish (see #9 (comment)).
The reader should offer some convenience to get arrays, but that can be a separate PR. For now reader.root['0'][:] will likely get the equivalent.

@talonchandler
Copy link
Contributor

Okay great...reader.root['0'][:] returns the ndarray that I was expecting.

Your current OMEZarrReader inherits from ReaderBase, but leaves many of the ReaderBase methods/attributes unimplemented. Can you help me understand your current plan for:

  • get_array? Deprecate/remove it?
  • .shape .height, etc.? To be implemented? Or is there a different blocker?

I'm asking about these specifically because they are very common parts of my workflows. My day-to-day usage of the readers involves (in order of frequency):

  • getting the shape of the data
  • extracting a single position from the zarr store into a numpy array for viewing + prototyping + numpy operations
  • applying an operation to each position (in a for loop, or in parallel)

I'm completely fine with deprecating or changing get_array (or removing it from the base class so that we can leave the existing readers mostly unchanged), but I do want to prioritize how easy it is for a user to get a numpy array from the reader---IMO .root['0'][:] is not a very intuitive way to get an array.

@ziw-liu
Copy link
Collaborator Author

ziw-liu commented Jan 7, 2023

Your current OMEZarrReader inherits from ReaderBase, but leaves many of the ReaderBase methods/attributes unimplemented. Can you help me understand your current plan for:

* `get_array`? Deprecate/remove it?

* `.shape` `.height`, etc.? To be implemented? Or is there a different blocker?

For OMEZarrReader these are trivial. .get_array or equivalent needs a different signature from ReaderBase.get_array because it's inherently single-position.

For HCSReader, however, it is a much more complicated story. As is the problem with the ome-zarr-py reader, convenience contradicts with support for non-regular datasets, e.g. data from a multi-arm microscope with different camera sensors, spatial sampling, and coordinate systems, where shape/height are not a single global value. Things would be easy if we decide to not support these at all, and can be quickly implemented in another PR focused on updating OME-NGFF readers.

@talonchandler
Copy link
Contributor

Okay so it seems that the ReaderBase.get_array (and .get_zarr and .get_image) function signatures (int -> ndarray or similar) aren't compatible with the HCSReader's requirements. So should we consider changing the base class?

Regarding .shape, I agree that we want to support irregular datasets. I can still imagine attaching a meaning to .shape: something like "all of the various dimensions within my dataset, even if it's irregular and/or expensive to calculate". This can be a separate issue/PR, though.

@talonchandler
Copy link
Contributor

@ziw-liu do you have an anticipated timeline or todo list before this merges?

If you want me to give this a detailed pass through at the line-by-line level I can. Or we're still a few weeks out then I can wait.

@ziw-liu
Copy link
Collaborator Author

ziw-liu commented Jan 9, 2023

do you have an anticipated timeline or todo list before this merges?

My intention was to keep this PR minimally-invasive (already quite a bulky one with ~2k lines changed though) and get to the point where the writer feature can be a non-blocking dependency for recOrder and microDL (with the tolerance of future breaking changes). It is up to debate with the downstream devs (including us) whether this goal has been achieved, thus I invite more pre-alpha user testing. Once most stakeholders agree that what this PR provides (along with several smaller ones, e.g. on updating readers) will provide the minimal feature set, I think we can merge this.

On my end, I would like to see:

  • more tests
  • (optional) better write method
  • (optional) a method/function to build HCS datasets from 'raw'

@mattersoflight
Copy link
Collaborator

For HCSReader, however, it is a much more complicated story. As is the problem with the ome-zarr-py reader, convenience contradicts with support for non-regular datasets, e.g. data from a multi-arm microscope with different camera sensors, spatial sampling, and coordinate systems, where shape/height are not a single global value. Things would be easy if we decide to not support these at all, and can be quickly implemented in another PR focused on updating OME-NGFF readers.

iohub need not support storage of datasets with different coordinates in the same zarr store. Our correlative imaging and analysis pipelines will be reading and writing data in different modalities in independent zarr stores before they are registered and merged.

@ziw-liu
Copy link
Collaborator Author

ziw-liu commented Jan 17, 2023

iohub need not support storage of datasets with different coordinates in the same zarr store.

Thanks for clarifying, also linking the discussion in #18.

Copy link
Collaborator

@mattersoflight mattersoflight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example runs well on my M1 and I can view data in napari. My main input is on making the API intuitive. I will repeat this for single position zarr store (aka OMEZarr).

examples/hcs_ome_zarr_writer.py Show resolved Hide resolved
examples/hcs_ome_zarr_writer.py Show resolved Hide resolved
examples/hcs_ome_zarr_writer.py Outdated Show resolved Hide resolved
# Write to the positions

for pos in positions:
p = writer.require_position(*pos)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writer.set_position would be more intuitive than writer.require_position.

examples/hcs_ome_zarr_writer.py Show resolved Hide resolved
examples/hcs_ome_zarr_writer.py Show resolved Hide resolved
@ziw-liu ziw-liu marked this pull request as ready for review January 18, 2023 06:01
@ziw-liu
Copy link
Collaborator Author

ziw-liu commented Jan 18, 2023

Given #19, I think we can merge this with minimal code review (just to make sure that nothing looks crazy), and proceed with implementing the proposed API.

@mattersoflight mattersoflight self-requested a review January 18, 2023 16:22
Copy link
Collaborator

@mattersoflight mattersoflight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of feedback from this PR and the offline discussion is summarized in #19.

@ziw-liu ziw-liu mentioned this pull request Jan 18, 2023
@ziw-liu ziw-liu merged commit e3e8237 into main Jan 18, 2023
@ziw-liu ziw-liu deleted the hcs-dataset-v4 branch January 18, 2023 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request NGFF OME-NGFF (OME-Zarr format)
Projects
None yet
3 participants