OME-Zarr and HCS writers #16

ziw-liu · 2022-12-13T18:21:50Z

This PR aims to update the writer so that the output datasets are compliant with the latest OME-NGFF specification. Fixes #3.

Secondary goals include:

* Support microDL data IO #10: provide writer initialization methods that read in an existing data store
* Update tests (related: Test workflow is outdated #5)

talonchandler · 2023-01-07T05:39:19Z

Thanks for the fast fix! This is now working as I expect:

```
$ python hcs_ome_zarr_writer.py
$ python
>>> from iohub.zarrfile import HCSReader
>>> reader = HCSReader('./hcs.zarr')
>>> reader.get_array(0)
```

but the ome.zarr example is still returning an empty array with:

```
>>> from iohub.zarrfile import OMEZarrReader
>>> reader = OMEZarrReader('./ome.zarr')
>>> reader.get_array(0)
```

talonchandler · 2023-01-07T05:44:02Z

Minor: when I run `napari --plugin napari-ome-zarr hcs.zarr/`, my console is filled with errors like

```
21:41:44 ERROR Failed to load H/8/0/0
```

and their tracebacks. The reader is maybe expecting a regular grid of wells? Not a big deal.

ziw-liu · 2023-01-07T06:25:08Z

> The reader is maybe expecting a regular grid of wells? Not a big deal.

Yes. This is a pitfall of `ome_zarr.reader.Reader`: it tries to load everything into a dask array. Another implication (or complication) of this behavior is that it does not support varying FOV sizes.
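A grid-based reader effectively treats every row/column combination listed in the plate metadata as a well to load. A minimal sketch for spotting the wells such a reader would try, and fail, to load, assuming the OME-NGFF v0.4 `plate` attributes (`rows`, `columns`, `wells[].path`) are available as a plain dict; `missing_wells` is a hypothetical helper, not part of iohub or ome-zarr-py:

```python
from itertools import product


def missing_wells(plate_attrs: dict) -> list:
    """List wells implied by the plate's row x column grid but absent from 'wells'."""
    plate = plate_attrs["plate"]
    grid = {
        f"{row['name']}/{col['name']}"
        for row, col in product(plate["rows"], plate["columns"])
    }
    present = {well["path"] for well in plate["wells"]}
    return sorted(grid - present)


# Illustrative plate: rows A and H, columns 1 and 8, but only two wells imaged.
plate_attrs = {
    "plate": {
        "rows": [{"name": "A"}, {"name": "H"}],
        "columns": [{"name": "1"}, {"name": "8"}],
        "wells": [
            {"path": "A/1", "rowIndex": 0, "columnIndex": 0},
            {"path": "H/1", "rowIndex": 1, "columnIndex": 0},
        ],
    }
}
print(missing_wells(plate_attrs))  # ['A/8', 'H/8']
```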

ziw-liu · 2023-01-07T06:45:47Z

> but the ome.zarr example is still returning an empty array with:

This is (somewhat) intended. `ReaderBase.get_array()` takes position indices, which do not make sense for `OMEZarrReader` since there cannot be multiple positions according to the spec. Another problem with `get_array` is that it ignores the possibility of pyramids. Although our group has not been using multi-scale much, it is an important feature for viewing large stitched images in tissue/organism imaging, like light-sheet imaging of zebrafish (see the comment in #9).
The reader should offer some convenience for getting arrays, but that can be a separate PR. For now, `reader.root['0'][:]` will likely get the equivalent.
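In OME-NGFF v0.4, the pyramid levels of an image are listed under `multiscales[0]["datasets"]`, ordered from highest to lowest resolution, so the `'0'` in `reader.root['0']` is typically the full-resolution level. A minimal sketch of a level-based (rather than position-based) accessor, assuming the image group's attributes are available as a plain dict; `level_path` is a hypothetical name, not an iohub API:

```python
def level_path(attrs: dict, level: int = 0) -> str:
    """Return the array path of a pyramid level from OME-NGFF multiscales metadata.

    Level 0 is the highest resolution; the spec orders `datasets`
    from largest to smallest.
    """
    datasets = attrs["multiscales"][0]["datasets"]
    if not 0 <= level < len(datasets):
        raise IndexError(f"level {level} out of range for {len(datasets)} levels")
    return datasets[level]["path"]


# Illustrative attributes for a two-level TCZYX pyramid (values are made up).
attrs = {
    "multiscales": [
        {
            "version": "0.4",
            "axes": [
                {"name": "t", "type": "time"},
                {"name": "c", "type": "channel"},
                {"name": "z", "type": "space"},
                {"name": "y", "type": "space"},
                {"name": "x", "type": "space"},
            ],
            "datasets": [
                {"path": "0", "coordinateTransformations": [{"type": "scale", "scale": [1, 1, 1, 1, 1]}]},
                {"path": "1", "coordinateTransformations": [{"type": "scale", "scale": [1, 1, 1, 2, 2]}]},
            ],
        }
    ]
}
print(level_path(attrs))     # 0
print(level_path(attrs, 1))  # 1
```

Assuming `reader.root` is the underlying zarr group, this could be used as `reader.root[level_path(dict(reader.root.attrs))][:]`.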

talonchandler · 2023-01-07T18:00:18Z

Okay, great: `reader.root['0'][:]` returns the ndarray that I was expecting.

Your current OMEZarrReader inherits from ReaderBase, but leaves many of the ReaderBase methods/attributes unimplemented. Can you help me understand your current plan for:

* `get_array`? Deprecate/remove it?
* `.shape`, `.height`, etc.? To be implemented? Or is there a different blocker?

I'm asking about these specifically because they are very common parts of my workflows. My day-to-day usage of the readers involves (in order of frequency):

1. getting the shape of the data
2. extracting a single position from the zarr store into a numpy array for viewing, prototyping, and numpy operations
3. applying an operation to each position (in a for loop, or in parallel)

I'm completely fine with deprecating or changing `get_array` (or removing it from the base class so that we can leave the existing readers mostly unchanged), but I do want to prioritize how easy it is for a user to get a numpy array from the reader. IMO `.root['0'][:]` is not a very intuitive way to get an array.
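In the OME-NGFF v0.4 HCS layout, each position lives at `row/column/field` inside the plate group: the plate's `wells` metadata lists the well paths, and each well's `images` metadata lists its fields. A minimal sketch of the "apply an operation to each position" loop, using plain dicts (trimmed to the fields used here) in place of zarr group attributes; `iter_position_paths` and `process` are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor


def iter_position_paths(plate_attrs: dict, well_attrs: dict):
    """Yield 'row/column/field' paths from OME-NGFF plate and well metadata."""
    for well in plate_attrs["plate"]["wells"]:
        well_path = well["path"]  # e.g. "A/1"
        for image in well_attrs[well_path]["well"]["images"]:
            yield f"{well_path}/{image['path']}"


def process(path: str) -> str:
    # Placeholder for per-position work, e.g. loading root[path]["0"][:]
    # into numpy and operating on it.
    return f"processed {path}"


plate_attrs = {"plate": {"wells": [{"path": "A/1"}, {"path": "H/8"}]}}
well_attrs = {
    "A/1": {"well": {"images": [{"path": "0"}, {"path": "1"}]}},
    "H/8": {"well": {"images": [{"path": "0"}]}},
}

paths = list(iter_position_paths(plate_attrs, well_attrs))
print(paths)  # ['A/1/0', 'A/1/1', 'H/8/0']

# Sequential loop over positions.
serial = [process(p) for p in paths]

# The same work in parallel; Executor.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(process, paths))
```

Threads suit I/O-bound reads; CPU-heavy numpy work may instead call for a process pool.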

ziw-liu · 2023-01-07T22:50:20Z

> Your current OMEZarrReader inherits from ReaderBase, but leaves many of the ReaderBase methods/attributes unimplemented. Can you help me understand your current plan for:
> * `get_array`? Deprecate/remove it?
>
> * `.shape` `.height`, etc.? To be implemented? Or is there a different blocker?

For OMEZarrReader these are trivial. .get_array or equivalent needs a different signature from ReaderBase.get_array because it's inherently single-position.

For HCSReader, however, it is a much more complicated story. As with the ome-zarr-py reader, convenience conflicts with support for non-regular datasets, e.g. data from a multi-arm microscope with different camera sensors, spatial sampling, and coordinate systems, where shape/height are not a single global value. Things would be easy if we decided not to support these at all; the convenience methods could then be quickly implemented in another PR focused on updating the OME-NGFF readers.

talonchandler · 2023-01-07T23:42:21Z

Okay, so it seems that the `ReaderBase.get_array` (and `.get_zarr` and `.get_image`) function signatures (int -> ndarray or similar) aren't compatible with the `HCSReader`'s requirements. So should we consider changing the base class?

Regarding .shape, I agree that we want to support irregular datasets. I can still imagine attaching a meaning to .shape: something like "all of the various dimensions within my dataset, even if it's irregular and/or expensive to calculate". This can be a separate issue/PR, though.
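One way to give `.shape` a meaning for irregular plates, along the lines suggested here, is to report every distinct shape in the dataset and only collapse to a single global shape when all positions agree. A minimal sketch with hypothetical names (`dataset_shapes` and `global_shape` are not iohub APIs), taking a mapping from position path to array shape:

```python
from typing import Dict, Optional, Set, Tuple

Shape = Tuple[int, ...]


def dataset_shapes(shapes: Dict[str, Shape]) -> Set[Shape]:
    """All distinct array shapes found across positions."""
    return set(shapes.values())


def global_shape(shapes: Dict[str, Shape]) -> Optional[Shape]:
    """The shared shape if every position agrees, else None (irregular dataset)."""
    unique = dataset_shapes(shapes)
    return unique.pop() if len(unique) == 1 else None


# Illustrative TCZYX shapes (made up for this example).
regular = {"A/1/0": (1, 2, 5, 256, 256), "A/1/1": (1, 2, 5, 256, 256)}
irregular = {"A/1/0": (1, 2, 5, 256, 256), "B/1/0": (1, 1, 5, 2048, 2048)}
print(global_shape(regular))    # (1, 2, 5, 256, 256)
print(global_shape(irregular))  # None
```

Returning `None` rather than raising keeps the cheap "is this plate regular?" check separate from the full per-position listing.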

talonchandler · 2023-01-07T23:44:09Z

@ziw-liu do you have an anticipated timeline or todo list before this merges?

If you want me to give this a detailed line-by-line pass, I can. Or, if we're still a few weeks out, I can wait.

ziw-liu · 2023-01-09T17:17:10Z

> do you have an anticipated timeline or todo list before this merges?

My intention was to keep this PR minimally invasive (it is already a bulky one, with ~2k lines changed) and get to the point where the writer feature can be a non-blocking dependency for recOrder and microDL (with the tolerance of future breaking changes). It is up for debate with the downstream devs (including us) whether this goal has been achieved, so I invite more pre-alpha user testing. Once most stakeholders agree that this PR (along with several smaller ones, e.g. on updating the readers) provides the minimal feature set, I think we can merge it.

On my end, I would like to see:

* more tests
* (optional) a better write method
* (optional) a method/function to build HCS datasets from 'raw'
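Building an HCS dataset from 'raw' acquisitions largely comes down to deriving the plate and well metadata from the list of positions. A minimal sketch that produces a subset of the OME-NGFF v0.4 fields (`rows`, `columns`, `wells` with `path`/`rowIndex`/`columnIndex`, and per-well `images`), assuming positions arrive as `(row, column, field)` string tuples; that tuple format and the `build_plate_metadata` name are hypothetical, and fields such as `acquisitions` or `field_count` are left out:

```python
def build_plate_metadata(positions):
    """Build OME-NGFF 'plate' attrs and per-well 'well' attrs from (row, col, fov) tuples."""
    # Keep first-seen order so that e.g. column "10" does not sort before "2".
    rows = list(dict.fromkeys(row for row, _, _ in positions))
    columns = list(dict.fromkeys(col for _, col, _ in positions))
    images = {}  # (row, col) -> list of image entries, in acquisition order
    for row, col, fov in positions:
        images.setdefault((row, col), []).append({"path": fov})
    plate = {
        "plate": {
            "rows": [{"name": r} for r in rows],
            "columns": [{"name": c} for c in columns],
            "wells": [
                {"path": f"{r}/{c}", "rowIndex": rows.index(r), "columnIndex": columns.index(c)}
                for r, c in images
            ],
        }
    }
    wells = {f"{r}/{c}": {"well": {"images": imgs}} for (r, c), imgs in images.items()}
    return plate, wells


positions = [("A", "1", "0"), ("A", "1", "1"), ("B", "2", "0")]
plate, wells = build_plate_metadata(positions)
print(plate["plate"]["wells"])
print(wells["A/1"])  # {'well': {'images': [{'path': '0'}, {'path': '1'}]}}
```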

mattersoflight · 2023-01-17T21:41:19Z

> For HCSReader, however, it is a much more complicated story. As with the ome-zarr-py reader, convenience conflicts with support for non-regular datasets, e.g. data from a multi-arm microscope with different camera sensors, spatial sampling, and coordinate systems, where shape/height are not a single global value. Things would be easy if we decided not to support these at all; the convenience methods could then be quickly implemented in another PR focused on updating the OME-NGFF readers.

iohub need not support storage of datasets with different coordinates in the same zarr store. Our correlative imaging and analysis pipelines will be reading and writing data in different modalities in independent zarr stores before they are registered and merged.

ziw-liu · 2023-01-17T21:56:11Z

> iohub need not support storage of datasets with different coordinates in the same zarr store.

Thanks for clarifying, also linking the discussion in #18.
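If a single store only ever holds one modality, a writer can cheaply enforce uniformity by checking every new position against the first one written. A minimal sketch, with a hypothetical `UniformShapeGuard` class (not part of iohub) that compares array shape and per-axis scale:

```python
class UniformShapeGuard:
    """Reject positions whose array shape or scale differs from the first one."""

    def __init__(self):
        self._reference = None  # (shape, scale) of the first position seen

    def check(self, path, shape, scale):
        key = (tuple(shape), tuple(scale))
        if self._reference is None:
            self._reference = key
        elif key != self._reference:
            raise ValueError(f"{path}: {key} does not match {self._reference}")


# Illustrative shapes and scales (made up for this example).
guard = UniformShapeGuard()
guard.check("A/1/0", (1, 2, 5, 256, 256), (1.0, 1.0, 2.0, 0.1, 0.1))
guard.check("A/1/1", (1, 2, 5, 256, 256), (1.0, 1.0, 2.0, 0.1, 0.1))
rejected = False
try:
    # A different camera/sampling would belong in a separate store.
    guard.check("B/1/0", (1, 1, 5, 2048, 2048), (1.0, 1.0, 2.0, 0.05, 0.05))
except ValueError as err:
    rejected = True
    print(err)
```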

mattersoflight

This example runs well on my M1 and I can view data in napari. My main input is on making the API intuitive. I will repeat this for the single-position zarr store (aka OMEZarr).

examples/hcs_ome_zarr_writer.py

mattersoflight · 2023-01-17T21:52:32Z

examples/hcs_ome_zarr_writer.py

```diff
+# Write to the positions
+
+for pos in positions:
+    p = writer.require_position(*pos)
```

`writer.set_position` would be more intuitive than `writer.require_position`.
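The `require_` prefix appears to follow the get-or-create convention of `require_group` in zarr-python and h5py, which returns an existing member or creates it if missing, so calling it twice with the same key is safe. A minimal pure-Python sketch of that semantics; the `ToyPlateWriter` class is hypothetical and only models the naming question, not iohub's writer:

```python
class ToyPlateWriter:
    """Toy stand-in that stores positions in a dict keyed by 'row/col/fov'."""

    def __init__(self):
        self._positions = {}

    def require_position(self, row, col, fov):
        """Return the existing position, or create it if it does not exist yet."""
        key = f"{row}/{col}/{fov}"
        if key not in self._positions:
            self._positions[key] = {"path": key, "arrays": {}}
        return self._positions[key]


writer = ToyPlateWriter()
positions = [("A", "1", "0"), ("A", "1", "1"), ("A", "1", "0")]  # note the repeat
for pos in positions:
    p = writer.require_position(*pos)

print(len(writer._positions))  # 2: the repeated position was reused, not recreated
```

A name like `set_position` could be read as "create or overwrite", which is a different contract; whichever name is chosen, documenting the reuse-versus-overwrite behavior seems as important as the name itself.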


ziw-liu · 2023-01-18T06:57:51Z

Given #19, I think we can merge this with minimal code review (just to make sure that nothing looks crazy), and proceed with implementing the proposed API.

mattersoflight

The majority of feedback from this PR and the offline discussion is summarized in #19.

ziw-liu added 30 commits December 2, 2022 17:16

* extract generic ome-zarr writer (7ac41de)
* elevate the get relative keys method (3cfd8bf)
* update dependencies (dec686d)
* axis model (0805c6d)
* multiscales model (f70eb42)
* label model (78035dc)
* hcs plate model (846d6ae)
* fix color fields (aafd128)
* well group model (089f2c7)
* edit module doc (548214d)
* supply array name option (bf6eef7)
* base write api (5de463b)
* images model (eb96ee0)
* document image model spec link (5614d78)
* use native model instead of dataclasses (cab3d3e)
* use field name in signature (9557c09)
* fix dataframe (124520c)
* fix rdefs model init (e5da56b)
* fix imagesmeta model init (855d3d2)
* update position attributes (a93238f)
* generate omero metadata separately (b999353)
* fix method calls (2b9a2ff)
* pass coordinate transformations (226c00d)
* fix iteration on optional parameter (c1d7ce7)
* use zarr's json encoder (b20c3e1)
* elevate position metadata generation (6fb52b8)
* override writing methods in child writer (7477961)
* fix overwrite param passing (01f6a01)
* remove scaling parameters (3c97b0e)
* track and serialize well data (3032507)

ziw-liu added 2 commits January 6, 2023 22:49

* ensure that OMEZarrReader is read-only (5ebaeca)
* change arg name to match super (174abc5)
* download hcs reference dataset (19bbdfe)

ziw-liu mentioned this pull request Jan 9, 2023

Enforcing uniform data shape in a single OME-Zarr HCS dataset #18

Closed

ziw-liu mentioned this pull request Jan 17, 2023

added current implementation of HCSZarrModifier #13

Closed

mattersoflight reviewed Jan 17, 2023


* update example of appending a new channel (fb19f33)

ziw-liu marked this pull request as ready for review January 18, 2023 06:01

ziw-liu requested review from JoOkuma and AhmetCanSolak January 18, 2023 06:56

mattersoflight self-requested a review January 18, 2023 16:22

mattersoflight approved these changes Jan 18, 2023


ziw-liu mentioned this pull request Jan 18, 2023

Supported Python versions #17

Closed

* test open reference datset (d4dbfc4)

ziw-liu merged commit e3e8237 into main Jan 18, 2023

ziw-liu deleted the hcs-dataset-v4 branch January 18, 2023 19:54

ziw-liu mentioned this pull request Jan 23, 2023

Hypothesis should be a dev-only dependency #29

Closed

ziw-liu mentioned this pull request Feb 15, 2023

Rename WaveorderReader to ImageReader #46

Merged
