Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/statistics errors due to different dimensions of mosaic mask and cropped array #122

Closed
hrodmn opened this issue Sep 9, 2023 · 5 comments

Comments

@hrodmn
Copy link
Collaborator

hrodmn commented Sep 9, 2023

I am running titiler-pgstac in an eoAPI deployment and have been working on an application that uses the /statistics endpoint to handle some zonal statistics. It is working really well 90% of the time, but I am hitting errors on polygons that cross STAC item boundaries (polygons that do not cross STAC item boundaries seem to be working fine).

Here is an example traceback for one of the errors:

...
  File "/tmp/pip-target-fgo8_kmi/lib/python/titiler/pgstac/factory.py", line 947, in geojson_statistics
  File "/tmp/pip-target-fgo8_kmi/lib/python/titiler/pgstac/mosaic.py", line 420, in feature
  File "/tmp/pip-target-fgo8_kmi/lib/python/rio_tiler/mosaic/reader.py", line 98, in mosaic_reader
  File "/tmp/pip-target-fgo8_kmi/lib/python/rio_tiler/mosaic/methods/defaults.py", line 23, in feed
ValueError: operands could not be broadcast together with shapes (1,31,28) (1,30,27)

This happens when I use the default value for the maxsize parameter, but if I dial maxsize down to a smaller number (e.g. 29 in this case), the request succeeds.

The ultimate error is occurring here:

https://github.com/cogeotiff/rio-tiler/blob/main/rio_tiler/mosaic/methods/defaults.py#L23

I haven't been able to reproduce the runtime environment yet, but it seems as if the dimensions of img and FirstMethod.mosaic.mask are not the expected size.

Possibly this is not a bug in titiler-pgstac and is related to STAC item metadata instead?

Any advice on how to debug would be greatly appreciated!

@hrodmn
Copy link
Collaborator Author

hrodmn commented Sep 11, 2023

I have one more clue. The incongruent shapes (shapes (1,31,28) (1,30,27)) represent the shapes returned by successive reads to the STAC items using rio_tiler.io.stac.STACReader.

from rio_tiler.io.stac import STACReader


item_ids = ["1459200_2304000", "1536000_2304000"]

for item_id in item_ids:
    with STACReader(f"{stac_endpoint}/collections/{collection}/items/{item_id}") as stac:
        img = stac.feature(
            shape=geojson["features"][0],
            assets=["public"],
        )
        print(f"item {item_id}: {img.mask.shape}")
item 1459200_2304000: (30, 27)
item 1536000_2304000: (31, 28)

When I dial max_size down to the next integer below the smallest value from the larger dimension in the arrays, they come out the same size:

for item_id in item_ids:
    with STACReader(f"{stac_endpoint}/collections/{collection}/items/{item_id}") as stac:
        img = stac.feature(
            shape=geojson["features"][0],
            assets=["public"],
            max_size=29,
        )
        print(f"item {item_id}: {img.mask.shape}")
item 1459200_2304000: (29, 27)
item 1536000_2304000: (29, 27)

Shouldn't the img.transform (and subsequent arrays) be the same for every item/asset?

@vincentsarago
Copy link
Member

Hi @hrodmn,
The inconsistent shape comes from the different resolution of the 2 items.

This should be handled in https://github.com/cogeotiff/rio-tiler/blob/main/rio_tiler/models.py#L442-L454 or maybe there is a bug!

The issue when using feature/part over different assets, is that rio-tiler will determine the output with/height for each assets and then try to merge them into one array. Because of the different position of the assets, the resolution (in a give CRS) can be different and thus give different width/height.

🤦 oh but for Mosaic we don't check this 🤦 (this is why part/feature shouldn't be used in production 😓)

We should patch rio-tiler https://github.com/cogeotiff/rio-tiler/blob/main/rio_tiler/mosaic/methods/defaults.py#L22 to first check if we can merge the data all together.

The best would be to use fixed width/height (defining the output resolution) for a feature, bu sadly that's on the User side

@hrodmn
Copy link
Collaborator Author

hrodmn commented Sep 11, 2023

Thanks for the info, @vincentsarago! I think I have found a workaround for this issue: if I project the coordinates to the same CRS as the items in the STAC collection (which is always epsg:5070 in my case) then set coords-crs=epsg:5070 in my request, everything appears to be working smoothly.

The issue when using feature/part over different assets, is that rio-tiler will determine the output with/height for each assets and then try to merge them into one array. Because of the different position of the assets, the resolution (in a give CRS) can be different and thus give different width/height.

Yeah, I see the resolutions are different in the Affine transforms for the two ImageData objects when they are WGS84. Could we use the merged extents of the STAC items in the list to determine the output array extent, then pass that into the subsequent readers?

🤦 oh but for Mosaic we don't check this 🤦 (this is why part/feature shouldn't be used in production 😓)

Are there any other reasons why I shouldn't use part or feature? It is potentially really useful for something I am working on and I would love to help make it production-ready.

The best would be to use fixed width/height (defining the output resolution) for a feature, bu sadly that's on the User side

To do that we would need to add width and height parameters to endpoints that use feature, right?

@vincentsarago
Copy link
Member

Are there any other reasons why I shouldn't use part or feature? It is potentially really useful for something I am working on and I would love to help make it production-ready.

The issues are:

  • different resolution for items
  • if not well controlled, users could try to open many many files

Could we use the merged extents of the STAC items in the list to determine the output array extent, then pass that into the subsequent readers?

Yeah that's could be a solution, but I don't think you'll get the resolution of each assets before really opening them (except if they all have the proj extension)

To do that we would need to add width and height parameters to endpoints that use feature, right?

oh yeah that's a good idea we should replace the max_size dependency with https://github.com/developmentseed/titiler/blob/main/src/titiler/core/titiler/core/dependencies.py#L309-L320

@vincentsarago
Copy link
Member

This is now fixed in latest rio-tiler version https://github.com/cogeotiff/rio-tiler/blob/main/CHANGES.md#610-2023-09-15

But we are still blocked by stac-utils/pgstac#201 before we can update titiler / titiler-pgstac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants