Mosaicking #1

Closed
gjoseph92 opened this issue Mar 11, 2021 · 3 comments · Fixed by #44

Comments

@gjoseph92
Owner

From pangeo-data/cog-best-practices#4 (comment):

Looks like this is focused on "stacking" items/assets/bands (which I think is the most common workflow). Any plans to incorporate the mosaic workflow like @TomAugspurger has put together in stac_vrt (or maybe this is already there and I just missed it)?

I didn't add any mosaicking directly (at the GDAL level), since you can actually do it pretty easily with plain dask/numpy. Something like:

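import numpy as np

# TODO `fill_value`s besides NaN
def _mosaic(chunk, axis):
    # Nothing to merge when there is at most one layer along `axis`.
    ax_length = chunk.shape[axis]
    if ax_length <= 1:
        return chunk
    # Start from the first layer, then fill its NaN gaps from each later layer.
    out = np.take(chunk, 0, axis=axis)
    for i in range(1, ax_length):
        layer = np.take(chunk, i, axis=axis)
        out = np.where(np.isnan(out), layer, out)
    return out

# `stack` is the stacked xarray DataArray, reduced along its "time" dimension.
mosaicked = stack.reduce(_mosaic, dim="time")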
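For the `fill_value` TODO, here is a minimal sketch of how the same first-valid-pixel logic might take a nodata value other than NaN. The name `mosaic_first_valid` and its `nodata` parameter are made up for illustration and are not part of any library. It runs on plain NumPy arrays so the result is easy to check by hand, and unlike `_mosaic` it always drops the reduced axis.

```python
import numpy as np


def mosaic_first_valid(chunk, axis, nodata=np.nan):
    """Keep, per pixel, the first value along `axis` that is not `nodata`."""
    # NaN never compares equal to itself, so it needs its own test.
    def is_nodata(a):
        return np.isnan(a) if np.isnan(nodata) else a == nodata

    out = np.take(chunk, 0, axis=axis)
    for i in range(1, chunk.shape[axis]):
        layer = np.take(chunk, i, axis=axis)
        out = np.where(is_nodata(out), layer, out)
    return out


# Three 2x2 "scenes" stacked along axis 0; NaN marks missing pixels.
stack = np.array([
    [[1.0, np.nan], [np.nan, np.nan]],
    [[9.0, 2.0], [np.nan, np.nan]],
    [[9.0, 9.0], [3.0, np.nan]],
])
mosaicked = mosaic_first_valid(stack, axis=0)

# Same idea with an integer sentinel (0) instead of NaN.
int_stack = np.array([[0, 0, 5], [7, 0, 6]])
int_mosaicked = mosaic_first_valid(int_stack, axis=0, nodata=0)
```

Earlier layers win wherever they have data, and a pixel stays nodata only if every layer is missing it.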

As far as I know, there aren't really any advantages to doing the mosaic in GDAL versus in dask. The one advantage GDAL could theoretically have is that it could short-circuit, and stop loading additional datasets as soon as the output image is already fully-filled-in—however, I don't know if GDAL actually implements this logic. And even if it does, the performance gains of early termination would quickly lose out to the cost of loading each dataset serially. Basically, I think you're better off letting dask read everything in parallel, then throwing away some data, compared to worst-case having GDAL read hundreds of datasets in serial.
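To make the short-circuit trade-off concrete, here is a rough sketch of an early-terminating serial mosaic. It is not a description of what GDAL does. `serial_mosaic` and the `loaders` list of zero-argument callables are hypothetical stand-ins for "read one dataset". The point is that each load has to wait for the previous one, because the decision to stop depends on the output so far.

```python
import numpy as np


def serial_mosaic(loaders):
    """Mosaic lazily loaded layers one at a time, stopping once no NaN remains.

    `loaders` is a hypothetical list of zero-argument callables, each
    returning one 2D float array. Returns the mosaic and the number of
    layers actually loaded.
    """
    out = None
    n_loaded = 0
    for load in loaders:
        layer = load()
        n_loaded += 1
        out = layer if out is None else np.where(np.isnan(out), layer, out)
        if not np.isnan(out).any():
            break  # fully filled in: later layers cannot change the result
    return out, n_loaded


layers = [
    np.array([[1.0, np.nan]]),
    np.array([[9.0, 2.0]]),
    np.array([[9.0, 9.0]]),  # never loaded: the output is complete after two
]
result, n_loaded = serial_mosaic([lambda a=a: a for a in layers])
```

Here the third layer is skipped, but in the worst case (a pixel that is never filled) every layer is still read, one after another.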

So short answer: yes, this is focused only on "stacking", because I think of "mosaic" as just one among many reduction operations you might want to do to a stack (mean, median, quality-band mosaic, etc.).
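As a rough illustration of "mosaic is just one reduction among many", the sketch below applies a mean, a median, and a quality-band mosaic along the first axis of a small NumPy array. The `stack` and `quality` arrays are made-up data, and "higher score is better" is an assumption for the example.

```python
import numpy as np

# Hypothetical stack: 3 time steps of a 1x2 image, NaN where data is missing.
stack = np.array([
    [[1.0, np.nan]],
    [[3.0, 4.0]],
    [[8.0, 6.0]],
])
# Hypothetical per-pixel quality scores with the same shape (higher is better).
quality = np.array([
    [[0.2, 0.0]],
    [[0.9, 0.1]],
    [[0.5, 0.7]],
])

# NaN-aware mean and median over time.
mean = np.nanmean(stack, axis=0)
median = np.nanmedian(stack, axis=0)

# Quality-band mosaic: per pixel, take the value from the best-scoring layer.
best = np.argmax(quality, axis=0)
quality_mosaic = np.take_along_axis(stack, best[np.newaxis], axis=0)[0]
```

All three collapse the time axis to a single image; they differ only in how the layers are combined per pixel.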

The bigger question is whether offering a mosaic function is in scope for this project. Personally, I'd like it to be, but it should probably be on an xarray accessor, which starts to bump up against the territory of rioxarray.

@geospatial-jeff

geospatial-jeff commented Mar 11, 2021

Agreed that it would be better to implement mosaicing as a reduction instead of using GDAL. VRTs in particular have some weird scaling behavior related to the order in which files are read and how they interact with the various GDAL caches - some are per file-handle which can cause memory leaks on VRTs with many files. I think you have the right approach overall; if you can do it natively with numpy/dask then let's use that, and fall back to GDAL as needed.

The bigger question is whether offering a mosaic function is in scope for this project.

This is an interesting question. My assumption after looking over the repo was that it would support mosaics (although this isn't stated anywhere), mostly because Item Collections (basically lists of items) aren't always perfectly stacked, so support for mosaicing is implicit in my opinion.

@snowman2

Another option would be to combine GDAL + Dask. For example:
https://gist.github.com/rmg55/875a2b79ee695007a78ae615f1c916b2

@gjoseph92
Owner Author

Certainly an option, but as we discussed above, there aren't many benefits to using GDAL for it, and likely a few downsides.

gjoseph92 added a commit that referenced this issue May 5, 2021
`stackstac.show` and `stackstac.add_to_map` display Dask-backed DataArrays on ipyleaflet maps.

Other changes:
* Exposed some handy spatial and miscellaneous operations in the public API (`reproject_array`, `xyztile_of_array`, etc.)
* Exposed `stackstac.mosaic`! Closes #1.
* Reorganized docs to have an examples subsection and base API reference page
* Added visualization notebook from my webinar