Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GeoMedianMethod and MaxIndexMethod for pixel selection #270

Closed

Conversation

wouellette
Copy link

I added GeoMedianMethod and MaxIndexMethod as pixel selection methods.
The former is based on Dale Roberts' hdmedians package, a geometric median implementation in cython.

For GeoMedianMethod, I didn't want to add cython dependencies to rio-tiler, so currently it's just a naive numpy array assignment which is rather slow, but left it like this to spark the discussion on how to best go forward on this one, as I think pixel selection using the geomedian is rather useful. I contacted the Geoscience Australia project, who seem to internally use an hdstats package that offers an interface to n-dimensional array geomedian compositing: GeoscienceAustralia/digitalearthau#260, but it's currently not on PyPI.

For the MaxIndexMethod, the idea is to expand the behavior of a max NDVI composite (which is itself missing) to any possible band combination.

@wouellette wouellette changed the title GeoMedianMetho and MaxIndexMethod for pixel selection GeoMedianMethod and MaxIndexMethod for pixel selection Sep 29, 2020
"""Add data to tile."""
self.tile.append(tile)

class MaxIndexMethod(MosaicMethodBase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this method is a bit complex. I have used custom methods called CustomLastBand...

The idea is to use expression like B1,B2,B3,{NDVI} which will create a 4 bands array and then within the mosaic method, we will exclude the last band from the array.

class CustomLastBandHigh(MosaicMethodBase):
    """Feed the mosaic tile using the last band as decision factor."""

    @property
    def data(self):
        """Return data and mask."""
        if self.tile is not None:
            return self.tile.data[:-1], ~self.tile.mask[0] * 255
        else:
            return None, None

    def feed(self, tile: numpy.ma.array):
        """Add data to tile."""
        if self.tile is None:
            self.tile = tile
            return

        pidex = (
            numpy.bitwise_and(tile.data[-1] > self.tile.data[-1], ~tile.mask)
            | self.tile.mask
        )

        mask = numpy.where(pidex, tile.mask, self.tile.mask)
        self.tile = numpy.ma.where(pidex, tile, self.tile)
        self.tile.mask = mask


class CustomLastBandLow(MosaicMethodBase):
    """Feed the mosaic tile using the last band as decision factor."""

    @property
    def data(self):
        """Return data and mask."""
        if self.tile is not None:
            return self.tile.data[:-1], ~self.tile.mask[0] * 255
        else:
            return None, None

    def feed(self, tile: numpy.ma.array):
        """Add data to tile."""
        if self.tile is None:
            self.tile = tile
            return

        pidex = (
            numpy.bitwise_and(tile.data[-1] < self.tile.data[-1], ~tile.mask)
            | self.tile.mask
        )

        mask = numpy.where(pidex, tile.mask, self.tile.mask)
        self.tile = numpy.ma.where(pidex, tile, self.tile)
        self.tile.mask = mask

Copy link
Author

@wouellette wouellette Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes a lot of sense.

The reason I didn't go for a shipped selection band as part of the asset is in the case of feeding STAC assets directly to mosaic_reader from an S3 bucket, this selection criterium band is not available, and that would require computing them upfront.

I guess this could be done in the tiler method passed to mosaic_reader, as in example below (pseudo-code to illustrate the idea, I haven't tested it):

from satsearch import Search
from rio_tiler_pds.sentinel.aws import S2COGReader
from rio_tiler.mosaic import mosaic_reader
from rio_tiler.mosaic.methods import defaults
from rasterio.crs import CRS

def normalized_difference(a, b):
    nd = (a - b) / (a + b)
    return nd

def tiler(src_path: str, *args, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
    with S2COGReader(src_path) as cog:
        data, mask = cog.part(*args, **kwargs)
        data[-1] = normalized_difference(data[-2], data[-1])
        return data, mask

search = Search(bbox=bbox,
               url="https://earth-search.aws.element84.com/v0",
               datetime=f'{time_interval[0]}/{time_interval[1]}',
               query={'eo:cloud_cover': {'lt': 90}},
               collections=['sentinel-s2-l2a-cogs'])

sentinel, assets_used = mosaic_reader(assets, tiler, bbox,
                                      bounds_crs=CRS.from_epsg(bbox.crs),
                                      pixel_selection=defaults.CustomLastBandHigh(),
                                      bands=('B02', 'B03', 'B04', 'B08'))

The problem is that we lose the string reference to the bands in the tiler method, which is not really an issue as it is custom set by the user, so it can always be aligned with the bands set in mosaic_reader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wouellette I think you could use expression for what you are trying to do.

from satsearch import Search
from rio_tiler_pds.sentinel.aws import S2COGReader
from rio_tiler.mosaic import mosaic_reader
from rio_tiler.mosaic.methods import defaults
from rasterio.crs import CRS

def tiler(src_path: str, *args, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
    with S2COGReader(src_path) as cog:
        return cog.part(*args, **kwargs)

search = Search(
    bbox=bbox,
    url="https://earth-search.aws.element84.com/v0",
    datetime=f'{time_interval[0]}/{time_interval[1]}',
    query={'eo:cloud_cover': {'lt': 90}},
    collections=['sentinel-s2-l2a-cogs']
)

(data, mask), assets_used = mosaic_reader(
    assets, tiler, bbox,
    bounds_crs=CRS.from_epsg(bbox.crs),
    pixel_selection=defaults.CustomLastBandHigh(),
    expression="B02,B03,B04,(B04-B08/B04+B08)"
)

# Note: data would be a (3xnxn) array of type float because of the NDVI band

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely overlooked the fact that an expression field was available...

That's sorted then! What's the plan with this? Crystallize these methods in the library? Or was your idea to keep them separate.

# TODO: How to implement geomedian without introducing too many new dependencies?
for xid in range(x): # for each x-axis
for yid in range(y): # for each y-axis
tile[:, yid, xid] = hd.geomedian(stacked_tile[:, :, yid, xid])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's won't be fast 😬

I wonder if we could vectorize the calculation with numpy ufunc (cc @kylebarron)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this won't be fast.

I think the main problem here is that hd.geomedian only supports a two-dimensional array as input. So it can't natively broadcast over a higher dimensional array.

Also how does this work with missing data? Does it automatically mask out portions of the array that are undefined?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the missing data, It returns 0s currently, which you can then convert to NaNs a posteriori, or directly use the hd.nangeomedian, which handles missing data directly (should have used that one for rigor's sake).

As I said, I believe the n-dimensional (at least 4-dimensional) implementation of their geomedian sits somewhere at Geoscience Australia already developed, as I can see a geomedian_pcm (and even a nangeomedian_pcm) called here,

That's why I'm reluctant to make a PR on hdmedians as I'm pretty certain the work's already been done.

Paper for reference: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8004469

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to import hdstats, which isn't on PyPI... I can't find anything about that package

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confirmation from Dale Roberts that hdstats will be made available through PyPI in the coming months so we can stall this for now.

@vincentsarago
Copy link
Member

@wouellette I'm going to close this PR for now. Feel free to re-open when things move on hdstats

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants