apply_polygon: callback on pixels inside a polygon #287

Closed
jdries opened this issue Oct 22, 2021 · 7 comments · Fixed by #298

Comments

@jdries
Contributor

jdries commented Oct 22, 2021

We received the following use case to support:

  • the user has polygons of parcels (agriculture)
  • for each parcel, a pixel-level map (a raster) of the parcel is expected, which is easy in openEO
  • the difficult part: the time series of rasters (4D cube) for a given polygon needs to be filtered on the correlation between different dates, so that dates whose correlation deviates from the average correlation are filtered out.

So the question is how we could support this.
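
To make the filtering step concrete, here is a minimal pure-Python sketch. It assumes that "correlation" means the Pearson correlation between the parcel's pixel values on two dates, and that a date "deviates" when its mean correlation with the other dates lies more than `n_sigma` standard deviations from the overall mean. The names `pearson`, `dates_to_keep` and `n_sigma` are hypothetical and not part of openEO.

```python
import math
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation coefficient of two equally long value lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def dates_to_keep(pixels_by_date, n_sigma=1.5):
    """Return the dates whose mean correlation with all other dates is typical.

    pixels_by_date maps a date label to the flattened pixel values of one
    parcel; pixels outside the parcel are NaN (assumption).
    """
    dates = list(pixels_by_date)
    n_pixels = len(pixels_by_date[dates[0]])
    # Only compare pixels that are valid (not NaN) on every date.
    valid = [
        i for i in range(n_pixels)
        if not any(math.isnan(pixels_by_date[d][i]) for d in dates)
    ]
    clean = {d: [pixels_by_date[d][i] for i in valid] for d in dates}
    # Mean correlation of each date with every other date.
    mean_corr = {
        d: mean(pearson(clean[d], clean[o]) for o in dates if o != d)
        for d in dates
    }
    overall = mean(mean_corr.values())
    spread = pstdev(mean_corr.values())
    return [d for d in dates if abs(mean_corr[d] - overall) <= n_sigma * spread]
```

The threshold rule is only one possible reading of "deviates from average correlation"; the actual criterion would be up to the user's UDF.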

Proposal 1 'apply_spatiotemporal'

We already have processes like 'aggregate_spatial' and 'apply_neighborhood'. An 'apply_spatiotemporal' process could similarly receive the input polygons and apply a function to the time series of pixels within each polygon, with the function receiving a labeled array as input. The output is then again a labeled array with updated pixel values.

In our case, the callback would probably be a UDF.

@m-mohr m-mohr self-assigned this Oct 22, 2021
@m-mohr
Member

m-mohr commented Oct 26, 2021

@jdries Sounds interesting (and complex ;-) ). I'm a bit confused because the first paragraph sounds a bit like you'd need a special filter_* function, but then the proposal is apply_*, which is somewhat different. I'm also not sure I fully understand what exactly the inputs and output(s) of the apply_spatiotemporal proposal are.

What's the priority on this (e.g. compared to SRR3 Platform use cases)?

@m-mohr m-mohr removed their assignment Oct 26, 2021
@JeroenVerstraelen

JeroenVerstraelen commented Nov 3, 2021

The suggested process:

apply_spatial_temporal(code=None, runtime=None, process=None, version='latest', geometries)

  • code (str) – UDF code or process identifier (optional)
  • runtime – UDF runtime to use (optional)
  • process – a callback function that creates a process graph, see Processes with child “callbacks”
  • version (str) – Version of the UDF runtime to use
  • geometries (Union[BaseGeometry, dict, str, Path, Parameter]) – shapely geometry, GeoJSON dictionary or path to GeoJSON file

The UDF function is applied per geometry (= parcel).
Example UDF code:

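from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
  # cube = tile(s) that intersect the geometry
  # cube is masked: a pixel is NaN if it lies outside the geometry.
  inarr = cube.get_array()
  dates = ['2021-10-22', '2021-10-23', '2021-10-24', '2021-10-25']
  results = {}
  for date in dates:
    # Select the raster for one date (assumes time is the first dimension).
    field = inarr.loc[date, :]
    # some_computation is a placeholder for the per-date computation.
    results[date] = some_computation(field)
  # correlation_matrix is a placeholder as well.
  correlations = correlation_matrix(results)

  # Somehow return the correlation matrix as a data cube.
  return cube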

@m-mohr Is the name of the process in line with the rest of the API, or should it be one of:

  • apply_spatiotemporal
  • apply_spatio_temporal
  • apply_spatialtemporal

@m-mohr
Member

m-mohr commented Nov 4, 2021

I'm happy to provide a process description for this. Please keep in mind that run_udf and the new process would probably not be joined like in your example. While you can do that in Python, the openEO processes would be more generic.

I'll try to draft something today. I do not understand everything regarding this process yet though, so the proposal may be somewhat off, but it can help us get started. For example, what is the required input cube? What happens to bands? How are the results of the callbacks merged? Why is this called spatiotemporal although the geometries are just spatial? Should this work just for polygons or for all types of geometries (lines, points)?

Regarding the name, I'm not 100% sure. First of all, apply is usually pixel-level, but here we work on data cubes in the callback, so something like "chunk" could fit better. Currently, we use "spatial" and "temporal" in process names, but never the combination. So maybe spatiotemporal? The resulting name could then be chunk_spatiotemporal. On the other hand, the chunks are spatial only...

Interesting how this use case now somewhat contradicts PR #286.

@m-mohr
Member

m-mohr commented Nov 4, 2021

A first idea that would be used like this (pseudo-code):

data = ...
chunks = {type: 'MultiPolygon', coordinates: [...]}
process = function(chunk) { return run_udf(chunk, code, runtime, version) }
result = chunk_polygon(data, chunks, process)
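
Expressed as an openEO process graph, the pseudo-code above might look roughly like the following, built here as a Python dictionary. This is only a sketch: the node identifiers (`load`, `chunk`, `udf`), the collection id, the geometry and the UDF source are placeholders, and the `chunk_polygon` parameter names (`data`, `chunks`, `process`) follow the draft definition in this comment, which may still change.

```python
import json

# Hypothetical UDF source; the real code would implement the per-parcel logic.
udf_code = "def apply_datacube(cube, context):\n    return cube\n"

# Placeholder geometry: one small square parcel.
chunks = {
    "type": "Polygon",
    "coordinates": [[[5.0, 51.0], [5.1, 51.0], [5.1, 51.1], [5.0, 51.1], [5.0, 51.0]]],
}

process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",  # placeholder collection id
            "spatial_extent": None,
            "temporal_extent": ["2021-10-22", "2021-10-26"],
        },
    },
    "chunk": {
        "process_id": "chunk_polygon",
        "arguments": {
            "data": {"from_node": "load"},
            "chunks": chunks,
            # The child process receives each chunk as the parameter "data".
            "process": {
                "process_graph": {
                    "udf": {
                        "process_id": "run_udf",
                        "arguments": {
                            "data": {"from_parameter": "data"},
                            "udf": udf_code,
                            "runtime": "Python",
                        },
                        "result": True,
                    }
                }
            },
        },
        "result": True,
    },
}

print(json.dumps(process_graph, indent=2))
```

The optional `version` argument of run_udf is omitted here for brevity.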

Process:

{
    "id": "chunk_polygon",
    "summary": "Apply a process to spatial chunks of a data cube",
    "description": "The given data cube is chunked by the given polygons and the given process is applied to each individual chunk.",
    "categories": [
        "cubes"
    ],
    "parameters": [
        {
            "name": "data",
            "description": "A data cube.",
            "schema": {
                "type": "object",
                "subtype": "raster-cube"
            }
        },
        {
            "name": "chunks",
            "description": "A GeoJSON object containing at least one polygon. The provided feature types can be one of the following:\n\n* A `Polygon` or `MultiPolygon` geometry,\n* a `Feature` with a `Polygon` or `MultiPolygon` geometry,\n* a `FeatureCollection` containing at least one `Feature` with `Polygon` or `MultiPolygon` geometries, or\n* a `GeometryCollection` containing `Polygon` or `MultiPolygon` geometries. To maximize interoperability, `GeometryCollection` should be avoided in favour of one of the alternatives above.",
            "schema": {
                "type": "object",
                "subtype": "geojson"
            }
        },
        {
            "name": "process",
            "description": "A process that accepts and returns a single data cube and is applied on each individual chunk. The process may consist of multiple sub-processes.",
            "schema": {
                "type": "object",
                "subtype": "process-graph",
                "parameters": [
                    {
                        "name": "data",
                        "description": "A chunk of the original data cube.",
                        "schema": {
                            "type": "object",
                            "subtype": "raster-cube"
                        }
                    },
                    {
                        "name": "context",
                        "description": "Additional data passed by the user.",
                        "schema": {
                            "description": "Any data type."
                        },
                        "optional": true,
                        "default": null
                    }
                ],
                "returns": {
                    "description": "The updated data cube.",
                    "schema": {
                        "description": "A data cube.",
                        "type": "object",
                        "subtype": "raster-cube"
                    }
                }
            }
        },
        {
            "name": "context",
            "description": "Additional data to be passed to the process.",
            "schema": {
                "description": "Any data type."
            },
            "optional": true,
            "default": null
        }
    ],
    "returns": {
        "description": "A data cube with the newly computed values and the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged.",
        "schema": {
            "type": "object",
            "subtype": "raster-cube"
        }
    }
}
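
As a rough illustration of the intended semantics, the following pure-Python sketch treats a single 2D raster as the "cube". It assumes that pixels outside a polygon are masked with NaN before the callback runs (as in the UDF example earlier in this thread) and that only the pixels inside the polygon are written back to the result; both points are assumptions, since the merge behaviour is still an open question. The helper `point_in_polygon` and the pixel-centre convention are hypothetical.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def chunk_polygon(data, polygons, process, context=None):
    """Apply process to the part of data covered by each polygon.

    data is a list of rows of pixel values; the centre of pixel (x, y)
    is assumed to lie at (x + 0.5, y + 0.5).
    """
    result = [row[:] for row in data]
    for ring in polygons:
        mask = [
            [point_in_polygon(x + 0.5, y + 0.5, ring) for x in range(len(row))]
            for y, row in enumerate(data)
        ]
        # Pixels outside the polygon are replaced by NaN in the chunk.
        chunk = [
            [v if mask[y][x] else math.nan for x, v in enumerate(row)]
            for y, row in enumerate(data)
        ]
        updated = process(chunk, context)
        # Write back only the pixels that belong to this polygon.
        for y, row in enumerate(updated):
            for x, v in enumerate(row):
                if mask[y][x]:
                    result[y][x] = v
    return result
```

A real back-end would work on the full multi-dimensional cube and would likely crop each chunk to the polygon's bounding box instead of passing the whole raster.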

This probably does not cover your use case yet, and there are still several open questions (see above) that we need to clarify first. Maybe it's a good idea to set up a call to discuss it.

@jdries
Contributor Author

jdries commented Nov 8, 2021

I think the chunk_polygon proposal could indeed work.
@JeroenVerstraelen maybe you can set up a call when you have made some progress with the implementation?
Regarding priorities: it's not related to openEO platform, but something we have to get out of the way before we're allowed to work on the machine learning processes (internally).

m-mohr added a commit that referenced this issue Nov 10, 2021
@m-mohr
Member

m-mohr commented Nov 10, 2021

Added PR #298 so that we can discuss it better.

@m-mohr m-mohr linked a pull request Nov 16, 2021 that will close this issue
@m-mohr m-mohr added this to the 1.3.0 milestone Nov 29, 2021
@jdries
Contributor Author

jdries commented Jan 12, 2022

The experimental chunk_polygon process is now available on the VITO backend.

@m-mohr m-mohr modified the milestones: 1.3.0, 2.0.0, 2.1.0 Feb 1, 2023
@m-mohr m-mohr changed the title callback on pixels inside a polygon apply_polygon: callback on pixels inside a polygon Feb 1, 2023
m-mohr added a commit that referenced this issue Mar 15, 2023
* Add chunk_polygon #287

* Add mask_value parameter to chunk_polygon

* Rename: chunk_polygon -> apply_polygon

* Updates to terminology and definition

* Add exception

* geometries -> geometry
@m-mohr m-mohr closed this as completed Mar 15, 2023

3 participants