Vector data cubes (overview) #58

Open
7 of 16 tasks
m-mohr opened this issue Feb 23, 2022 · 9 comments · Fixed by #59
@m-mohr
Member

m-mohr commented Feb 23, 2022

What we need to do to add vector data cubes in openEO:

Questions:

  1. Do we want to restrict geometries to one geometry type per vector dimension?
    • Tendency: No, allow mixtures
  2. Do we restrict to only Point, LineString, Polygon, and the Multi-variants (and thus exclude e.g. PolyhedralSurface)? We already discourage GEOMETRYCOLLECTION in several processes.
    • Tendency: Yes, restrict to the types mentioned above
  3. How do we handle null/empty geometries?
    • Tendency: Don't allow them / skip them during import
  4. Representation of "dimension labels" (in STAC: "values")?
    • In metadata: ID, WKT, or GeoJSON (see STAC data cube extension PR)
    • In processes: It is just a representation so we can do multiple things, e.g. allow users to choose between WKT and ID. Or we need to decide on one of them. We can't really use "1D vector cubes" as labels (unless we change it).
  5. How to handle units in processes? See Unit in vector processes openeo-processes#330
  6. Define (and describe) generally how to convert vector data into a vector data cube, see: process to convert inline GeoJSON to a vector cube openeo-processes#346 (comment). There's a proposal from Brockmann; GeoJSON could be aligned with STAC (datetime in properties).
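
To make the tendencies for questions 1 to 3 concrete, here is a minimal sketch of an import filter over a GeoJSON FeatureCollection. The function name `importable_geometries` is hypothetical and not part of openEO. Raising an error for unsupported types is an assumption; a backend could just as well skip them.

```python
# Types allowed under the tendency for question 2 (assumption: exact GeoJSON type names).
ALLOWED_TYPES = {
    "Point", "LineString", "Polygon",
    "MultiPoint", "MultiLineString", "MultiPolygon",
}


def importable_geometries(feature_collection):
    """Return the geometries that would end up as labels on a vector dimension.

    Mixed geometry types are kept (question 1), unsupported types are
    rejected (question 2), null and empty geometries are skipped (question 3).
    """
    kept = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry")
        if geometry is None:
            continue  # null geometry: skip during import
        if geometry.get("type") not in ALLOWED_TYPES:
            # Assumption: reject instead of silently dropping the feature.
            raise ValueError(f"unsupported geometry type: {geometry.get('type')}")
        if not geometry.get("coordinates"):
            continue  # empty geometry: skip during import
        kept.append(geometry)
    return kept
```

A mixed collection of points and polygons passes through unchanged, while a GeometryCollection raises a `ValueError`.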
@m-mohr m-mohr self-assigned this Feb 23, 2022
@m-mohr m-mohr linked a pull request Mar 1, 2022 that will close this issue
@m-mohr m-mohr mentioned this issue Mar 7, 2022
@mkadunc
Member

mkadunc commented Mar 10, 2022

  1. Do we want to restrict geometries to one geometry type per vector dimension?

I'm not a fan of restricting in the standard; maybe, if restricting would be required for easier implementation, we could add this info to the backend capabilities.

It would be useful to have metadata about types in a dimension for specific data cubes, though - i.e. if I load a vector cube, it would be good to know which geometry types to expect for the labels on the spatial dimension.

  1. Point, LineString, Polygon, and the Multi-variants...

I agree that we leave out PolyhedralSurface etc. (for now). GeometryCollection is borderline - some vector operations might return GC in which case we'll have to "normalize" the results to the higher-dimensional type (e.g. an intersection of two linestrings will most likely be a point, but could also be a linestring; if we support only one type, we'll have to represent all points as degenerate single-point linestrings).

Having looked at OGC EDR, it seems that support for XYZ and XYM / XYZM geometries would be useful.

  1. Representation of "dimension labels" (in STAC: "values")?

I'd say GeoJSON (the 'non-standard' one with CRS).

I don't think ID is necessary - if you strip geometry values from a vector cube, it becomes a non-vector data-cube IMO.

  • Do we need the actual geometries in callbacks?

I'd say yes - let's treat geometry labels same as any other labels (e.g. named bands).

  1. How do we handle processes that now require "raster-cubes"

Rename raster-cube to data-cube in the schema and replace everywhere. Then introduce raster-cube as a subclass, and use it instead of data-cube in processes that do special things with raster spatial dimensions (x,y).

  1. What name do we recommend for the vector dimension?

geometry seems better than vector. feature would also be an option, or reference-geometry

@m-mohr
Member Author

m-mohr commented Mar 10, 2022

Thanks, @mkadunc. Interesting that several of your points are exactly contrary to what @edzer proposed to me before. I guess you can have some good discussions here while I'm on vacation. ;-)

It would be useful to have metadata about types in a dimension for specific data cubes, though

That's a pretty good idea indeed. I should add that to stac-extensions/datacube#10

GeometryCollection is borderline

Right now we say in processes that

To maximize interoperability, a nested GeometryCollection should be avoided. Furthermore, a GeometryCollection composed of a single type of geometries should be avoided in favour of the corresponding multi-part type (e.g. MultiPolygon).

Not sure what backends actually do with this in implementation though.
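
For illustration, the second recommendation in that sentence (single-type GeometryCollection becomes the multi-part type) could be applied on GeoJSON-like dictionaries roughly as follows. This is only a sketch: `simplify_collection` is a hypothetical helper, and it deliberately leaves mixed or nested collections untouched instead of guessing a target type.

```python
# Mapping from a single-part GeoJSON type to its multi-part counterpart.
MULTI_OF = {
    "Point": "MultiPoint",
    "LineString": "MultiLineString",
    "Polygon": "MultiPolygon",
}


def simplify_collection(geometry):
    """Replace a single-type GeometryCollection by the matching Multi* geometry.

    Anything else (other geometry types, mixed or nested collections,
    empty collections) is returned unchanged.
    """
    if geometry.get("type") != "GeometryCollection":
        return geometry
    members = geometry.get("geometries", [])
    member_types = {member["type"] for member in members}
    if len(member_types) != 1:
        return geometry  # empty or mixed: nothing safe to do here
    (single_type,) = member_types
    if single_type not in MULTI_OF:
        return geometry  # e.g. nested collections or Multi* members
    return {
        "type": MULTI_OF[single_type],
        "coordinates": [member["coordinates"] for member in members],
    }
```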

I'd say GeoJSON (the 'non-standard' one with CRS).

Then it's not GeoJSON though. So you mean the real invalid one (I'd like to avoid that) or were you referring to this new JSON-FG from OGC? https://github.com/opengeospatial/ogc-feat-geo-json (I could see us using that, but it's WIP).

Rename raster-cube to data-cube in the schema and replace everywhere. Then introduce raster-cube as a subclass, and use it instead of data-cube in processes that do special things with raster spatial dimensions (x,y).

That's breaking and requires processes v2.0. I assume implementors will not be happy about it. (Also, in the schemas we don't really have subclasses apart from subclassing native types).

@edzer
Member

edzer commented Mar 10, 2022

geometry seems better than vector. feature would also be an option, or reference-geometry

I also like geometry, or alternatively feature_geometry. In SFA a feature is a thing that has a geometry and other attributes.

I think I'm also in favour of a GeoJSON that does not restrict to EPSG:4326. Although that is an (IETF) standard, it's clearly out of date and not good enough for today's requirements. But the individual feature geometries must then each come with a CRS, right? Or will the CRS be a property of the metadata for the dimension as a whole?
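
One way to picture the "CRS as a property of the dimension" option, together with the earlier suggestion to expose the geometry types per dimension, is the sketch below. The field names (`reference_system`, `geometry_types`, `values`) are modelled loosely on the STAC data cube extension discussion referenced in this thread and should be treated as assumptions, as should the helper name `describe_geometry_dimension`.

```python
def describe_geometry_dimension(wkt_labels, reference_system=4326):
    """Build hypothetical metadata for a vector dimension with WKT labels.

    The CRS is stored once for the whole dimension instead of once per
    geometry, and the set of geometry types is derived from the labels.
    """
    # The WKT type keyword is everything before the first "(".
    geometry_types = sorted(
        {label.split("(", 1)[0].strip().upper() for label in wkt_labels}
    )
    return {
        "type": "geometry",                    # assumption: dimension type name
        "reference_system": reference_system,  # one CRS for all labels
        "geometry_types": geometry_types,
        "values": list(wkt_labels),
    }
```

With per-dimension CRS metadata like this, the individual labels can stay plain WKT without embedding a CRS in each geometry.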

@m-mohr
Member Author

m-mohr commented Mar 14, 2022

Discussed with @edzer:

  1. Allow different types per dimension.
  2. Yes, restrict to the types mentioned above
  3. Representation of "dimension labels": In metadata: see STAC - In processes: Vector cube, 1 vector dimension, 1 label
  4. ?
  5. See Unit in vector processes openeo-processes#330
  6. geometry

@m-mohr
Member Author

m-mohr commented Mar 17, 2022

Question 7: What do we do with additional "metadata", e.g. ids and properties assigned to a feature? Related: Open-EO/openeo-processes#347 (comment)

Not sure about the IDs, but I guess for vector data you specify which properties to load into the data cube (as additional dimension if 2+ properties) and the rest is kept somewhere in the background. So we may want to add id and properties as additional optional fields to the vector dimension. There's no way to access this information through processes right now, but we should probably state that id and properties are generally kept untouched by processes unless a process states otherwise.

This issue of additional metadata that is present at the start, may get passed through, and should be included in the result is also very much unspecified for raster, by the way.

@mkadunc
Member

mkadunc commented Mar 17, 2022

for vector data you specify which properties to load into the data cube (as additional dimension if 2+ properties) and the rest is kept somewhere in the background. So we may want to add id and properties as additional optional fields to the vector dimension.

I'm not sure I understand this 'additional dimension' part — say we have a vector cube which stores a real-valued variable mean_reflectance with 3 dimensions (geometry, time, band), and we want to load 2 extra properties for vector data (e.g. id, land_class):

  • if extra properties are loaded as additional dimension, then:
    • the variable changes and becomes just value (openEO concept of a data cube does not allow for more than one variable)
    • the type of the variable changes from real/float to any (or string... something that can capture the original variable and all types of the extra properties)
    • the extra dimension basically takes the role of variable, and has labels {'mean_reflectance', 'id', 'land_class'}
    • the cube is quite unbalanced - the sub-cubes for variable indices of id and land_class are basically 1-D (value is constant along time and band dimensions), and the sub-cube for variable = mean_reflectance is 3-D
  • if extra properties (id and land_class) are additional fields on the vector dimension:
    • the variable stays mean_reflectance and keeps its type, regardless of any extra properties loaded
    • extra properties are stored on the geometry dimension, e.g. inside its labels (if labels are GeoJSON, we could use 'feature' object type to store this; or we allow labels to be tuples, generic JSON dictionaries or arrays)
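
The two options above can be sketched with plain Python structures. Everything here is illustrative: the labels, the reflectance values and the function names are made up, and a dictionary keyed by index tuples merely stands in for a real data cube.

```python
from itertools import product

GEOMETRIES = ["POINT (1 2)", "POINT (3 4)"]
TIMES = ["2022-01-01", "2022-02-01"]
BANDS = ["B04", "B08"]
# Extra per-feature properties (made-up values), one entry per geometry.
EXTRA = {"id": ["a", "b"], "land_class": ["forest", "water"]}


def mean_reflectance_cube():
    """3-D float variable keyed by (geometry, time, band) indices."""
    shape = (len(GEOMETRIES), len(TIMES), len(BANDS))
    return {
        (g, t, b): 0.1 * (g + 1) + 0.01 * t + 0.001 * b
        for g, t, b in product(*(range(n) for n in shape))
    }


def with_property_dimension(cube):
    """Option A: a fourth dimension with labels mean_reflectance, id, land_class."""
    out = {}
    for (g, t, b), value in cube.items():
        out[(g, t, b, "mean_reflectance")] = value
        for name, per_geometry in EXTRA.items():
            # Constant along time and band: the 'unbalanced' sub-cubes.
            out[(g, t, b, name)] = per_geometry[g]
    return out


def with_dimension_fields(cube):
    """Option B: the variable is untouched; extras live on the geometry labels."""
    labels = [
        {"geometry": wkt, "properties": {k: v[i] for k, v in EXTRA.items()}}
        for i, wkt in enumerate(GEOMETRIES)
    ]
    return {"values": cube, "geometry_labels": labels}
```

In option A the cell type becomes a mix of floats and strings and the cube triples in size, while in option B the variable stays purely float and each property is stored once per geometry.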

@soxofaan
Member

I agree with @mkadunc and had the same conceptual struggle in Open-EO/openeo-processes#356

@m-mohr
Member Author

m-mohr commented Apr 5, 2022

I think we need to discuss this again in detail with all experts. As we are close to the end of SRR3, we will likely not be able to tackle it beforehand so I'd propose to have a dedicated meeting afterward (or discuss it in Bolzano).

@m-mohr
Member Author

m-mohr commented Jun 9, 2022

Some notes from the April PSC meeting:

  • Uni Salzburg - zgis is also working on data cubes
  • CovJson - interesting file format, now also maturing in the OGC
  • Set up a meeting with EDC in the next weeks, all to write documents about their understanding of data cubes by LPS, then have some dedicated time in June to work on stuff (didn't happen => summer)
  • Maybe don’t add vector-cube and raster-cube subtypes and just specify data-cube as a type and then specify in processes required dimension types. Inheritance is a problem in process definitions, but lack of inheritance might be an issue in implementations like the Python client.
  • We may want to consider going towards openEO processes 2.0

@m-mohr m-mohr reopened this Jan 30, 2023