Compute new labels (e.g. bands) easily #233
Comments
A (minor) possible issue with this use case is that you want to abuse the 'bands' dimension to introduce a different kind of variable into the data cube, producing a dimension that no longer represents spectral bands, but rather some arbitrary variable values. The result is a cube that contains inhomogeneous data types: the original band values represent reflectance (or radiance or another measured physical quantity), which is non-negative and can be larger than 1, while the newly computed "pseudo" bands contain values of remote sensing indices (typically a ratio of two physical quantities, e.g. reflectances) that span the range [-1, 1]. When discussing the openEO data model, we said that the 'standard' way to store different types of variables was to use separate data cubes. Philosophical considerations aside, if I wanted to do something like that, 'Alternative 1' would feel the most natural, but I would expect
|
Yes, indeed. People just see bands as layers in a file that they can use for whatever they like; it's not restricted to spectral bands. But that seems to be an issue throughout the whole EO community. At least we also had the discussion in STAC, with me being the only person voting for separating spectral bands and "layers" in a file. Also, in almost all use cases that we've done in openEO or will do in Platform, people just use the bands as described above. So the question is whether it's worth the effort to teach them to do it the other ("strict") way. In other areas we are also less strict, e.g. with us putting quality layers into
With those labeled arrays we have the issue that there's no native way to transmit them through JSON, so we've tried to design them so that they only come up in callbacks, which are handled internally in back-ends. In an object you can't transmit the order, in an array you can't transmit the labels. So almost all of our processes are designed to accept labeled arrays from callbacks, but just return normal arrays. That's basically why this is so difficult to do and why issues like this come up. I'm really not happy about all this, but introducing labeled arrays throughout the whole set of processes leads to its own challenges again and may make several processes and implementations more difficult. |
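To make the JSON limitation concrete, here is a minimal Python sketch. The `labels`/`values` envelope is purely hypothetical (it is not an openEO encoding and nobody in this thread proposed it), and the band names are made up. It only illustrates that a plain JSON array keeps the order but loses the labels, and that carrying both needs some extra structure.

```python
import json


def encode_labeled_array(labels, values):
    """Encode a labeled array as two parallel JSON arrays (hypothetical format)."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    return json.dumps({"labels": list(labels), "values": list(values)})


def decode_labeled_array(text):
    """Decode the hypothetical format back into ordered (label, value) pairs."""
    doc = json.loads(text)
    return list(zip(doc["labels"], doc["values"]))


# Made-up band names and values, for illustration only.
labels = ["B02", "B04", "B08"]
values = [0.05, 0.10, 0.50]

# A plain JSON array preserves the order but drops the labels.
plain = json.loads(json.dumps(values))

# The parallel-array envelope keeps both the order and the labels.
pairs = decode_labeled_array(encode_labeled_array(labels, values))
```

The cost of such an envelope is exactly what the comment above describes: every process that returns an array would have to know about it.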
I see. Maybe we could nudge our users a bit by calling such dimensions something other than 'bands' in our own examples, e.g. 'variable'? Not to push anything, but another option that could be used to fix the problem with
I also have an option 'C' for solving the specific use-case without apply_dimension: we add a process that appends a label and all its data along a specified dimension (similar to Javascript
To me option 'C' is closer to my mental picture of this use-case than either 'A' or 'B'. Also, this naturally extends to the possibility of |
So I finally arrived at the point where I actually need to do this. So I'm somewhat inclined to use 'alternative 1' now, for adding computed bands. I also believe that what I need in the use case is not only adding labels to the band dimension, but I also reduce the time dimension, producing multiple outputs per band as opposed to one output per band.
The quantiles function is interesting in the sense that it returns an array by design, so it seems like we could never really compute multiple quantiles inside a reduce_dimension? Sidenote: this is relatively easy to implement in the backend, mostly because the bands dimension in my datacube is a list of variables that I can grow and shrink easily. But it seems like I also don't have an immediate alternative here... |
You're right, it does - it's basically the same as alternative 2. with single-step process for
According to process documentation, you should use |
We have another dimension type for that, called
Yes, this could also be an option, although I've never seen anyone actually use the target_dimension but instead usually just writing back to the bands etc.
Yes, that's pretty much what I also thought about with add_labels in the first post, but append_label additionally allows setting a data cube for the data. So I like that approach, too, as it's very generic and could be useful in general. So I think I'll work on something in this direction, but of course @jdries can start with the workarounds we have right now.
Yes, quantiles (also: extrema) is not a reducer; it needs apply_dimension to be used. That's why they are not in the reducer category. We decided at some point during discussions that a reducer strictly returns only a single value (which in theory could be an array, but that's probably not a good idea).
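The distinction can be sketched in plain Python, outside of openEO. `reduce_series` is a hypothetical helper that mimics the "single value only" rule for reducers; `statistics.mean` and `statistics.quantiles` come from the Python standard library and stand in for the openEO processes of similar names, without claiming identical semantics.

```python
import statistics


def reduce_series(values, reducer):
    """Apply a reducer and insist on a single (non-array) result.

    Hypothetical helper that mimics the reducer contract described above.
    """
    result = reducer(values)
    if isinstance(result, (list, tuple)):
        raise TypeError("a reducer must return a single value, not an array")
    return result


# One pixel's values along the temporal dimension (made-up numbers).
series = [0.1, 0.4, 0.2, 0.8, 0.5]

# mean collapses the series to one value, so it qualifies as a reducer.
mean_value = reduce_series(series, statistics.mean)

# quantiles returns several cut points (three for n=4), so it needs an
# apply_dimension-style call that may change the number of labels.
quartiles = statistics.quantiles(series, n=4)
```

Passing `lambda v: statistics.quantiles(v, n=4)` to `reduce_series` raises the `TypeError`, which is the situation the comment describes.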
I've asked for feedback for weeks, but except for @mkadunc no response, so no "immediate" solution for sure. I can provide proposals in a day or two (outside of vacations), but that needs some consensus and input from others. |
Another way to put it is as an apply_neighborhood that works on the full temporal and band dimension, and only maintains one label for the temporal dimension and multiple for bands. There seems to be some inconsistency in the definition of apply_neighborhood related to that.
In the return type description:
|
It feels to me that this is too much of a stretch for a process that is not meant to be used in this way, as you already mentioned. It's not meant to reduce dimensions. |
Dev telco conclusion: I will try to use apply_dimension with target_dimension set to the existing band dimension. |
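As a rough sketch of what that conclusion could look like, here is a process graph written as a Python dict. The collection id, node ids, dimension names (`t`, `bands`) and probabilities are assumptions for illustration, and the argument names follow my reading of the `apply_dimension` and `quantiles` process definitions, so they should be checked against the specification before use.

```python
import json

# Hypothetical process graph: compute three quantiles over time and write
# them into the existing "bands" dimension via target_dimension.
process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "EXAMPLE_COLLECTION",  # placeholder, not a real collection
            "spatial_extent": None,
            "temporal_extent": ["2020-01-01", "2021-01-01"],
        },
    },
    "quantiles_to_bands": {
        "process_id": "apply_dimension",
        "arguments": {
            "data": {"from_node": "load"},
            "dimension": "t",  # assumed name of the temporal dimension
            "target_dimension": "bands",  # assumed name of the band dimension
            "process": {
                "process_graph": {
                    "q": {
                        "process_id": "quantiles",
                        "arguments": {
                            "data": {"from_parameter": "data"},
                            "probabilities": [0.25, 0.5, 0.75],
                        },
                        "result": True,
                    }
                }
            },
        },
        "result": True,
    },
}

# Serialize to the JSON that would be sent to a back-end.
serialized = json.dumps(process_graph, indent=2)
```

How the resulting labels in the band dimension are named is not settled in this thread, so the sketch makes no assumption about it.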
I wasn't really aware that we use the apply_dimension for flattening purposes, we'll need to check whether that makes sense long-term and is covered by the documentation. Related issues: |
Just posting the old add(_computed)_labels proposals here for completeness. These are likely to be superseded by another proposal.
{
"id": "add_labels",
"summary": "Adds new labels to a dimension.\n\nThis is especially useful to compute new bands with ``apply_dimension()``.",
"description": "Adds one or more new labels to the given `dimension`.",
"categories": [
"cubes"
],
"parameters": [
{
"name": "data",
"description": "A data cube to add the labels to.",
"schema": {
"type": "object",
"subtype": "raster-cube"
}
},
{
"name": "labels",
"description": "The labels to add to the dimension. Fails with a `LabelExists` exception if one of the specified labels exists already.",
"schema": {
"type": "array",
"minItems": 1,
"items": {
"type": "string"
}
}
},
{
"name": "value",
"description": "The value to set for the given labels. Defaults to `null` (no-data).",
"schema": {
"description": "Any data type."
},
"default": null,
"optional": true
}
],
"returns": {
"description": "The data cube with the newly added labels. All other dimensions remain unchanged.",
"schema": {
"type": "object",
"subtype": "raster-cube"
}
},
"exceptions": {
"DimensionExists": {
"message": "A dimension with the specified name already exists."
},
"LabelExists": {
"message": "A label with the specified name already exists."
}
}
}
{
"id": "add_computed_labels",
"summary": "Adds new labels with new values",
"description": "Adds one or more new labels with newly computed values to the given `dimension`.",
"categories": [
"cubes"
],
"parameters": [
{
"name": "data",
"description": "A data cube to add the labels to.",
"schema": {
"type": "object",
"subtype": "raster-cube"
}
},
{
"name": "process",
"description": "Process to be applied on pixel values. The specified process needs to accept an array and must return an array with exactly the number of elements that are given to the parameters `labels`. A process may consist of multiple sub-processes.",
"schema": {
"type": "object",
"subtype": "process-graph",
"parameters": [
{
"name": "data",
"description": "A labeled array with elements of any type.",
"schema": {
"type": "array",
"subtype": "labeled-array",
"items": {
"description": "Any data type."
}
}
},
{
"name": "context",
"description": "Additional data passed by the user.",
"schema": {
"description": "Any data type."
},
"optional": true,
"default": null
}
],
"returns": {
"description": "The value to be set in the new data cube.",
"schema": {
"description": "Any data type."
}
}
}
},
{
"name": "labels",
"description": "The labels to add to the dimension. Fails with a `LabelExists` exception if one of the specified labels exists already.",
"schema": {
"type": "array",
"minItems": 1,
"items": {
"type": "string"
}
}
},
{
"name": "dimension",
"description": "The name of the dimension to apply the process on and to add the labels to. Fails with a `DimensionNotAvailable` exception if the specified dimension does not exist.",
"schema": {
"type": "string"
}
},
{
"name": "context",
"description": "Additional data to be passed to the process.",
"schema": {
"description": "Any data type."
},
"optional": true,
"default": null
}
],
"returns": {
"description": "The data cube with the newly added labels in the given dimension. All other dimensions remain unchanged.",
"schema": {
"type": "object",
"subtype": "raster-cube"
}
},
"exceptions": {
"DimensionExists": {
"message": "A dimension with the specified name already exists."
},
"LabelExists": {
"message": "A label with the specified name already exists."
}
}
} |
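To illustrate the intended behaviour of the two proposals, here is a minimal Python sketch. It is not an implementation of any back-end: the cube is reduced to a single labeled dimension, modelled as a dict that maps each label to a list of pixel values, and the function bodies are my interpretation of the JSON descriptions above.

```python
class LabelExists(Exception):
    """Raised when a label to be added already exists."""


def _check_new_labels(cube, labels):
    existing = [label for label in labels if label in cube]
    if existing:
        raise LabelExists(f"labels already exist: {existing}")


def add_labels(cube, labels, value=None):
    """Add new labels filled with a constant value (None stands for no-data)."""
    _check_new_labels(cube, labels)
    size = len(next(iter(cube.values()))) if cube else 0
    result = {label: list(values) for label, values in cube.items()}
    for label in labels:
        result[label] = [value] * size
    return result


def add_computed_labels(cube, process, labels):
    """Add new labels whose values are computed per pixel by `process`.

    `process` receives a labeled array (dict label -> value) and must
    return exactly len(labels) values.
    """
    _check_new_labels(cube, labels)
    size = len(next(iter(cube.values()))) if cube else 0
    result = {label: list(values) for label, values in cube.items()}
    for label in labels:
        result[label] = []
    for i in range(size):
        pixel = {label: values[i] for label, values in cube.items()}
        computed = process(pixel)
        if len(computed) != len(labels):
            raise ValueError("process must return one value per new label")
        for label, value in zip(labels, computed):
            result[label].append(value)
    return result


# Two labels with two pixels each (made-up values).
cube = {"1": [1.0, 2.0], "2": [3.0, 4.0]}
empty = add_labels(cube, ["sum"])
computed = add_computed_labels(
    cube, lambda px: [px["1"] + px["2"], px["2"] - px["1"]], ["sum", "diff"]
)
```

In this sketch `add_labels` only reserves the labels, whereas `add_computed_labels` reserves and fills them in one step, which is the difference between the two proposals.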
* Better support for labeled arrays in array processes that previously discarded them. Should help with #233.
* Update proposals/array_append.json
* Throw an error when labels exist in both arrays.
A common use case is to compute one or more new labels (e.g. bands) while maintaining the source data cube. For example, compute the ndvi and evi for a data cube with 5 bands (1,2,3,4,5), which results in a data cube with 7 bands (1,2,3,4,5,ndvi,evi).
There are currently at least two ways to achieve this in openEO (pseudo-code):
Both approaches are overly complex and annoying to implement. Therefore, two new ideas for discussion:
What are your thoughts on this? Which proposal is better? Should we add one of them? Both?
I feel like proposal B is the easiest to work with, but add_labels on its own could be useful, too.
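For reference, the use case itself (5 bands in, 7 bands out) can be sketched for a single pixel in plain Python. The mapping of band labels to blue, red and near-infrared is an assumption made only for this example, as are the pixel values; the NDVI and EVI formulas are the commonly used ones.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def evi(blue, red, nir):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def add_indices(pixel, blue="1", red="3", nir="4"):
    """Return a copy of the labeled pixel with 'ndvi' and 'evi' appended.

    The default band assignment is hypothetical.
    """
    result = dict(pixel)
    result["ndvi"] = ndvi(pixel[red], pixel[nir])
    result["evi"] = evi(pixel[blue], pixel[red], pixel[nir])
    return result


# One pixel of a 5-band cube (made-up reflectance values).
pixel = {"1": 0.05, "2": 0.08, "3": 0.10, "4": 0.50, "5": 0.30}
extended = add_indices(pixel)

# Labels of the result: 1, 2, 3, 4, 5, ndvi, evi
labels = list(extended)
```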