Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiPolygon support for mask_polygon #237

Closed
soxofaan opened this issue Apr 21, 2021 · 13 comments · Fixed by #242
Closed

MultiPolygon support for mask_polygon #237

soxofaan opened this issue Apr 21, 2021 · 13 comments · Fixed by #242

Comments

@soxofaan
Copy link
Member

"description": "A GeoJSON object containing a polygon. The provided feature types can be one of the following:\n\n* A `Polygon` geometry,\n* a `GeometryCollection` containing Polygons,\n* a `Feature` with a `Polygon` geometry or\n* a `FeatureCollection` containing `Feature`s with a `Polygon` geometry.",

A GeoJSON object containing a polygon. The provided feature types can be one of the following:
A Polygon geometry,
a GeometryCollection containing Polygons,
a Feature with a Polygon geometry or
a FeatureCollection containing Features with a Polygon geometry.

Why is MultiPolygon not allowed, but GeometryCollection is, while they are equivalent in the context of masking?

Related: aggregate_spatial describes behavior for points and lines as well, so it might make sense to also copy that to mask_polygon. I vaguely remember we discussed this before, but I can't reference right now.

As far as I could recover from git, the whitelist was there from the start (added in #20), but I can't find a reason for the excluded types.

@m-mohr
Copy link
Member

m-mohr commented Apr 26, 2021

MultiPolygon should indeed be allowed, I think.
I'm actually also thinking, why not just make the process mask_spatial and allow all geometries as defined also in aggregate_spatial? The only reason is probably "breaking change"?

@m-mohr
Copy link
Member

m-mohr commented Apr 26, 2021

Probably something to do together with the vector data cube update for #68 as this would change the process anyway.

@soxofaan
Copy link
Member Author

why not just make the process mask_spatial and allow all geometries as defined also in aggregate_spatial?

I'm not sure, for masking you also have mask for masking with a raster mask, which is also spatial, so mask_spatial would be confusing. It could make sense to go for mask_geometry though.

@jdries
Copy link
Contributor

jdries commented Apr 27, 2021

Don't really see why this moved from a very simple, backwards compatible, update to the spec, to a breaking change?
Breaking changes really require a lot more motivation than what we have here. (The last rename of masking processes caused quite a bit of breakage, time loss and subsequent complaints from my users, hence the concern.)

@soxofaan
Copy link
Member Author

yes indeed, I just wanted to raise the question about why certain geometry types are excluded for mask_polygon (in VITO backend we do support MultiPolygon for example).

The discussion about rename is something else and a lot lower priority for me

@m-mohr
Copy link
Member

m-mohr commented Apr 27, 2021

Strictly speaking, adding MultiPolygon is also a breaking change. It "breaks" interoperability with back-ends that do not update to follow to implement the new version with MultiPolygon. Nevertheless, it feels like we have somehow converged towards calling only user-facing changes breaking or so?

Anyway, I have tagged this as both "minor" and "breaking", which means we could consider the addition of Multipolygon in v1.1 and later - once we decide to do more breaking changes anyway - do the rename. The reason for the rename would be that mask_polygon is unintuitive if it allows all kinds of geometries and it would deprecate mask_polygon and add the new process.

@m-mohr
Copy link
Member

m-mohr commented Apr 27, 2021

At which point did we rename mask? I can't remember what kind of change that was...

@soxofaan
Copy link
Member Author

#103 #110: here we did the split between mask and mask_polygon (which was a single process previously)

@m-mohr
Copy link
Member

m-mohr commented Apr 27, 2021

Ah, I see... so there it was a change in an existing process, while here it would be a new process and the old one could still be available as a deprecated process for a certain amount of time. So I don't see a lot of things to break.

Another question is whether the integration of #68 would actually allow us to remove the special case of GeoJSON from processes and simply allow vector cubes (i.e. deprecate mask_polygon). They could be created with "get_vector" (whatever name it will have; see the python driver PR) and used in mask instead of directly passing GeoJSON to mask_polygon. But that is something that will arise from discussions around #68.

Anyway, we need to clarify whether such a change (i.e. adding MultiPolygon support) is considered to be a breaking change.

@soxofaan
Copy link
Member Author

Adding MultiPolygon support to the mask_polygon spec doesn't seem like a breaking change.
Very strictly speaking you could argue that it is indeed ("breaks interoperability"), but then every new feature or process is a breaking change, no?

@m-mohr
Copy link
Member

m-mohr commented Apr 27, 2021

I've opened PR #242, but before merging, I'd like to clarify what the direction is wrt such changes.

Adding new processes doesn't feel like breaking changes as there's no requirement to implement all processes. Also, this case is a bit different in that the change is just in a description and is not exposed in a machine-readable way through schemas (although we could add them...)

@jdries
Copy link
Contributor

jdries commented Apr 27, 2021

Interesting indeed to have the discussion about breaking changes. Let me try to illustrate what the impact is of both scenario's:
Scenario 1: Process definition is extended to support more inputs
Scenario 2: A new process, more broadly defined, process is created, old process is deprecated.

Scenario 1 impact:

  • Hey user, your script now supports multipolygons on backends that implement the change, on the other ones, your script will work, but not for multipolygon. You don't have to do anything. User goes: thank you.
  • If a backend does not (yet) support the new feature, and user wants those multipolygons-> user logs a request for improvement, all users coming after that benefit from the change.

Scenario 2 impact:

  • Hey user, we have a new feature, that is supported on some backends, but you'll have to adjust your process graphs to work with backends that have the new process. On the other backends, your script can work, with the old version of the code, until the backend supports the new process. User goes: sounds complex, do I really need those multipolygons?
  • Same as scenario 1: user has to request support for the entirely new process, both in backends and in clients.
  • But then there's documentation. All of the documentation and scripts floating around the internet, referencing the deprecated process will no longer be valid. All documentation and all sample code will have to be updated -> this part usually doesn't work out.
  • Which brings me to the last point: the user support channel will spend the next two years getting questions like: hey, I'm using this mask_polygon code snippet, and it doesn't work with multipolygon. Support guy has to stay patient, and tell the user that he in fact found an older recommendation, and should update his code, which may actually evoke a not so kind reaction from the user. Then the support guy has to track down the deprecated code sample, and manage to get it removed or updated...

Let me know if I missed something, but based on this, I really prefer scenario 1 as it keeps the general level of annoyance in a community a bit lower.

@m-mohr
Copy link
Member

m-mohr commented Apr 27, 2021

The scenarios do not fully reflect the discussion above: Just for the multi polygons there was no name change proposed, see the corresponding PR. Scenario 2 was meant to also add support for all other geometries (line, point, ...), which would add to scenario 1 support channel cases where a user doesn't find the right process as it's called mask_polygon, while it may support points and lines.

Scenario 1 also doesn't fully go into the "change the back-end" use case. In scenario 2 it is easier to detect why a process is not running (process not available), while in scenario 1 you need to either go into details with the process specification to figure out changes from the description or the schema or run it to find out why it's not working on another back-end. That would also lead to support channel cases.

But this will be a case-by-case decision of the PSC anyway

@m-mohr m-mohr linked a pull request May 18, 2021 that will close this issue
m-mohr added a commit that referenced this issue May 26, 2021
mask_polygon: Support multi polygons #237
@m-mohr m-mohr closed this as completed May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants