-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Array of geometry types #140
Conversation
56ba2af
to
07398e9
Compare
why not GeometryCollection ? "GeometryCollection Z" is a valid WKB type |
07398e9
to
9f94199
Compare
My bad. I removed the language about this. |
Should the |
If I'm reading it correctly I think it is being installed from this repo geoparquet/.github/workflows/scripts.yml Line 23 in 134ebef
The |
Oh @tschaub you have to change this line geoparquet/format-specs/schema.json Line 92 in 134ebef
|
38a30e4
to
98fd71e
Compare
98fd71e
to
3eefca3
Compare
Thanks for the clarification. I missed the |
@@ -53,7 +53,7 @@ Each geometry column in the dataset must be included in the columns field above | |||
| Field Name | Type | Description | | |||
| --- | --- | --- | | |||
| encoding | string | **REQUIRED** Name of the geometry encoding format. Currently only 'WKB' is supported. | | |||
| geometry_type | string or \[string] | **REQUIRED** The geometry type(s) of all geometries, or 'Unknown' if they are not known. | | |||
| geometry_types | \[string] | **REQUIRED** The geometry types of all geometries, or an empty array if they are not known. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field can be empty, why require it anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the description should probably mention that it's a set of values (i.e. unique items in the array - see also the other comment regarding the schema)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is optional, I assume we wouldn't assign any different meaning to {}
and {"geometry_types": []}
. So then do we want "minItems": 1
?
I like the idea of avoiding two ways to say the same thing. No strong preference between optional with min length of one vs required. Though it seems slightly more explicit to say that []
means "unknown" (whereas the absence of the property might either mean "unknown" or "I forgot" - not sure if there is any significant difference between the two though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the point I think you are making though. If it isn't required that the geometry types be known, why require the geometry types in the metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if it is optional, minItems should be set to 1, indeed. I have a slight tendency for not requiring it as for me an empty array reads more as "no type". On the other hand, having it required makes people think about it (if implementations don't set a default such as write_geoparquet(geometry_types = [], ...)
- but I guess a validator should anyway throw a warning if the field is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some discussion on this topic happened in the original issue #41. The idea was indeed that we wanted to avoid two ways to say the same thing (not specified if optional, vs geometry_type="Unknown"
.
And I think the rationale for making it required was indeed to have people think about it / recommend to specify a useful value (instead of just leaving out): #41 (comment)
I also find the empty array a bit less clear in indicating "unknown". Another option could also be to keep the field required and allow either the single string "Unknown" or either a list of minimum length 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len(geometry_types) == 0
vs
geometry_types[0] == "Unknown"
(I misspelled Unknown twice while trying to type that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not a big fan of "Unknown" either. Also, thinking a bit more about STAC it seems for everything we made required to make people think about it, it did not really work that way. People still just push things out without validation, so I could also see that people just omit geometry_types although required. So I tend towards not required and minLength: 1 for the array. As an explicit way to formulate "unknown" we could also consider null.
3eefca3
to
7c2b3b8
Compare
7c2b3b8
to
79cce58
Compare
Do we have enough of a consensus to merge this as is? Current summary:
|
Random thought: is there ever a case when something like |
Oh I hope not :) My thinking on this is that It's hard for me to imagine where |
I may be the one that initially pushed for unknown. Let me give the context. The idea is that you may have schemas for layers where you want to be able to put any type of geometries. In some files, geometries might be homogenous (let say only points), in other files mixed. Or you may have homogenous individual files, but one is points only and another ones is polygons only. But in the end, you want that the layer schema says "there might be any type of geometry" so that when you import it to something else (e.g importing a collection of GeoParquet files to a single GeoPackage or PostGIS layer) or compute the union of several files, you get this "any type of geometry" property. For a writer perspective,
So in that context |
@jorisvandenbossche - are you also ok with the latest? |
This renames the
geometry_type
property togeometry_types
and updates the JSON type to be an array of strings (instead of a string or array of strings). The spec language has been updated to reflect that an empty array indicates that the geometry types are not known. The schema now enforces that the items in the list are one of the expected types and allows theZ
suffixfor everything except.GeometryCollection
I updated the
example.parquet
file to usegeometry_types
instead ofgeometry_type
. (I followed the readme, but am struggling to run the validator, so admit I haven't yet done that.)I bumped the schema version to
0.5.0-dev
. Ideally this would happen as part of the process for releasing a tag (create release branch, update version, create tag, generate release, bump version back toX.Y.Z-dev
, merge branch).Fixes #132.
Fixes #133.