Commit

Change to covering from geometry_bbox. Doc updates.
Change the geometry_bbox to the broader "covering" section. Update
tests and examples.

Made some documentation updates:
* Parquet schema -> group
* Do not require zmin/zmax if geometries have 3 dimensions
jwass committed Nov 28, 2023
1 parent e74af7d commit 8790bb4
Showing 6 changed files with 41 additions and 19 deletions.
Binary file modified examples/example.parquet
Binary file not shown.
8 changes: 5 additions & 3 deletions examples/example_metadata.json
@@ -8,6 +8,11 @@
180.0,
83.6451
],
+ "covering": {
+ "box": {
+ "column": "bbox"
+ }
+ },
"crs": {
"$schema": "https://proj.org/schemas/v0.6/projjson.schema.json",
"area": "World.",
@@ -108,9 +113,6 @@
},
"edges": "planar",
"encoding": "WKB",
- "geometry_bbox": {
- "column": "bbox"
- },
"geometry_types": [
"Polygon",
"MultiPolygon"
21 changes: 16 additions & 5 deletions format-specs/geoparquet.md
@@ -58,7 +58,7 @@ Each geometry column in the dataset MUST be included in the `columns` field abov
| edges | string | Name of the coordinate system for the edges. Must be one of `"planar"` or `"spherical"`. The default value is `"planar"`. |
| bbox | \[number] | Bounding Box of the geometries in the file, formatted according to [RFC 7946, section 5](https://tools.ietf.org/html/rfc7946#section-5). |
| epoch | number | Coordinate epoch in case of a dynamic CRS, expressed as a decimal year. |
- | geometry_bbox | object | Object specifying a column name of a [Bounding Box Column](#bounding-box-columns). |
+ | covering | object | Object containing information, such as bounding boxes, that helps accelerate spatial data retrieval. |


#### crs
@@ -136,17 +136,28 @@ For non-geographic coordinate reference systems, the items in the bbox are minim

The bbox values are in the same coordinate reference system as the geometry.

- #### geometry_bbox
+ #### covering
+
+ The covering field specifies optional simplified representations of each geometry. Each key of the `covering` object MUST be a supported encoding. Currently the only supported encoding is `box`, which specifies the name of a [bounding box column](#bounding-box-columns).
+
+ Example:
+ ```
+ "covering": {
+     "box": {"column": "bbox"}
+ }
+ ```
+
+ ##### box encoding

Including a per-row bounding box can be useful for accelerating spatial queries by allowing consumers to inspect row group bounding box summary statistics. Furthermore a bounding box may be used to avoid complex spatial operations by first checking for bounding box overlaps. This field captures the name of a column containing the bounding box of the geometry for every row.

- The format of `geometry_bbox` is `{"name": "column_name"}` where `column_name` MUST exist in the Parquet file and meet the criteria in the [Bounding Box Column](#bounding-box-columns) definition.
+ The format of the `box` encoding is `{"column": "column_name"}`, where `column_name` MUST exist in the Parquet file and meet the criteria in the [Bounding Box Column](#bounding-box-columns) definition.

- Note: the value specified in this field should not be confused with the [`bbox`](#bbox) field which contains the single bounding box of this geometry over the whole GeoParquet file.
+ Note: the value specified in this field should not be confused with the top-level [`bbox`](#bbox) field which contains the single bounding box of this geometry over the whole GeoParquet file.

### Bounding Box Columns

- A bounding box column MUST be a Parquet struct with required fields `xmin`, `xmax`, `ymin`, and `ymax`. For three dimensions the additional fields `zmin` and `zmax` MUST be present. The fields MUST be of Parquet type `FLOAT` or `DOUBLE`. The repetition of a bounding box column MUST match the geometry column's [repetition](#repetition). A row MUST contain a bounding box value if and only if the row contains a geometry value. In cases where the geometry is optional and a row not contain a geometry value, the row MUST NOT contain a bounding box value.
+ A bounding box column MUST be a Parquet group field with 4 child fields named `xmin`, `xmax`, `ymin`, and `ymax`. For three dimensions the additional fields `zmin` and `zmax` MAY be present but are not required. The fields MUST be of Parquet type `FLOAT` or `DOUBLE`. The repetition of a bounding box column MUST match the geometry column's [repetition](#repetition). A row MUST contain a bounding box value if and only if the row contains a geometry value. In cases where the geometry is optional and a row does not contain a geometry value, the row MUST NOT contain a bounding box value.

The bounding box column MUST be at the root of the schema. The bounding box column MUST NOT be nested in a group.
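
The box encoding text in this hunk says a bounding box can be used to avoid complex spatial operations by first checking for overlaps. A minimal sketch of that prefilter in plain Python is below. The helper names `boxes_overlap` and `candidate_rows` are hypothetical and not part of the spec; rows are modeled as dicts carrying the four required fields, and the sketch assumes `xmin <= xmax` and `ymin <= ymax` for every box.

```python
from typing import List, Mapping, Optional, Sequence

Box = Mapping[str, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two boxes intersect; boxes that only touch count as overlapping."""
    return (
        a["xmin"] <= b["xmax"]
        and b["xmin"] <= a["xmax"]
        and a["ymin"] <= b["ymax"]
        and b["ymin"] <= a["ymax"]
    )


def candidate_rows(rows: Sequence[Optional[Box]], query: Box) -> List[int]:
    """Return indices of rows whose bounding box overlaps the query box.

    A row without a geometry has no bounding box value (modeled here as None)
    and is skipped.
    """
    return [
        i
        for i, row in enumerate(rows)
        if row is not None and boxes_overlap(row, query)
    ]


# Illustrative data only; these values are made up for the sketch.
rows = [
    {"xmin": 0.0, "xmax": 1.0, "ymin": 0.0, "ymax": 1.0},
    None,  # row with no geometry, so no bounding box value either
    {"xmin": 5.0, "xmax": 6.0, "ymin": 5.0, "ymax": 6.0},
]
query = {"xmin": 0.5, "xmax": 2.0, "ymin": 0.5, "ymax": 2.0}
print(candidate_rows(rows, query))  # [0]
```

Only the rows returned by the prefilter would need the exact, more expensive geometry test.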

16 changes: 11 additions & 5 deletions format-specs/schema.json
@@ -72,13 +72,19 @@
"epoch": {
"type": "number"
},
- "geometry_bbox": {
+ "covering": {
"type": "object",
- "required": ["column"],
+ "minProperties": 1,
"properties": {
- "column": {
- "type": "string",
- "minLength": 1
+ "box": {
+ "type": "object",
+ "required": ["column"],
+ "properties": {
+ "column": {
+ "type": "string",
+ "minLength": 1
+ }
+ }
}
}
}
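
To make the constraints in this schema fragment concrete, here is a hand-rolled sketch that mirrors them without a JSON Schema library. The function name `covering_errors` is hypothetical, and the checks cover only what is visible in this hunk: `covering` is an object with at least one property, and `box`, when present, is an object with a required, non-empty string `column`.

```python
from typing import Any, List


def covering_errors(covering: Any) -> List[str]:
    """Return a list of problems found in a `covering` value (empty if none)."""
    if not isinstance(covering, dict):
        return ["covering must be an object"]

    errors: List[str] = []
    # Mirrors "minProperties": 1
    if len(covering) < 1:
        errors.append("covering must have at least one property")

    if "box" in covering:
        box = covering["box"]
        if not isinstance(box, dict):
            errors.append("box must be an object")
        elif "column" not in box:
            # Mirrors "required": ["column"]
            errors.append("box requires a 'column' property")
        elif not isinstance(box["column"], str) or len(box["column"]) < 1:
            # Mirrors "type": "string", "minLength": 1
            errors.append("box.column must be a non-empty string")
    return errors


print(covering_errors({"box": {"column": "bbox"}}))  # []
print(covering_errors({}))
print(covering_errors({"box": {"column": ""}}))
```

Like the fragment shown, the sketch does not reject unknown keys inside `covering`, so an object with only an unrecognized key would pass both.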
4 changes: 3 additions & 1 deletion scripts/generate_example.py
@@ -48,7 +48,9 @@ def get_version() -> str:
"crs": json.loads(df.crs.to_json()),
"edges": "planar",
"bbox": [round(x, 4) for x in df.total_bounds],
- "geometry_bbox": {"column": "bbox"},
+ "covering": {
+ "box": {"column": "bbox"},
+ },
},
},
}
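
Since this commit renames `geometry_bbox` to `covering.box`, metadata written against the earlier draft needs the same rewrite that the script applies here. The sketch below shows one way to do that for a single column's metadata dict. `migrate_column_metadata` is a hypothetical helper, not part of this repository, and it assumes the old value is well formed (an object with a `column` key).

```python
import copy
from typing import Any, Dict


def migrate_column_metadata(column_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the column metadata with `geometry_bbox` moved to `covering.box`.

    The input is left unmodified. An existing `covering.box` entry is kept as-is.
    """
    migrated = copy.deepcopy(column_meta)
    old = migrated.pop("geometry_bbox", None)
    if old is not None and "box" not in migrated.get("covering", {}):
        migrated.setdefault("covering", {})["box"] = {"column": old["column"]}
    return migrated


old_meta = {"encoding": "WKB", "geometry_bbox": {"column": "bbox"}}
print(migrate_column_metadata(old_meta))
# {'encoding': 'WKB', 'covering': {'box': {'column': 'bbox'}}}
```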
11 changes: 6 additions & 5 deletions scripts/test_json_schema.py
@@ -41,8 +41,10 @@ def get_version() -> str:
"geometry": {
"encoding": "WKB",
"geometry_types": [],
- "geometry_bbox": {
- "column": "bbox",
+ "covering": {
+ "box": {
+ "column": "bbox",
+ },
},
},
},
@@ -216,16 +218,15 @@ def get_version() -> str:
# Geometry Bbox

metadata = copy.deepcopy(metadata_template)
- metadata["columns"]["geometry"]["geometry_bbox"].pop("column")
+ metadata["columns"]["geometry"]["covering"].pop("box")
invalid_cases["empty_geometry_bbox"] = metadata


metadata = copy.deepcopy(metadata_template)
- metadata["columns"]["geometry"]["geometry_bbox"]["column"] = ""
+ metadata["columns"]["geometry"]["covering"]["box"]["column"] = ""
invalid_cases["empty_geometry_bbox_column"] = metadata



# # Tests

@pytest.mark.parametrize(