Read version number from the schema #159

Merged 5 commits on Dec 14, 2022
14 changes: 6 additions & 8 deletions .github/workflows/scripts.yml
@@ -29,12 +29,6 @@ jobs:
             geoparquet_validator $example || exit 1;
           done

-      - name: Test json schema
-        run: |
-          python -m pip install pytest
-          cd tests
-          pytest test_json_schema.py -v
-
   test-json-metadata:
     runs-on: ubuntu-latest
     steps:
@@ -56,8 +50,12 @@ jobs:
       - name: Run scripts
         run: |
           cd scripts
+          poetry run pytest test_json_schema.py -v
+          poetry run python generate_example.py
           poetry run python update_example_schemas.py
           cd ../examples
-          # Assert no changes in the git repo, aka that the json version of the
-          # schemas are up to date
+          # Assert that the version number and file metadata are up to date
+          # Allow for differences in example.parquet
+          git restore example.parquet
+          git diff
           test -z "$(git status --porcelain)"
5 changes: 2 additions & 3 deletions .gitignore
@@ -1,3 +1,2 @@
-# Ignore GeoPackage file used in conversion to GeoParquet
-*.gpkg*
-tests/data/*
+/scripts/data/
+/scripts/__pycache__/
8 changes: 0 additions & 8 deletions examples/environment.yml

This file was deleted.

Binary file modified examples/example.parquet
Binary file not shown.
15 changes: 2 additions & 13 deletions format-specs/geoparquet.md
@@ -41,23 +41,12 @@ All file-level metadata should be included under the `geo` key in the parquet me

 | Field Name | Type | Description |
 | ------------------ | ------ | -------------------------------------------------------------------- |
-| version | string | **REQUIRED.** The version of the GeoParquet metadata standard used when writing. |
-| primary_column | string | **REQUIRED.** The name of the "primary" geometry column. |
+| version | string | **REQUIRED.** The version identifier for the GeoParquet specification. |
+| primary_column | string | **REQUIRED.** The name of the "primary" geometry column. In cases where a GeoParquet file contains multiple geometry columns, the primary geometry may be used by default in geospatial operations. |
 | columns | object\<string, [Column Metadata](#column-metadata)> | **REQUIRED.** Metadata about geometry columns. Each key is the name of a geometry column in the table. |

 At this level, additional implementation-specific fields (e.g. library name) are allowed, and thus readers should be robust in ignoring those.

-### Additional file metadata information
-
-#### primary_column
-
-This indicates the "primary" or "active" geometry for systems that can store multiple geometries,
-but have a default geometry used for geospatial operations.
-
-#### version
-
-Version of the GeoParquet spec used, currently 0.5.0-dev
-
 ### Column metadata

 Each geometry column in the dataset must be included in the columns field above with the following content, keyed by the column name:
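
Reviewer note: the file metadata table in this hunk lists three required fields and says readers should ignore additional implementation-specific ones. A minimal sketch of what such a check might look like, assuming the `geo` value has already been decoded from JSON into a dict. `check_geo_metadata` and `REQUIRED_FIELDS` are hypothetical names for illustration and are not part of this PR. The rule that `primary_column` must name a key of `columns` is inferred from the spec text, not quoted from it.

```python
# Hypothetical helper, not part of this PR: checks the three required
# file-level fields from the spec table and ignores any extra fields.
REQUIRED_FIELDS = ("version", "primary_column", "columns")


def check_geo_metadata(geo: dict) -> list[str]:
    """Return a list of problems found in decoded `geo` file metadata."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in geo
    ]
    if problems:
        return problems
    if not isinstance(geo["version"], str):
        problems.append("version must be a string")
    if not isinstance(geo["columns"], dict):
        problems.append("columns must be an object keyed by column name")
    elif geo["primary_column"] not in geo["columns"]:
        # Assumption: inferred from "each geometry column must be included
        # in the columns field"; the spec table does not state it directly.
        problems.append("primary_column is not a key of columns")
    return problems
```

An empty list means the required fields are present and consistent; unknown top-level keys such as a library name are deliberately not reported.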
20 changes: 19 additions & 1 deletion scripts/README.md
@@ -21,11 +21,29 @@ poetry update
 To run a script, prefix it with `poetry run`. For example:

 ```
-poetry run python update_example_schemas.py
+poetry run python generate_example.py
 ```

 Using `poetry run` ensures that you're running the python script using _this_ local environment, not your global environment.

+### Tests
+
+To run the tests, change into the `scripts` directory and run the following:
+
+```
+poetry run pytest test_json_schema.py -v
+```
+
+### example.parquet
+
+The `example.parquet` file in the `examples` directory is generated with the `generate_example.py` script. This script needs to be updated and run any time there are changes to the "geo" file metadata or to the version constant in `schema.json`.
+
+To update the `../examples/example.parquet` file, run this from the `scripts` directory:
+
+```
+poetry run python generate_example.py
+```
+
 ### nz-building-outlines to Parquet

 ```bash
11 changes: 9 additions & 2 deletions examples/example.py → scripts/generate_example.py
@@ -22,8 +22,15 @@
 table = pa.Table.from_pandas(df.head().to_wkb())


+def get_version() -> str:
+    """Read the version const from the schema.json file"""
+    with open(HERE / "../format-specs/schema.json") as f:
+        spec_schema = json.load(f)
+        return spec_schema["properties"]["version"]["const"]
+
+
 metadata = {
-    "version": "0.5.0-dev",
+    "version": get_version(),
     "primary_column": "geometry",
     "columns": {
         "geometry": {
@@ -42,4 +49,4 @@
 )
 table = table.cast(schema)

-pq.write_table(table, HERE / "example.parquet")
+pq.write_table(table, HERE / "../examples/example.parquet")
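
Reviewer note: `get_version()` in this diff reads `properties.version.const` from `schema.json`, so the schema becomes the single source for the version string. The sketch below shows the same lookup in a self-contained form, assuming a stand-in schema that contains only that one property. `read_version_const` is a hypothetical variant that takes the path as a parameter and reports a missing `const` explicitly; it is not the function added by this PR.

```python
import json
import tempfile
from pathlib import Path


def read_version_const(schema_path: Path) -> str:
    """Return the `const` declared for the `version` property of a JSON Schema file."""
    with open(schema_path) as f:
        schema = json.load(f)
    try:
        return schema["properties"]["version"]["const"]
    except KeyError as err:
        # Hypothetical error handling; the PR's get_version() lets KeyError propagate.
        raise ValueError(f"{schema_path} has no properties.version.const") from err


# Stand-in schema holding only the part that the lookup reads.
with tempfile.TemporaryDirectory() as tmp:
    demo_schema = Path(tmp) / "schema.json"
    demo_schema.write_text(
        json.dumps({"properties": {"version": {"const": "0.5.0-dev"}}})
    )
    demo_version = read_version_const(demo_schema)

print(demo_version)
```

Taking the path as an argument keeps the lookup testable without the repository layout that `HERE / "../format-specs/schema.json"` assumes.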