Read version number from the schema #159
Conversation
Looks like #24 was what added …
One issue with the script that generates the …
It looks like the …
8782825 to d73fa35
Parquet writers usually include the name of the implementation in the metadata, so it won't be bitwise equal across implementations:

import pandas as pd
import pyarrow.parquet as pq
df = pd.DataFrame({'a': [1, 2, 3, 4]})
df.to_parquet('tmp.parquet')
pq.read_metadata('tmp.parquet').created_by
# 'parquet-cpp-arrow version 10.0.1'
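Building on that snippet, one way to sidestep the created_by difference could be to compare the parsed "geo" key-value metadata rather than the raw file bytes. This is just an illustrative sketch with hypothetical file names, not code from this PR:

```python
import json

import pyarrow.parquet as pq


def geo_metadata(path):
    """Return the parsed 'geo' key-value metadata from a (Geo)Parquet file."""
    kv = pq.read_metadata(path).metadata  # dict with bytes keys/values
    return json.loads(kv[b"geo"])


# Hypothetical paths: compare freshly generated output against the committed example.
assert geo_metadata("generated.parquet") == geo_metadata("examples/example.parquet")
```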
It's probably not good to rely on this for the example data. For one, we don't guarantee it to be stable, as we might update it with new naturalearth data releases (which is maybe not a problem for our use case), but the dataset also has some political problems (like Crimea belonging to Russia, which is a reason we would prefer to remove it from geopandas in the long term). Maybe we can find some other online data source to use as an example?
I personally prefer conda, and would rather not use poetry. But I also don't think it is worth supporting both, since in the end it is just for running this script, which is not something one has to do often. So I am OK with continuing to use poetry here. But we should then maybe avoid relying on GDAL (fiona) to read in the example data, to keep the installation of the env for the scripts simpler (e.g. if we read from a GeoJSON file, GDAL is not necessarily needed).
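For illustration, reading the example data from GeoJSON without GDAL could look roughly like the following. The file name and CRS are hypothetical, and this assumes geopandas/shapely (but not fiona/pyogrio) are available:

```python
import json

import geopandas as gpd

# Parse the GeoJSON with the standard library instead of fiona (GDAL).
with open("example-source.geojson") as f:
    features = json.load(f)["features"]

# from_features builds the geometries via shapely, so no GDAL bindings are needed here.
gdf = gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")
print(gdf.head())
```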
I agree that we could reduce dependencies and simplify creation of the example file. And I agree that something besides political boundaries would be good. Let's take this up separately from the changes here.
@kylebarron or @jorisvandenbossche - Any more comments on these changes? We definitely have more to discuss, and probably more to do, related to CI, dependencies, example data, etc. But I'm wondering if we can get in the changes that reduce the duplicated version identifiers to simplify the release process (#146). I updated the description of the PR to include more details on the changes and their motivation in case that helps.
Yes, all my suggestions about a different data set / not using GDAL were just ideas that can certainly be left for other issues/PRs
The goal of this branch is to reduce the number of places the version number is repeated. Here is a summary of the changes and motivation for each:
- The `examples/example.py` script currently includes the version number. I changed this to read from `format-specs/schema.json` instead (see the sketch after this list). The `examples` directory has an `environment.yml` file that apparently allows `examples/example.py` to be run, but because there was no readme, I moved the script to the `scripts` directory, where there are instructions on how to install Python dependencies.
- The `tests/test_json_schema.py` script currently includes the version number. As above, I changed this to read from `format-specs/schema.json` instead. The script also has dependencies and no instructions on how to install them, so I moved it to the `scripts` directory, where this is documented, and updated the dependency list there.
- I updated the `.github/workflows/scripts.yml` workflow to run the script that generates the example data before asserting that the example metadata matches expectations. This should catch cases where someone updates the expected metadata but not the example, or vice versa (if someone changes the metadata schema but forgets to change both, ideally the validator job should catch that).
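As a rough sketch of the "read the version from the schema" change described above: the exact JSON path to the version constant and the relative location of the schema file are assumptions here, not necessarily what this PR does.

```python
import json
from pathlib import Path

# Assumed layout: this script lives in scripts/ and the schema in format-specs/.
SCHEMA_PATH = Path(__file__).resolve().parent.parent / "format-specs" / "schema.json"


def read_version() -> str:
    """Return the GeoParquet version declared in the JSON schema."""
    with open(SCHEMA_PATH) as f:
        schema = json.load(f)
    # Assumed structure; adjust to wherever the schema actually pins the version.
    return schema["properties"]["version"]["const"]


if __name__ == "__main__":
    print(read_version())
```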