Exploring spatially-partitioned GeoParquet

Install

Using Poetry:

poetry install

This installs the packages pinned in the lockfile, so you get exactly the same package environment that I used.

Spatially partitioning the data

Preprocessing

Step 1: Download data

Download source files from Microsoft's website:

> mkdir -p data/source/
# Run up to 8 wget processes in parallel:
> cat files.txt | xargs -n 1 -P 8 wget -q -P data/source/
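If wget or xargs is not available, the same parallel download can be sketched in Python with only the standard library. This is a rough equivalent rather than part of the repository: it assumes files.txt lists one URL per line, and the helper names (read_urls, destination_for, download_all) are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def read_urls(text: str) -> list[str]:
    """Return the non-empty lines of a URL list (assumed: one URL per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def destination_for(url: str, dest_dir: Path) -> Path:
    """Map a URL to a local path, keeping the remote file name like `wget -P` does."""
    return dest_dir / Path(urlparse(url).path).name


def download_all(urls: list[str], dest_dir: Path, workers: int = 8) -> list[Path]:
    """Download every URL into dest_dir using a pool of worker threads."""
    dest_dir.mkdir(parents=True, exist_ok=True)

    def fetch(url: str) -> Path:
        target = destination_for(url, dest_dir)
        urlretrieve(url, target)  # blocking download of a single file
        return target

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

To mirror the shell command above, call `download_all(read_urls(Path("files.txt").read_text()), Path("data/source"))`.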

Step 2: Preprocess data

Microsoft distributes the data as zipped GeoJSON, which is slow to load. To make later steps faster, we'll convert every input file to Parquet. For simplicity, this uses the osgeo/gdal:latest Docker image (as of May 7, 2022).

cd data
mkdir -p preprocessed
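# Glob directly instead of parsing `ls`, and quote paths so odd file names are safe
for file in source/*.zip; do
  # Strip the directory and the .geojson.zip suffix to get the state name
  state=$(basename "$file" .geojson.zip)
  echo "$state"
  docker run --rm -it -v "$(pwd)":/data osgeo/gdal:latest \
    ogr2ogr \
    "/data/preprocessed/$state.parquet" \
    "/vsizip//data/$file" \
    -lco COMPRESSION=ZSTD
done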
cd ..

This takes about an hour on my computer.

Results

poetry run python visualize_bounds.py --input data/manual-hilbert-shuffle/shuffled.parquet

Hilbert values with 2000 equal partitions
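To make the idea behind this result concrete, here is a minimal pure-Python sketch of Hilbert-based partitioning. It is not the code in manual-hilbert-shuffle: the function names are hypothetical, features are reduced to one (x, y) point each, and "equal partitions" is assumed to mean partitions with (nearly) equal row counts. Each point is snapped to a 2^order by 2^order grid, given its distance along the Hilbert curve, sorted by that distance, and the sorted order is cut into equal-sized chunks.

```python
def hilbert_index(x: int, y: int, order: int = 16) -> int:
    """Distance of grid cell (x, y) along a Hilbert curve covering a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_partitions(points, bounds, n_partitions, order=16):
    """Split point indices into n_partitions chunks of near-equal size, in Hilbert order.

    points: list of (x, y) tuples; bounds: (minx, miny, maxx, maxy) of the dataset.
    """
    minx, miny, maxx, maxy = bounds
    scale = (1 << order) - 1

    def to_grid(value, lo, hi):
        # Snap a coordinate to an integer grid cell in [0, scale]
        if hi == lo:
            return 0
        return min(scale, max(0, round((value - lo) / (hi - lo) * scale)))

    keys = [
        hilbert_index(to_grid(x, minx, maxx), to_grid(y, miny, maxy), order)
        for x, y in points
    ]
    ranked = sorted(range(len(points)), key=keys.__getitem__)
    n = len(ranked)
    # Cut the Hilbert-sorted order into contiguous, near-equal chunks
    return [
        ranked[n * i // n_partitions : n * (i + 1) // n_partitions]
        for i in range(n_partitions)
    ]
```

Because neighbouring positions on the Hilbert curve are also neighbours in space, each chunk tends to cover a compact area, which is what keeps the partition bounding boxes small.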

kylebarron/spatially-partitioned-geoparquet
