-
Now that Overture is publishing GeoParquet, we should start thinking about how to sort or order the data within Parquet datasets to accelerate spatial queries. There's a lot of recent interest in this work, and it feels somewhat new. Some applicable background:
Let's use this discussion to demonstrate progress, brainstorm ideas and implementations, and so on.
-
@jwass thanks for kicking off this discussion. I'm really keen to see partitioning by country_iso for buildings and places. I think it would really help with developer advocacy and the general experience for Overture users who just want to get their hands on the data. What Chris has done so far with releasing the buildings is a fine example of what we could be doing. I also like the idea of using a bbox per row, and I hope that gets adopted as part of the GeoParquet standard. Maybe we could start with buildings and see if we can add a country_iso per row there to use as the partition key? Also, is the GeoParquet data that Overture outputs currently created using Apache Sedona?
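To make the two ideas above concrete, here is a minimal pure-Python sketch, not Overture's actual pipeline. It groups rows under Hive-style `country_iso=XX` partition keys and uses a per-row bbox as a cheap prefilter before any geometry is touched. The row layout, the `coords` field, and the helper names `bbox_of`, `intersects`, and `partition_by` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def bbox_of(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (lon, lat) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def partition_by(rows, key="country_iso"):
    """Group rows under Hive-style partition names such as 'country_iso=US'."""
    groups = defaultdict(list)
    for row in rows:
        groups[f"{key}={row[key]}"].append(row)
    return dict(groups)

# Hypothetical sample rows; real data would carry full geometries.
rows = [
    {"id": "a", "country_iso": "US", "coords": [(-122.4, 37.7), (-122.3, 37.8)]},
    {"id": "b", "country_iso": "NL", "coords": [(4.8, 52.3), (4.9, 52.4)]},
    {"id": "c", "country_iso": "US", "coords": [(-74.1, 40.6), (-73.9, 40.8)]},
]
for row in rows:
    row["bbox"] = bbox_of(row["coords"])

partitions = partition_by(rows)

# A reader who only wants US data opens one partition, then uses the
# per-row bbox to skip rows that cannot intersect the query window.
query = (-123.0, 37.0, -122.0, 38.0)
hits = [r["id"] for r in partitions["country_iso=US"] if intersects(r["bbox"], query)]
```

The partition key prunes whole files, while the bbox prunes rows (or row groups, if the writer also records bbox statistics), so the two techniques complement each other rather than compete.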
-
Also came across https://github.com/kylebarron/spatially-partitioned-geoparquet as an additional reference.
-
See @marklit's detailed comment #113 (comment)
-
I made some significant improvements to the spatial partitioning in our release data. Here are two images showing the buildings data for January vs. what we released this morning. On the left of each image is the bounding box of each file. On the right is the bounding box of each row group. This is a basic range partitioning of the geohash. Each file is then sorted by geohash 15. I think there's still plenty of room for improvement, but this is a basic approach that should bring significant improvements in query performance.
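For readers unfamiliar with the approach, here is a rough pure-Python sketch of geohash range partitioning followed by sorting. It is an illustration under stated assumptions, not the code used for the release: it reads "geohash 15" as a 15-character geohash, represents each feature by a single (lat, lon) point, and splits the sorted keys into equal-count ranges. The names `geohash_encode` and `range_partition` are hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=15):
    """Encode a point as a geohash string by interleaving lon/lat bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, bits = [], 0, 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        bits += 1
        use_lon = not use_lon
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            value, bits = 0, 0
    return "".join(chars)

def range_partition(points, num_files, precision=15):
    """Sort (id, lat, lon) points by geohash and cut them into contiguous ranges.

    Each returned list stands in for one output file; rows inside it are
    already in geohash order, so row groups cover compact areas too.
    """
    keyed = sorted((geohash_encode(lat, lon, precision), pid) for pid, lat, lon in points)
    size = max(1, -(-len(keyed) // num_files))  # ceiling division, never zero
    return [keyed[i:i + size] for i in range(0, len(keyed), size)]

# Hypothetical sample points (id, lat, lon).
points = [
    ("sf", 37.77, -122.42),
    ("ams", 52.37, 4.90),
    ("nyc", 40.71, -74.01),
    ("ber", 52.52, 13.40),
]
files = range_partition(points, num_files=2)
file_ids = [[pid for _, pid in f] for f in files]
```

Because the geohash alphabet is in ASCII order, a plain string sort follows the underlying Z-order curve, which is why nearby features tend to land in the same file and row group. Equal-count ranges are only one possible split; balancing by byte size is another reasonable choice.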
@saosebastiao Yup - this is in the March release that was out yesterday.