Availability of Spatial Indexes #17
Another theory I have is that the spatial join keeps copies of both the A and B geometries. When the B geometry is large, it makes many redundant, unnecessary copies of it. Setting keep=F doesn't change the behavior. In a new experiment, rewriting the SQL to drop the geometries in the first SELECT, rather than after the fact, does speed things up, but by no more than 5%. Most of the overhead is still somewhere else.
My last experiment improved performance into the range I was expecting. The right-hand-side polygons are very large and detailed (national boundaries), so I simplified them before the join. Simplification thresholds of 0.1 or 0.01 looked visually OK, but they threw "com.vividsolutions.jts.geom.TopologyException: side location conflict" errors when used in the join. I hope this is useful to others, and that people can recommend other performance-tuning steps I missed. Even with the simplified polygons, my left join of 480k points against 198 possible country polygons looks like it will take 6 hours on a 48-thread machine.
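For readers unfamiliar with what a simplification threshold does: the classic algorithm is Douglas-Peucker, which drops every vertex lying within the tolerance of a straight chord between kept vertices. Below is a minimal pure-Python sketch for a single polyline; it is an assumption that the simplification used in the experiment above behaves like this, and the function names are hypothetical. Because each ring is simplified on its own, nothing stops the output from self-intersecting, which is one plausible source of JTS topology errors; JTS also provides a `TopologyPreservingSimplifier` intended to avoid that.

```python
import math


def _perp_distance(p, a, b):
    """Distance from point p to the infinite line through a and b (or to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # |cross(b - a, p - a)| / |b - a| is the perpendicular distance.
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping only vertices farther than `tolerance` from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord first -> last.
    max_dist, index = -1.0, 0
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # Every interior vertex is within tolerance: replace them all with the chord.
        return [first, last]
    # Otherwise keep the farthest vertex and simplify each half recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex
```

For example, `douglas_peucker([(0, 0), (1, 0.05), (2, -0.05), (3, 0.04), (4, 0)], 0.1)` returns `[(0, 0), (4, 0)]`. Note that the tolerance is in coordinate units: with longitude/latitude data, 0.1 means 0.1 degrees, roughly 11 km of latitude, which is coarse for country borders.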
@rexdouglass it seems you have not turned on the spatial index function. Note: the bigger table should come first, followed by the smaller one. This works well in our production environment, like this:
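To illustrate in general what a spatial index buys in a point-in-polygon join (this is not the commenter's configuration), here is a minimal pure-Python sketch of the filter-and-refine pattern: a cheap index lookup narrows each point to a few candidate polygons, and only those candidates get the exact, expensive geometry test. A uniform grid over bounding boxes stands in for the quadtree or R-tree a real engine would use; `GridIndex`, `spatial_join`, and the cell size are hypothetical.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Even-odd ray casting test against one closed ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)


class GridIndex:
    """Uniform grid over polygon bounding boxes (a stand-in for an R-tree/quadtree)."""

    def __init__(self, polygons, cell_size):
        self.cell = cell_size
        self.polygons = polygons
        self.cells = defaultdict(list)
        for pid, ring in polygons.items():
            minx, miny, maxx, maxy = bbox(ring)
            # Register the polygon in every grid cell its bounding box touches.
            for cx in range(int(minx // cell_size), int(maxx // cell_size) + 1):
                for cy in range(int(miny // cell_size), int(maxy // cell_size) + 1):
                    self.cells[(cx, cy)].append(pid)

    def query(self, x, y):
        """Filter step: polygons whose bounding box touches the point's cell."""
        return self.cells.get((int(x // self.cell), int(y // self.cell)), [])


def spatial_join(points, index):
    """Return (point_id, polygon_id) pairs; the refine step runs the exact test."""
    out = []
    for pt_id, (x, y) in points.items():
        for pid in index.query(x, y):
            if point_in_polygon(x, y, index.polygons[pid]):
                out.append((pt_id, pid))
    return out
```

Without an index, every point is tested exactly against all 198 country polygons; with one, only the few polygons whose boxes share the point's cell reach the costly test, which matters most when those polygons have many vertices. Indexing the small polygon side and probing it with the large point side is a design choice of this sketch and does not necessarily match which side GeoSpark builds its index on.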
Here is another tip: ST_Intersects is better than ST_Within and ST_Contains when you are not sure of the containment order between the two geometries.
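The point about argument order can be made concrete with axis-aligned rectangles: `within(a, b)` and `contains(a, b)` give different answers when the arguments are swapped, while `intersects` does not. The sketch below is a simplified, hypothetical Python model that ignores the boundary edge cases of the real OGC predicates. For a point-in-polygon join, a point that intersects a polygon is either inside it or on its boundary, so ST_Intersects differs from ST_Within only for points lying exactly on a border.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float


def within(a: Rect, b: Rect) -> bool:
    """True if a lies entirely inside b. Order matters."""
    return b.minx <= a.minx and a.maxx <= b.maxx and b.miny <= a.miny and a.maxy <= b.maxy


def contains(a: Rect, b: Rect) -> bool:
    """True if a contains b; by definition contains(a, b) == within(b, a)."""
    return within(b, a)


def intersects(a: Rect, b: Rect) -> bool:
    """True if a and b share at least one point. Symmetric in its arguments."""
    return a.minx <= b.maxx and b.minx <= a.maxx and a.miny <= b.maxy and b.miny <= a.maxy
```

With `country = Rect(0, 0, 10, 10)` and `city = Rect(2, 2, 3, 3)`, `within(city, country)` is true but `within(country, city)` is false, whereas `intersects` is true in either order.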
I'm experimenting with GeoSpark and finding spatial joins slower than expected.
I've set geospark.join.gridtype to "kdbtree" in my configuration below.
Is there something else I need to do to enable or use spatial indexes when creating, saving, or joining on Parquet files with a geom column?
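On the configuration question, it may help to keep spatial partitioning and spatial indexing apart: as I understand it, geospark.join.gridtype controls how space is divided into partitions, not whether an index is built inside each partition. Below is a minimal pure-Python illustration of KD-tree style partitioning, which splits at the median on alternating axes so each partition ends up with a similar number of points even when the data is clustered. It is a simplified stand-in, not GeoSpark's KDB-tree implementation, and `kd_partition` is a hypothetical name.

```python
def kd_partition(points, max_per_partition, depth=0):
    """Recursively split (x, y) points at the median, alternating x and y axes.

    Returns a list of partitions (lists of points), each holding at most
    max_per_partition points.
    """
    if max_per_partition < 1:
        raise ValueError("max_per_partition must be at least 1")
    if len(points) <= max_per_partition:
        return [points]
    axis = depth % 2  # 0 -> split on x, 1 -> split on y
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split keeps the two halves balanced in count
    return (kd_partition(ordered[:mid], max_per_partition, depth + 1)
            + kd_partition(ordered[mid:], max_per_partition, depth + 1))
```

Splitting an 8 x 8 grid of 64 points with `max_per_partition=16` yields four quadrants of 16 points each. Because splits follow the data rather than a fixed grid, dense regions such as cities get smaller partitions, which is the usual argument for a KD-tree style grid type over a uniform one.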