geobenchmark

Benchmarks that compare the performance of several Python libraries for a few specific use cases:

Vector ops: processing large vector geo files, e.g. calculating intersections
IO: reading and writing vector geo files
Zonalstats: calculating zonal statistics

To run the benchmarks, you can:

clone this repository
use the environment.yml file to create a conda environment with the necessary dependencies: conda env create -f environment.yml
run one of the run_benchmarks_... .py files

Vector ops

Every benchmark in this section follows the same usage scenario:

read data from a geopackage file
do one spatial operation on the data
write the result to a geopackage file

So, if you are looking for a library for a different use case (e.g. processing many small files or pure in-memory processing), treat these benchmark results with a lot of caution! In particular, the buffer benchmark is, at the time of writing, 90% I/O bound in geopandas.
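The read / operate / write scenario, and the share of run time that goes to I/O, can be sketched as follows. This is only an illustration, not the code used in this repository: time_scenario, io_share and buffer_scenario are hypothetical names, and the geopandas-based buffer_scenario assumes geopandas is installed and that the input is a file geopandas can read.

```python
import time
from typing import Any, Callable, Dict


def time_scenario(
    read: Callable[[], Any],
    operate: Callable[[Any], Any],
    write: Callable[[Any], None],
) -> Dict[str, float]:
    """Time the read, operation and write steps of one benchmark run."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    data = read()
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    result = operate(data)
    timings["operate"] = time.perf_counter() - start

    start = time.perf_counter()
    write(result)
    timings["write"] = time.perf_counter() - start

    # Sum of the three steps measured above.
    timings["total"] = sum(timings.values())
    return timings


def io_share(timings: Dict[str, float]) -> float:
    """Fraction of the total time spent reading and writing."""
    total = timings["total"]
    return (timings["read"] + timings["write"]) / total if total > 0 else 0.0


def buffer_scenario(
    input_path: str, output_path: str, distance: float = 1.0
) -> Dict[str, float]:
    """Hypothetical buffer run: read a .gpkg, buffer it, write a .gpkg."""
    import geopandas as gpd  # assumption: geopandas is installed

    def operate(gdf):
        gdf = gdf.copy()
        gdf["geometry"] = gdf.geometry.buffer(distance)
        return gdf

    return time_scenario(
        read=lambda: gpd.read_file(input_path),
        operate=operate,
        write=lambda gdf: gdf.to_file(output_path, driver="GPKG"),
    )
```

An io_share close to 1.0 means a run mostly measures file reading and writing rather than the spatial operation itself, which is why results for an I/O-heavy scenario say little about pure in-memory workloads.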

The test files are not that large (e.g. 500k polygons, 350 MB .gpkg): they try to strike a balance between being large enough to give an idea of the processing time to expect and keeping the wait for a benchmark run short. They also fit in the memory of most desktops, which makes it possible to benchmark libraries that cannot handle files too large to fit in memory. When using geofileops on really large .gpkg files (> 10 GB), the speed improvements become even more important.

The benchmarks were run on a Windows workstation, with the libraries constrained to use at most 12 logical cores.
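As an illustration of what capping parallelism at 12 logical cores can look like, here is a minimal sketch. It is an assumption, not the mechanism used in this repository: each benchmarked library has its own way of limiting parallelism, and capped_worker_count and run_capped are hypothetical helpers. Threads are used here only to keep the example short; CPU-bound geo processing would typically use processes.

```python
import os
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 12  # the cap mentioned in the text


def capped_worker_count(limit: int = MAX_WORKERS) -> int:
    """Number of workers: the logical cores available, but never above the cap."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(available, limit))


def run_capped(func, items, limit: int = MAX_WORKERS):
    """Apply func to every item using at most `limit` workers, keeping order."""
    with ThreadPoolExecutor(max_workers=capped_worker_count(limit)) as pool:
        return list(pool.map(func, items))
```

Using the same cap for every library keeps the comparison fair on machines that have more cores than the workstation used here.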

The following chart shows the main results of this benchmark.

IO

Mainly a test of pyogrio performance

Zonalstats

Comparison of different libraries and other software for calculating zonal statistics for 5000 polygons.
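For readers unfamiliar with the term: zonal statistics summarise the raster cell values that fall inside each zone (polygon). The sketch below shows the idea in plain Python under a simplifying assumption: the polygons have already been rasterised into a grid of zone ids aligned with the value grid, with 0 meaning "no zone". The benchmarked libraries work on real polygons and rasters; zonal_stats here is a hypothetical helper, not any library's API.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def zonal_stats(
    values: List[List[float]],
    zones: List[List[int]],
    nodata_zone: int = 0,
) -> Dict[int, Dict[str, float]]:
    """Compute count, min, max and mean of the cell values per zone id."""
    per_zone = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone_id in zip(value_row, zone_row):
            if zone_id != nodata_zone:  # skip cells outside every zone
                per_zone[zone_id].append(value)

    return {
        zone_id: {
            "count": len(cells),
            "min": min(cells),
            "max": max(cells),
            "mean": mean(cells),
        }
        for zone_id, cells in per_zone.items()
    }


# Tiny 2 x 3 example: zone 1 covers two cells, zone 2 covers three.
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
zones = [[1, 1, 2], [0, 2, 2]]
stats = zonal_stats(values, zones)
```

In this example stats[1]["mean"] is 1.5 (cells 1.0 and 2.0), and the cell with value 4.0 is ignored because it lies in no zone.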

Remark: pyjeo cannot (yet) be installed via pip or conda, so to run the pyjeo benchmark, follow the installation instructions here.
