Library for handling diffs for geospatial data. Works with GeoPackage files and PostGIS databases (as well as with non-spatial SQLite and PostgreSQL databases).
The geodiff library is used by Mergin, a platform for easy sharing of spatial data.
The first use case for the geodiff library is to take two datasets with the same table structure and compare them. The comparison creates a "diff" file containing the entries that were inserted, updated, or deleted between the two datasets. A diff file can be applied to an existing dataset (e.g. a GeoPackage), and the dataset will be updated accordingly by applying the differences one by one. If you are familiar with the diff and patch tools from the UNIX world, this is meant to be an equivalent for spatial data.
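To make the diff-and-apply idea concrete, here is a minimal sketch in plain Python. It is a conceptual illustration only, not geodiff's implementation or file format: tables are modelled as dictionaries keyed by primary key, and the names make_diff and apply_diff are hypothetical.

```python
def make_diff(old, new):
    """Compare two tables given as {primary_key: row_dict} and list the changes."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("insert", key, None, new[key]))
        elif key not in new:
            changes.append(("delete", key, old[key], None))
        elif old[key] != new[key]:
            changes.append(("update", key, old[key], new[key]))
    return changes


def apply_diff(table, changes):
    """Return a copy of `table` with the changes applied one by one."""
    result = dict(table)
    for op, key, _old_row, new_row in changes:
        if op == "delete":
            del result[key]
        else:  # insert or update
            result[key] = new_row
    return result


# Toy data standing in for two versions of the same table.
data_a = {1: {"name": "oak"}, 2: {"name": "elm"}, 3: {"name": "ash"}}
data_b = {1: {"name": "oak"}, 2: {"name": "beech"}, 4: {"name": "fir"}}

a_to_b = make_diff(data_a, data_b)
patched = apply_diff(data_a, a_to_b)
```

Each entry keeps both the old and the new row, which is what makes it possible to check a change against the current data or to reverse it later.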
The next use case is to merge changes from different copies of the same dataset that have been modified independently. Generally such changes cannot be applied cleanly, for example when multiple users changed the same row of a table or added a new row with the same ID. The library has functionality to "rebase" a diff file on top of another diff file, resolving any conflicts, so that all the changes can be applied cleanly. There may still be conflicts that cannot be resolved fully automatically (e.g. if the same value is modified in different copies); these are written to a separate conflict file that can be addressed later (such changes are typically rare).
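The rebase idea can be sketched with a toy model. The following is an illustration under stated assumptions, not geodiff's actual algorithm: diffs are lists of (op, key, old_row, new_row) tuples, the function name rebase_diff is hypothetical, an insert whose ID collides with one of "theirs" is moved to the next free ID, and a row updated on both sides keeps "our" value while being recorded as a conflict. Other combinations (such as delete versus update) are passed through unchanged to keep the sketch short.

```python
def rebase_diff(base, ours, theirs):
    """Rewrite `ours` so that it can be applied after `theirs`.

    `base` is the shared table as {primary_key: row}; both diffs were made
    against it. Returns (rebased_changes, conflicts).
    """
    their_inserts = {key for op, key, _, _ in theirs if op == "insert"}
    their_updates = {key: new for op, key, _, new in theirs if op == "update"}

    # Every ID that is already taken in the base or in either diff.
    used_keys = set(base)
    used_keys |= {key for _, key, _, _ in ours}
    used_keys |= {key for _, key, _, _ in theirs}
    next_free = max(used_keys, default=0) + 1

    rebased, conflicts = [], []
    for op, key, old, new in ours:
        if op == "insert" and key in their_inserts:
            # The same ID was added on both sides: move our row to a fresh ID.
            rebased.append(("insert", next_free, None, new))
            next_free += 1
        elif op == "update" and key in their_updates:
            # The same row was edited on both sides: record it for review and
            # express our update relative to their new value (assumed policy).
            conflicts.append({"key": key, "theirs": their_updates[key], "ours": new})
            rebased.append(("update", key, their_updates[key], new))
        else:
            rebased.append((op, key, old, new))
    return rebased, conflicts


base = {1: {"name": "oak"}, 2: {"name": "elm"}}
theirs = [("update", 2, {"name": "elm"}, {"name": "beech"}),
          ("insert", 3, None, {"name": "fir"})]
ours = [("update", 2, {"name": "elm"}, {"name": "birch"}),
        ("insert", 3, None, {"name": "yew"})]

rebased, conflicts = rebase_diff(base, ours, theirs)
```

In this example the colliding insert ends up with ID 4, and the double edit of row 2 appears in the conflict list so it can be inspected later.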
It is possible to apply diffs across the different databases supported by geodiff drivers (currently SQLite/GeoPackage and PostgreSQL/PostGIS). That means one can compute the difference between tables of two schemas in a PostGIS database and apply the changes to a GeoPackage (or vice versa). Thanks to that, it is possible to keep data in sync even if the backends are completely different.
There are multiple ways to use geodiff:
- geodiff command line interface (CLI) tool
- pygeodiff Python module
- geodiff library using C API
The library currently comes with support for two drivers:
- SQLite / GeoPackage - always available
- PostgreSQL / PostGIS - optional, needs to be enabled when compiling
To get changes between two GeoPackage files and write them to a-to-b.diff (a binary diff file):
geodiff diff data-a.gpkg data-b.gpkg a-to-b.diff
To print changes between two GeoPackage files to the standard output:
geodiff diff --json data-a.gpkg data-b.gpkg
To apply changes from a-to-b.diff to data-a.gpkg:
geodiff apply data-a.gpkg a-to-b.diff
To invert a diff file a-to-b.diff and revert data-a.gpkg to the original content:
geodiff invert a-to-b.diff b-to-a.diff
geodiff apply data-a.gpkg b-to-a.diff
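When the revert needs to be scripted, the two commands above can be driven from Python. This is a minimal sketch that only uses the CLI invocations shown above and assumes the geodiff executable is on PATH; the helper names revert_commands and revert are hypothetical.

```python
import shutil
import subprocess


def revert_commands(dataset, diff_file, inverse_file):
    """Build the two CLI invocations that undo `diff_file` on `dataset`."""
    return [
        ["geodiff", "invert", diff_file, inverse_file],
        ["geodiff", "apply", dataset, inverse_file],
    ]


def revert(dataset, diff_file, inverse_file):
    """Run the commands in order, stopping at the first one that fails."""
    if shutil.which("geodiff") is None:
        raise FileNotFoundError("geodiff executable not found on PATH")
    for command in revert_commands(dataset, diff_file, inverse_file):
        # check=True raises subprocess.CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


commands = revert_commands("data-a.gpkg", "a-to-b.diff", "b-to-a.diff")
```

Calling revert("data-a.gpkg", "a-to-b.diff", "b-to-a.diff") would then run both steps. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.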
The geodiff tool supports various other commands; use geodiff help for the full list.
Install the module from pip:
pip3 install pygeodiff
If you get the error ModuleNotFoundError: No module named 'skbuild', try updating pip with the command:
python -m pip install --upgrade pip
Sample usage of the Python module:
import pygeodiff
geodiff = pygeodiff.GeoDiff()
# create a diff between two GeoPackage files
geodiff.create_changeset('data-a.gpkg', 'data-b.gpkg', 'a-to-b.diff')
# apply changes from a-to-b.diff to the GeoPackage file data-a.gpkg
geodiff.apply_changeset('data-a.gpkg', 'a-to-b.diff')
# export changes from the binary diff format to JSON
geodiff.list_changes('a-to-b.diff', 'a-to-b.json')
If there are any problems, calls will raise a pygeodiff.GeoDiffLibError exception.
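A minimal sketch of handling that exception is shown below. The wrapper name create_changeset_safely and the choice to return an error message instead of re-raising are illustrative assumptions; the pygeodiff import is done lazily so the helper can also report a missing installation.

```python
import importlib.util


def create_changeset_safely(base, modified, changeset):
    """Create a diff; return None on success or an error message on failure."""
    if importlib.util.find_spec("pygeodiff") is None:
        return "pygeodiff is not installed"

    import pygeodiff

    try:
        pygeodiff.GeoDiff().create_changeset(base, modified, changeset)
    except pygeodiff.GeoDiffLibError as exc:
        # Raised by the library when the underlying call fails.
        return f"geodiff failed: {exc}"
    return None
```

A caller can then write error = create_changeset_safely('data-a.gpkg', 'data-b.gpkg', 'a-to-b.diff') and decide how to report a non-None result.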
See the geodiff.h header file for the list of API calls and their documentation.
Output messages can be adjusted with the GEODIFF_LOGGER_LEVEL environment variable.
Changes between datasets are read from and written to a binary changeset format.
Install the PostgreSQL client and SQLite3 libraries, e.g. on Linux:
sudo apt-get install libsqlite3-dev libpq-dev
or on macOS (using SQLite from QGIS deps) by defining the SQLite variables in the CMake configuration as follows:
SQLite3_INCLUDE_DIR=/opt/QGIS/qgis-deps-${QGIS_DEPS_VERSION}/stage/include
SQLite3_LIBRARY=/opt/QGIS/qgis-deps-${QGIS_DEPS_VERSION}/stage/lib/libsqlite3.dylib
Compile geodiff:
mkdir build
cd build
cmake .. -DWITH_POSTGRESQL=TRUE
make
C++ tests: run make test or ctest to run all tests. Alternatively, run just a single test, e.g. ./tests/geodiff_changeset_reader_test
Python tests: you need to set GEODIFFLIB to the path of the .so/.dylib from the build step:
GEODIFFLIB=`pwd`/../build/libgeodiff.dylib nose2
- run python3 ./scripts/update_version.py --version x.y.z
- push to GitHub
- tag master and create a GitHub release; Python wheels will be automatically published to PyPI
Library uses its own copy of