layout | title | subtitle | tags | cover-img | thumbnail-img | share-img
---|---|---|---|---|---|---
post | noGDAL | Geospatial Python without GDAL | | /assets/img/noGDAL.png | /assets/img/noGDAL.png | /assets/img/noGDAL.png
NoGDAL is a simple philosophy. To write a noGDAL program means that the program does 'not rely upon GDAL as a dependency'.
When programming a geospatial application with Python, we often need to read and write raster files, convert between coordinate systems or process geospatial data. When faced with these problems, the solution more often than not involves GDAL or a GDAL-based Python library. GDAL seems to be a staple in the world of geospatial Python, and I refuse to use it!
You may get lucky, but generally, when installing GDAL onto your system it’s not as simple as spinning up your favourite package manager and having it work.
There are usually many gotchas along the way. In fact, the GDAL documentation has an entire page dedicated to Python Gotchas in the GDAL and OGR Python Bindings. Some of the main gotchas include:
- Python bindings do not raise exceptions unless you explicitly call UseExceptions()
- Python crashes if you use an object after deleting an object it has a relationship with
- Python crashes in GDAL functions when you upgrade or downgrade numpy
This is a bad sign for users who want to get a quick solution without having to wade through documentation and learn a new programming style.
The core of these problems is that GDAL was not written for or in Python. The GDAL Python package is just a wrapper around the C++ GDAL library. Therefore, along with the installation nightmare, there are usage nightmares. As stated by the rasterio documentation:
“Python programs using them tend to read and run like C programs. For example, GDAL’s Python bindings require users to watch out for dangling C pointers, potential crashers of programs. This is bad”
Often when working on a product, we find ourselves having to use virtual environments, Docker containers or virtual environments inside Docker containers to make sure that the various solutions to different problems don't conflict.
There have been attempts to abstract the GDAL problems away from the user so coding may be a bit more pythonic; the most commonly used is the rasterio package. This is a big step in the right direction, but still has issues due to its underlying use of GDAL. Let’s take in this lovely statement from the rasterio docs:
“Rasterio 1.0.x works with Python versions 2.7.x and 3.5.0 through 3.7.x, and GDAL versions 1.11.x through 2.4.x. Official binary packages for Linux and Mac OS X are available on PyPI. Unofficial binary packages for Windows are available through other channels. Rasterio 1.0.x is not compatible with GDAL versions 3.0.0 or greater.”
That’s a bummer for users of the latest Python versions and a ‘too bad so sad’ for Windows users.
Simple: don’t use GDAL. By avoiding GDAL and sticking to pure Python, I am able to write code without having to worry about differences between system installations, numpy versions, syntax gotchas and compatibility issues.
In my experience, the following are some honourable mentions when choosing the ingredients for cooking a geospatial pie:
- pyshp: The Python Shapefile Library (PyShp) reads and writes ESRI Shapefiles in pure Python.
- tifffile: To read and write tiff files to and from numpy arrays.
- PyKrige: For 2D and 3D ordinary and universal kriging.
- csaps: For univariate, multivariate and n-dimensional grid data approximation using cubic smoothing splines.
- pillow: The Python Imaging Library adds image processing capabilities to your Python interpreter.
- geojson: Functions for encoding and decoding GeoJSON formatted data, and classes for all GeoJSON objects.
- shapely: Manipulation and analysis of geometric objects in the Cartesian plane.
- pyproj: Cartographic projections and coordinate transformations library
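To give a feel for how little a simple case needs, here is a minimal sketch of a coordinate conversion written with nothing but the standard library. It is not taken from any of the packages above: the function names are made up for illustration, and it only covers the spherical Web Mercator (EPSG:3857) formulas. For anything beyond a toy case like this, pyproj is the right tool.

```python
import math

# Radius of the sphere used by Web Mercator (the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 longitude/latitude in degrees to Web Mercator x/y in metres.

    Hypothetical helper for illustration only.
    """
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse of lonlat_to_web_mercator: metres back to degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


if __name__ == "__main__":
    x, y = lonlat_to_web_mercator(138.6, -34.9)
    print(x, y)
    print(web_mercator_to_lonlat(x, y))
```

There is nothing to compile and nothing to install, which is the experience I want the rest of the stack to have.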
Obviously, I’m being very opinionated here and my ideas won't be for everyone, especially those who have mastered the GDAL learning curve. These are some of the reasons I like the noGDAL philosophy:
- I like to write libraries and code for others to use (including future me)
- I’d like the user to have an easy installation process
- I like pure Python and pythonicness
- I like being able to install with only pip
Often, I can get started with using a package in minutes. It’s as simple as:
pip install package
And in code:
import package
package.Object().do_something()
Then move on to writing creative code. I don’t want to spend hours in the documentation for something that should be quick and simple, and I don’t wish this upon others either.
I’m working on an open-source noGDAL Python project called geotiff. The aims of this project are to be able to:
- read tiff files (including BigTiff)
- write tiff files (including BigTiff)
- convert between coordinate systems
- cut a section (bounding box) of the tiff file
- convert the data to numpy arrays
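As a rough illustration of the bounding-box aim, the sketch below maps a box in map coordinates onto a row/column window and slices it out of a raster. This is not the geotiff API: `bbox_to_window` and `crop` are hypothetical names, the raster is a plain list of rows to keep the example dependency-free, and it assumes a north-up image whose origin is the outer corner of the top-left pixel.

```python
import math


def bbox_to_window(bbox, origin, pixel_size, shape):
    """Map a bounding box in map units to a (row_start, row_stop, col_start, col_stop) window.

    bbox       -- (min_x, min_y, max_x, max_y) in map units
    origin     -- (x, y) of the outer corner of the top-left pixel
    pixel_size -- (width, height) of one pixel in map units, both positive
    shape      -- (n_rows, n_cols) of the raster
    """
    min_x, min_y, max_x, max_y = bbox
    x0, y0 = origin
    dx, dy = pixel_size
    n_rows, n_cols = shape

    # Columns grow with x; rows grow as y decreases (north-up raster).
    # floor/ceil keep every pixel the box touches; max/min clamp to the raster.
    col_start = max(0, math.floor((min_x - x0) / dx))
    col_stop = min(n_cols, math.ceil((max_x - x0) / dx))
    row_start = max(0, math.floor((y0 - max_y) / dy))
    row_stop = min(n_rows, math.ceil((y0 - min_y) / dy))

    if row_start >= row_stop or col_start >= col_stop:
        raise ValueError("bounding box does not overlap the raster")
    return row_start, row_stop, col_start, col_stop


def crop(rows, window):
    """Slice a list-of-rows raster down to the given window."""
    row_start, row_stop, col_start, col_stop = window
    return [row[col_start:col_stop] for row in rows[row_start:row_stop]]


if __name__ == "__main__":
    # A 10 x 10 raster whose cell values encode their own (row, col) position.
    raster = [[r * 10 + c for c in range(10)] for r in range(10)]
    window = bbox_to_window((102.0, 45.0, 105.0, 48.0), (100.0, 50.0), (1.0, 1.0), (10, 10))
    print(window)
    print(crop(raster, window))
```

With a numpy array the crop step would collapse to a single slice, `data[row_start:row_stop, col_start:col_stop]`.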
I’m hoping that this could be a collaborative effort (as I, like many, don't have all the best answers). If you would like to get involved in this project, please go to the geotiff repo, give it a star and introduce yourself in the Discussions tab.