Module 2A, Loading different data sources

Jip Claassens edited this page Jul 3, 2024 · 22 revisions

learning objective: How to configure reading data from data sources in a GeoDMS project

sourceData

Configuring data sources in a SourceData section of your configuration is a good convention. This results in a natural division between the original data and the calculated data/results.
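As a rough sketch of this convention, the layout below keeps a storage-backed unit in SourceData and a derived item in a separate container. The names Academy, Analysis and nr_municipalities are hypothetical and chosen only for illustration; the municipality unit anticipates the .csv example further down this page.

```
container Academy // hypothetical root container of the project
{
    container SourceData // original data, read from storage
    {
        unit<uint32> municipality
        :   StorageName     = "%ProjDir%/Data/municipality.csv"
        ,   StorageType     = "gdal.vect"
        ,   StorageReadOnly = "True";
    }

    container Analysis // hypothetical container for calculated results
    {
        // derived item: the number of rows read from the .csv file
        parameter<uint32> nr_municipalities := NrOfRows(SourceData/municipality);
    }
}
```

Keeping the two containers apart makes it easy to see which items depend on external files and which are calculated.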

We use GDAL to read many data formats. GDAL supports a large number of formats; in this academy, we only describe the most commonly used ones.

different data types

A relevant distinction in the data used can be made between:

  • spatial data (data describing coordinates on the Earth, or attributes directly related to these coordinates)
  • non-spatial data

We will start by describing how to load/read the non-spatial data.

non-spatial data

Probably the most used non-spatial data source in GeoDMS projects is the .csv file.

The GDAL StorageManager (a software component used to read/write data from a source) is used for many formats, including .csv files.

Assume we have a .csv file with three attributes: municipalityname, municipalitycode and nrinhabitants.

Configuring such a file looks as follows:

unit<uint32> municipality
:   StorageName     = "%ProjDir%/Data/municipality.csv"
,   StorageType     = "gdal.vect"
,   StorageReadOnly = "True"
{
    attribute<nr_inh> nr_inhabitants := nrinhabitants[nr_inh];
}

  • The StorageName property refers to the file read.
  • The StorageType property informs the GeoDMS we will use the gdal.vect (part of GDAL) StorageManager.
  • The StorageReadOnly property indicates that the data is read-only: the file is used only as a source to read data from.

By default, all attributes are read from the .csv file as strings: the values unit of each attribute is the default unit of value type string. For municipalityname and municipalitycode, this is fine.

For nrinhabitants, a numeric values unit with a metric is preferred, so we explicitly configure the attribute nr_inhabitants with nr_inh as its values unit.
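The example above uses the values unit nr_inh without showing where it is defined. The sketch below is one way to complete it, assuming a hypothetical root container Academy, a Units container, and a metric called 'inhabitant'; none of these names come from the original configuration. It also declares the three .csv columns explicitly as string attributes, which is optional but makes the contents of the file visible in the configuration.

```
container Academy // hypothetical root container
{
    container Units
    {
        // assumed definition: a count of inhabitants, stored as uint32
        unit<uint32> nr_inh := BaseUnit('inhabitant', uint32);
    }

    container SourceData
    {
        unit<uint32> municipality
        :   StorageName     = "%ProjDir%/Data/municipality.csv"
        ,   StorageType     = "gdal.vect"
        ,   StorageReadOnly = "True"
        {
            // columns as read from the file: value type string
            attribute<string> municipalityname;
            attribute<string> municipalitycode;
            attribute<string> nrinhabitants;

            // numeric version of nrinhabitants, with nr_inh as values unit
            attribute<Units/nr_inh> nr_inhabitants := nrinhabitants[Units/nr_inh];
        }
    }
}
```

With a values unit that carries a metric, the GeoDMS can keep track of what the numbers mean when they are used in later calculations.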

Other non-spatial data can be read from other table formats such as .dbf or .xml, or from other ASCII formats such as .asc or .txt.

spatial data

Spatial data is also read from files/databases. For spatial data, an important distinction is made between vector data and grid data.

vector data

The essence of vector data is that world coordinates are stored in an attribute of a vector domain unit. Other attributes are usually configured for, or related to, the same domain unit.

In the GeoDMS, we call the attribute with the coordinates the feature attribute.

Vector data is often read from ESRI Shapefiles. The following example shows how point vector data is configured:

unit<uint32> PC4 
:  StorageName     = "%ProjDir%/Data/pc4.shp" 
,  StorageType     = "gdal.vect"
,  StorageReadOnly = "True"  
{
   attribute<rdc> geometry;
}

The StorageName, StorageType, and StorageReadOnly properties are configured similarly to the example with .csv data.

The feature attribute in this example is the configured attribute geometry. We advise you to name this attribute geometry. The GeoDMS is then informed that this attribute can be used to map the data.

It is also possible to use another name, but then the DialogType property must be configured to 'map' and the DialogData property to the name you have chosen.

As a feature attribute contains coordinates (X and Y), the value type of its values unit needs to be two-dimensional (PointGroup). In this example, the values unit is rdc, the default coordinate system of the Netherlands (RD New, EPSG:28992).

See How to configure a coordinate system for more information on coordinate system units.
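For orientation, a coordinate system unit such as rdc could be sketched as below. This is an assumption-laden illustration, not the configuration of the academy project: the Geography container name and the extent are hypothetical, and the SpatialReference property and point_xy function are used as I understand them from recent GeoDMS versions (older configurations may use other names).

```
container Geography // hypothetical container name
{
    // base unit carrying the projection; EPSG:28992 is RD New
    unit<fpoint> rdc_base : SpatialReference = "EPSG:28992";

    // values unit for coordinates; the extent is an assumed bounding box
    // around the Netherlands, in metres (x: 0..280000, y: 300000..625000)
    unit<fpoint> rdc := range(rdc_base, point_xy(0f, 300000f), point_xy(280000f, 625000f));
}
```

The value type fpoint is two-dimensional, which is what a feature attribute requires of its values unit.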

In this example, the geometry contains points: one rdc coordinate for each entry in the domain unit. If your shapefile contains arc or polygon data, each entry contains a sequence of coordinates instead.

In the GeoDMS, the configuration is similar; the only exception is how the feature attribute is configured:

for arc data:

attribute<rdc> geometry (arc);

and for polygon data:

attribute<rdc> geometry (poly);

Between the () brackets, the composition type needs to be configured.
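Put together, complete configurations for an arc and a polygon shapefile could look like the sketch below. The unit names road and region and the file names roads.shp and regions.shp are hypothetical; only the composition types differ from the point example.

```
// arc data: each entry holds a sequence of rdc coordinates forming a line
unit<uint32> road
:  StorageName     = "%ProjDir%/Data/roads.shp"
,  StorageType     = "gdal.vect"
,  StorageReadOnly = "True"
{
   attribute<rdc> geometry (arc);
}

// polygon data: each entry holds a sequence of rdc coordinates forming a closed ring
unit<uint32> region
:  StorageName     = "%ProjDir%/Data/regions.shp"
,  StorageType     = "gdal.vect"
,  StorageReadOnly = "True"
{
   attribute<rdc> geometry (poly);
}
```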

To learn how to configure other data types, follow the links below:

grid data

In vector data, the feature attribute contains the coordinates; in grid data, the domain unit itself is always two-dimensional. This means each element of an attribute of a grid domain describes a cell with a row number and a column number.

The row and column numbers of a cell are called the local coordinates. They are values of the domain's value type, in this case ipoint, and usually start at 0 and increase from there.

To use grid data in a GIS, we need to relate these local coordinates to locations on the Earth. Projection information is used for this purpose.

Most grid data in GeoDMS projects is read from (Geo)Tiff files. See the following examples:

unit<ipoint> bodemgebruik2015  
: StorageName     = "%ProjDir%/Data/bbg2015_100m_10k.tif"
, StorageType     = "gdal.grid"
, StorageReadOnly = "true" 
, DialogData      = "rdc_base"
{  
   attribute<uint8> GridData;
}

unit<ipoint> bodemgebruik2015  
: StorageName     = "%ProjDir%/Data/bbg2015_100m_10k.tif"
, StorageType     = "gdal.grid"
, StorageReadOnly = "true" 
, DialogData      = "rdc_base"
{  
   // rdc_100m is a placeholder name for an explicitly configured grid domain
   attribute<uint8> GridData (rdc_100m);
}

The bodemgebruik2015 domain unit in both examples is a two-dimensional (ipoint) domain. The number of cells is derived from the bbg2015_100m_10k.tif file.

The StorageName and StorageReadOnly properties are configured in a similar manner as with the .csv data. The StorageType property is configured to a different StorageManager: gdal.grid.

The projection information, needed to locate the grid in the coordinate system of your project, is for TIFF files stored either in the file itself or in a separate .tfw file. A TIFF combined with this projection information is also called a GeoTiff.

The DialogData = "rdc_base" configuration line is necessary to inform the GeoDMS on the coordinate system unit, in order to interpret the projection information.

The GridData attribute refers to the actual data in the tiff file. The difference between both examples is the configuration of the domain unit of this attribute:

  1. In the first example, no domain unit is explicitly configured, so the parent item bodemgebruik2015 is used as the domain unit. The number of cells is based on the .tif file, and the projection information is used to translate the local coordinates to the coordinate system of your project.
  2. In the second example, a domain unit is configured explicitly. A grid domain can be configured in the GeoDMS syntax; see e.g. the Grid Domain page, section "explicit configuration with GeoDMS functions". When attributes are configured for the same domain, it is easier to combine them in calculations.

try it yourself!

Download the project here and unzip the downloaded project file to a projDir like C:/prj/GeoDMSAcadamy.

  • Open the file exercise.dms (in the GeoDMS_Academy\data_sources\cfg subfolder of your downloaded project) with a text editor.

In the data subfolder of this projDir (..\GeoDMS_Academy\data_sources\data) you will find:

  • a .csv file : gemeente.csv
  • three shape files: NS_Stations_2019_RD (points), OSM_Motorways_NL (arcs), CBS_COROP_2012 (polygons)
  • a GeoTiff: bbg2015_100m_10k.tif.

Configure all these files in the configuration file: exercise.dms (in the cfg subfolder of your downloaded project) and use the GeoDMS GUI to make a table of the csv data and maps for the vector and grid data.

tip: use the %ProjDir% placeholder in the StorageName property to refer to the files in the data subfolder.

First, try to figure it out yourself. A configuration with the results is also available in the downloaded project: open the result.dms file (in the cfg subfolder of your downloaded project) with the GeoDMS GUI and look at the container SourceData.


Go to previous module: Module 2, Loading and storing data sources

Go to next module: Module 2B, Storing different data sources