How to change a source, dataset, or version identifier
We highly recommend a directory convention that consistently organizes data collected from others and converted locally. To reinforce it, the scripts in csv2rdf4lod-automation expect this structure when they run. The convention organizes data by source organization, dataset name, and dataset version. It names the directories in the [data root](csv2rdf4lod automation data root) after the source, dataset, and version identifiers that are also encoded in RDF using `conversion:source_identifier`, `conversion:dataset_identifier`, and `conversion:version_identifier`. When the converter is invoked, the directory structure does not affect how URIs are created. Instead, URIs are built from the values of these three `conversion:*_identifier` properties, which are encoded in the conversion parameters given to the converter.
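Because the directory names and the `conversion:*_identifier` values are meant to match, the convention amounts to a simple mapping from the three identifiers to a version directory. The POSIX shell sketch below illustrates that mapping under this assumption. `version_dir` is a hypothetical name, not a csv2rdf4lod-automation script, and the converter itself reads the identifiers from the conversion parameters, not from this path.

```shell
#!/bin/sh
# Illustration only: the directory convention places each version under
# <source>/<dataset>/version/<version>/ beneath the data root.
# version_dir is a hypothetical helper, not part of csv2rdf4lod-automation.
version_dir() {
  printf '%s/%s/version/%s\n' "$1" "$2" "$3"
}

dir=$(version_dir data-worldbank-org world-development-index 2011-May-05)
echo "$dir"  # prints data-worldbank-org/world-development-index/version/2011-May-05
```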
Although following our naming recommendations reduces the need to rename datasets, someone will inevitably want to change an identifier for a dataset that has already been started.
We'll use an example from LOGD that moved `data-worldbank-org/world-development-index/` to `worldbank-org/world-development-indicators/`. This changes both the source identifier and the dataset identifier. The source identifier changes because the `data-` prefix should not be part of it, and the dataset identifier changes because "index" should have been "indicators".
Be sure to move or rename the directories through version control, if you are using it. From the [data root](csv2rdf4lod automation data root) (e.g., `whatever-you-want/source/`), run:

```shell
# --parents creates worldbank-org/ if it does not exist yet
svn mv --parents data-worldbank-org/world-development-index worldbank-org/world-development-indicators
```
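If the data root might be under Subversion, under Git, or under no version control at all, the same move can be wrapped in a small function. The POSIX shell sketch below is one way to do it. `rename_dataset` is a hypothetical helper, not part of csv2rdf4lod-automation. It uses `svn mv --parents` inside a Subversion working copy, `git mv` inside a Git work tree, and a plain `mv` otherwise, creating the new source directory when needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of csv2rdf4lod-automation): move a dataset
# directory to a new source/dataset location using whichever tool tracks it.
rename_dataset() {
  old=$1   # e.g. data-worldbank-org/world-development-index
  new=$2   # e.g. worldbank-org/world-development-indicators
  if command -v svn >/dev/null 2>&1 && svn info "$old" >/dev/null 2>&1; then
    # Subversion working copy: --parents adds the new source directory if needed.
    svn mv --parents "$old" "$new"
  elif command -v git >/dev/null 2>&1 &&
       [ "$(git -C "$old" rev-parse --is-inside-work-tree 2>/dev/null)" = true ]; then
    # Git tracks files, not directories, so a plain mkdir is enough here.
    mkdir -p "$(dirname "$new")" && git mv "$old" "$new"
  else
    # Not under version control: an ordinary move.
    mkdir -p "$(dirname "$new")" && mv "$old" "$new"
  fi
}

# Usage, from the data root:
# rename_dataset data-worldbank-org/world-development-index worldbank-org/world-development-indicators
```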
(Fortunately, if you're using Automated creation of a new Versioned Dataset, you can skip updating the conversion trigger.)

When the conversion trigger is created, it caches the source, dataset, and version identifiers as variables. These variables are then used to generate the raw conversion parameters `automatic/*.params.ttl` every time, and the enhancement conversion parameters `manual/*.params.ttl` only when no enhancement parameters are present yet (see generating enhancement parameters). Because the raw conversion parameters are regenerated from these cached variables on every run, the raw conversion will keep using the old identifiers until you update the conversion trigger.
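One way to update the trigger is to rewrite the cached assignments in place with `sed`. The sketch below assumes each assignment sits on its own line in the `name="value"` form that the `grep` output on this page shows. `set_trigger_id` is a hypothetical helper, not part of csv2rdf4lod-automation, and the new value must not contain `/`, `&`, `\`, or `"`, since it is pasted into the `sed` replacement.

```shell
#!/bin/sh
# Hypothetical helper (not part of csv2rdf4lod-automation): rewrite one cached
# identifier assignment in a conversion trigger, e.g. sourceID="data-worldbank-org".
# Assumes each assignment is on its own line, as in the grep output on this page.
set_trigger_id() {
  trigger=$1  # path to the convert-*.sh trigger
  var=$2      # sourceID, datasetID, or versionID
  value=$3    # new identifier (no "/", "&", "\" or double quotes)
  # -i.bak is accepted by both GNU and BSD sed; the backup is then removed.
  sed -i.bak "s/^\\([[:space:]]*\\)$var=.*/\\1$var=\"$value\"/" "$trigger" &&
    rm -f "$trigger.bak"
}

# Usage, after the move (the trigger file keeps its old name):
# t=worldbank-org/world-development-indicators/version/2011-May-05/convert-world-development-index.sh
# set_trigger_id "$t" sourceID worldbank-org
# set_trigger_id "$t" datasetID world-development-indicators
```

The version identifier is left alone here because only the source and dataset identifiers change in this example.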
The following commands illustrate where the conversion trigger sits within the directory convention and how it caches the corresponding source, dataset, and version identifiers. Notice how `data-worldbank-org` appears both in the directory path and in a variable of the conversion trigger. The same is true for the dataset identifier `world-development-index` and the version identifier `2011-May-05`.
```console
# See where the conversion trigger cached the source identifier
$ grep "data-worldbank-org" data-worldbank-org/world-development-index/version/*/convert-world-development-index.sh
sourceID="data-worldbank-org"
# See where the conversion trigger cached the dataset identifier
$ grep "world-development-index" data-worldbank-org/world-development-index/version/*/convert-world-development-index.sh
datasetID="world-development-index"
# See where the conversion trigger cached the version identifier
$ grep "2011-May-05" data-worldbank-org/world-development-index/version/*/convert-world-development-index.sh
versionID="2011-May-05"
```
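After moving the directories and editing the trigger, it can help to confirm in one pass that the cached identifiers agree with the directories the trigger now sits in. The sketch below does that. `check_trigger` is hypothetical, not part of csv2rdf4lod-automation. It assumes the `<source>/<dataset>/version/<version>/convert-*.sh` layout and the `name="value"` assignments shown in the `grep` output above, and it returns non-zero if any identifier disagrees.

```shell
#!/bin/sh
# Hypothetical check (not part of csv2rdf4lod-automation): compare the
# identifiers cached in a conversion trigger with the directories it sits in,
# assuming the layout <source>/<dataset>/version/<version>/convert-*.sh.
check_trigger() {
  trigger=$1
  vdir=$(dirname "$trigger")            # <source>/<dataset>/version/<version>
  want_version=$(basename "$vdir")
  ddir=$(dirname "$(dirname "$vdir")")  # <source>/<dataset>
  want_dataset=$(basename "$ddir")
  want_source=$(basename "$(dirname "$ddir")")
  status=0
  for pair in "sourceID=$want_source" "datasetID=$want_dataset" "versionID=$want_version"; do
    var=${pair%%=*}
    want=${pair#*=}
    # Extract the quoted value of e.g. sourceID="..." (first match only).
    got=$(sed -n "s/^[[:space:]]*$var=\"\\(.*\\)\"[[:space:]]*\$/\\1/p" "$trigger" | head -n 1)
    if [ "$got" != "$want" ]; then
      echo "$trigger: $var is \"$got\", directory says \"$want\"" >&2
      status=1
    else
      echo "$trigger: $var=\"$got\" OK"
    fi
  done
  return "$status"
}

# Usage:
# check_trigger worldbank-org/world-development-indicators/version/2011-May-05/convert-world-development-index.sh
```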
```console
# See where the global enhancement parameters specify the source identifier
$ grep "data-worldbank-org" data-worldbank-org/world-development-index/version/*.params.ttl
data-worldbank-org/world-development-index/version/WDI_GDF_Country.csv.e1.params.ttl: conversion:source_identifier "data-worldbank-org";
data-worldbank-org/world-development-index/version/WDI_GDF_CS_Notes.csv.e1.params.ttl: conversion:source_identifier "data-worldbank-org";
data-worldbank-org/world-development-index/version/WDI_GDF_Data.csv.e1.params.ttl: conversion:source_identifier "data-worldbank-org";
data-worldbank-org/world-development-index/version/WDI_GDF_Footnotes.csv.e1.params.ttl: conversion:source_identifier "data-worldbank-org";
data-worldbank-org/world-development-index/version/WDI_GDF_Series.csv.e1.params.ttl: conversion:source_identifier "data-worldbank-org";
# See where the global enhancement parameters specify the dataset identifier
$ grep "world-development-index" data-worldbank-org/world-development-index/version/*.params.ttl
data-worldbank-org/world-development-index/version/WDI_GDF_Country.csv.e1.params.ttl: conversion:dataset_identifier "world-development-index";
data-worldbank-org/world-development-index/version/WDI_GDF_CS_Notes.csv.e1.params.ttl: conversion:dataset_identifier "world-development-index";
data-worldbank-org/world-development-index/version/WDI_GDF_Data.csv.e1.params.ttl: conversion:dataset_identifier "world-development-index";
data-worldbank-org/world-development-index/version/WDI_GDF_Footnotes.csv.e1.params.ttl: conversion:dataset_identifier "world-development-index";
data-worldbank-org/world-development-index/version/WDI_GDF_Series.csv.e1.params.ttl: conversion:dataset_identifier "world-development-index";
# The version identifier specified in global enhancement parameters is irrelevant
# because it is replaced when creating a new version.
```
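The `grep` output above shows that the global enhancement parameters still name `data-worldbank-org` and `world-development-index`. Because enhancement parameters are only generated when none are present, these files are presumably not rewritten for you, so their source and dataset identifiers likely need editing by hand. The sketch below rewrites the quoted value that follows a given `conversion:*_identifier` property. `set_params_id` is a hypothetical helper, not part of csv2rdf4lod-automation, and it assumes each property and its value sit on one line, as in that output.

```shell
#!/bin/sh
# Hypothetical helper (not part of csv2rdf4lod-automation): replace the quoted
# value of one conversion:*_identifier property in each given params file.
set_params_id() {
  prop=$1    # source_identifier or dataset_identifier
  value=$2   # new identifier (no "/", "&", "\" or double quotes)
  shift 2
  for params in "$@"; do
    # Keep the indentation and the trailing ";" of the original line.
    sed -i.bak "s/\\(conversion:$prop[[:space:]]*\\)\"[^\"]*\"/\\1\"$value\"/" "$params" &&
      rm -f "$params.bak" || return 1
  done
}

# Usage, from worldbank-org/world-development-indicators/version/:
# set_params_id source_identifier  worldbank-org                *.params.ttl
# set_params_id dataset_identifier world-development-indicators *.params.ttl
```

Running the two `grep` commands above again (against the new paths and the old identifiers) should then print nothing.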