conversion:interpret
- conversion:interpret is one of many conversion:Enhancements.
conversion:interpret can be used to override input values with predetermined replacements. This can help avoid the need to modify the original input file to prepare for conversion. Avoiding such a step helps maintain the fidelity of the original source.
conversion:interpret can be used to:
- Decipher codes
- Ignore certain values globally or for certain columns
- Clean up values
Often, codes are used to abbreviate longer, more meaningful values. For example, one dataset uses "P", "S", and "H" to stand for "President", "Senate", and "House". Though this might be useful for those who know the dataset intimately, it makes the data more difficult for the rest of the world to understand. The following enhancements can make these myopic identifiers understandable by a much larger worldwide community:
conversion:enhance [
ov:csvCol 1;
conversion:interpret [
conversion:symbol "S";
conversion:interpretation <http://dbpedia.org/resource/United_States_Senate>;
];
conversion:interpret [
conversion:symbol "H";
conversion:interpretation <http://dbpedia.org/resource/United_States_House_of_Representatives>;
];
conversion:interpret [
conversion:symbol "P";
conversion:interpretation <http://dbpedia.org/resource/President_of_the_United_States>;
];
];
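The lookup that these parameters describe can be sketched outside the converter. The Python below only illustrates the symbol/interpretation semantics and is not csv2rdf4lod's implementation; the function name `interpret_cell`, the `INTERPRETATIONS` dictionary layout, and the sample row are assumptions made for this example.

```python
# Hypothetical sketch: per-column symbol -> interpretation lookup.
# Column numbers are 1-based, mirroring ov:csvCol.
INTERPRETATIONS = {
    1: {
        "S": "http://dbpedia.org/resource/United_States_Senate",
        "H": "http://dbpedia.org/resource/United_States_House_of_Representatives",
        "P": "http://dbpedia.org/resource/President_of_the_United_States",
    }
}


def interpret_cell(column, value, interpretations=INTERPRETATIONS):
    """Return the replacement for value in the given column, or value unchanged."""
    return interpretations.get(column, {}).get(value, value)


# Illustrative row: only column 1 has interpretations, so column 2 passes through.
row = ["S", "example value"]
print([interpret_cell(col, cell) for col, cell in enumerate(row, start=1)])
```

Values without a matching symbol, and values in columns without any interpretations, are returned untouched, which matches the idea of overriding only predetermined inputs.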
The most popular use for this enhancement is to omit empty string values for a cell of a table. They are left there in the naive interpretation to be as faithful to the original data as possible, but they often mean nothing and clutter things up.
Certain values are used to express that there is no value for a relationship. These can be ignored by setting the "interpret as null" enhancement parameter, so that the null values do not interfere with the actual values. Triples are not asserted for values that should be interpreted as null. The null value can be interpreted for all columns or for a specific column.
Note that this structure is also used in the #Codebook Resource Promotion parameter, where it is used by an enhancement rather than by the conversion process.
e.g., Dataset 1530
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1530";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
conversion:interpret [
conversion:symbol "-", "- ";
conversion:interpretation conversion:null;
];
];
.
@prefix raw: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/raw/> .
ds1530:thing_1 raw:organization "-" .
ds1530:thing_2538 raw:closed_date "- " .
becomes
(no triple asserted)
Other datasets that benefit from this enhancement include Health Information National Trends Survey 2005 ("#NULL!"), Dataset 10030 (" - "), Dataset 1330 ("?? Total").
An interesting extension to this enhancement would be to add a pattern for what to interpret as null.
The above example showed how to interpret a symbol as null for all columns. This behavior can be set for a specific column by moving the interpretation to a single enhancement.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1530";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
conversion:enhance [
ov:csvCol 1;
conversion:interpret [
conversion:symbol "-", "- ";
conversion:interpretation conversion:null;
];
];
];
.
Other datasets that benefit from this enhancement include Dataset 1491.
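The difference between a global and a column-specific null interpretation can be sketched as follows. This is a minimal illustration under stated assumptions, not the converter's code: the names `GLOBAL_NULLS`, `COLUMN_NULLS`, and `triples_for_row` are hypothetical, the choice of column 1 for "#NULL!" is arbitrary, and the second cell value is made up for the example.

```python
# Hypothetical sketch of "interpret as null": a cell whose value is a null
# symbol yields no triple at all.
GLOBAL_NULLS = {"-", "- "}      # applies to every column
COLUMN_NULLS = {1: {"#NULL!"}}  # applies only to column 1 (1-based, like ov:csvCol)


def triples_for_row(subject, row, predicates,
                    global_nulls=GLOBAL_NULLS, column_nulls=COLUMN_NULLS):
    """Yield (subject, predicate, value) for each cell that is not null."""
    for col, (predicate, value) in enumerate(zip(predicates, row), start=1):
        if value in global_nulls or value in column_nulls.get(col, set()):
            continue  # interpreted as null: assert nothing
        yield (subject, predicate, value)


predicates = ["raw:organization", "raw:closed_date"]
# The "-" in column 1 is dropped; the illustrative value in column 2 is kept.
print(list(triples_for_row("ds1530:thing_1", ["-", "some date"], predicates)))
```

A symbol listed only under one column is still asserted when it appears in any other column, which is the reason to scope an interpretation to a single enhancement.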
A not-so-elegant use of this enhancement is to tweak a handful of values into other values.
For example, NITRD added some footnotes when mentioning some agencies. We can tidy them up. This obviously only makes sense for a small number of values, and begs for a more generic value-tweaking mechanism (we just haven't seen enough need for it yet).
conversion:enhance [
ov:csvCol 1;
conversion:interpret [
conversion:symbol "NIH 2";
conversion:interpretation "NIH";
];
conversion:interpret [
conversion:symbol "DOE 2";
conversion:interpretation "DOE";
];
];
The "doi:" prefix can be removed when processing the following input with conversion:interpret.
"David Tilman","doi:10.6073/AA/knb-lter-cdr.157002.122"
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "doi";
conversion:equivalent_property bibo:doi;
conversion:interpret [
conversion:regex "^doi:";
conversion:object "";
];
conversion:range rdfs:Literal;
];
(See also conversion:object_search.)
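The effect of the regex/object pair on the sample input can be reproduced with an ordinary regular-expression substitution. The sketch below assumes that conversion:regex "^doi:" combined with conversion:object "" behaves like replacing the match with the empty string; the function name `strip_doi_prefix` is hypothetical.

```python
import re

# Assumed equivalent of conversion:regex "^doi:" with conversion:object "".
DOI_PREFIX = re.compile(r"^doi:")


def strip_doi_prefix(value):
    """Remove a leading "doi:" from value; other values pass through unchanged."""
    return DOI_PREFIX.sub("", value)


print(strip_doi_prefix("doi:10.6073/AA/knb-lter-cdr.157002.122"))
# prints 10.6073/AA/knb-lter-cdr.157002.122
```

Because the pattern is anchored with `^`, a "doi:" that appears later in a value is left alone.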
Although this output might look like a bug with conversion:symbol and conversion:interpretation, it is actually intended:
typed_agency:NIH
dcterms:identifier "NIH 2" ;
a federal_research_and_development_budget_for_networking_and_information_technology_vocab:Agency ;
conversion:symbol "NIH 2" ;
conversion:interpretation "NIH" ;
- Subsequent enhancement parameters can point to this dataset and get the symbol/interpretation pairings.
- Subsequent enhancement parameters can also point to this dataset to get dcterms:identifiers during ObjectSameAsLinking.
$CSV2RDF4LOD_HOME/bin/util/symbol-interpretation.awk
See which input values are interpreted differently according to enhancement parameters (results):
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
SELECT distinct *
WHERE {
graph <http://purl.org/twc/vocab/conversion/ConversionProcess> {
?layer
conversion:conversion_process [
conversion:interpret [
conversion:symbol ?symbol;
conversion:interpretation ?interp
];
]
.
}
}
For some more details on implementation and utilities, see Codebook enhancements.
- $CSV2RDF4LOD_HOME/bin/util/distinct-values-2-symbol-interps.pl to query a SPARQL endpoint for distinct values of a predicate and produce an eparams template with symbol/interpretations.
- $CSV2RDF4LOD_HOME/bin/util/symbol-interpretation.awk to accept
416,"Wage Stabilization Board"
and output symbol/interpretation eparams.
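The kind of transformation such a utility performs can be sketched in Python. This is not the awk script and does not claim to reproduce its exact output format; it assumes the input is CSV with the symbol in the first field and the interpretation in the second, and it emits blocks shaped like the conversion:interpret examples on this page. The names `turtle_literal` and `symbol_interpretations` are hypothetical.

```python
import csv


def turtle_literal(text):
    """Quote text as a Turtle string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def symbol_interpretations(lines):
    """Turn 'symbol,"interpretation"' CSV lines into conversion:interpret blocks."""
    blocks = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        symbol, interpretation = row[0], row[1]
        blocks.append(
            "conversion:interpret [\n"
            "   conversion:symbol " + turtle_literal(symbol) + ";\n"
            "   conversion:interpretation " + turtle_literal(interpretation) + ";\n"
            "];"
        )
    return blocks


for block in symbol_interpretations(['416,"Wage Stabilization Board"']):
    print(block)
```

The generated blocks would still need to be pasted into the appropriate conversion:enhance or conversion:conversion_process section of the enhancement parameters by hand.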