conversion:interpret

timrdf edited this page Apr 4, 2011 · 50 revisions

See conversion:Enhancement.

conversion:interpret can be used to override input values with predetermined replacements, which has a variety of uses:

  • Deciphering codes
  • Ignoring certain values
  • "cleaning up" values

Deciphering codes

One dataset uses "P", "S", and "H" to stand for "President", "Senate", and "House". (TODO: find the dataset again)

conversion:interpret [
  ov:csvCol 1;
  conversion:symbol         "S";
  conversion:interpretation <http://dbpedia.org/resource/United_States_Senate>;
];
conversion:interpret [
  ov:csvCol 1;
  conversion:symbol         "H";
  conversion:interpretation <http://dbpedia.org/resource/United_States_House_of_Representatives>;
];
conversion:interpret [
  ov:csvCol 1;
  conversion:symbol         "P";
  conversion:interpretation <http://en.wikipedia.org/wiki/President_of_the_United_States>;
];
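
The effect of these symbol/interpretation pairs can be pictured as a plain lookup. The following is only a sketch of the idea in Python, not the converter's implementation; the names `CODE_INTERPRETATIONS` and `interpret_code` are hypothetical, and it assumes that symbols are matched exactly and that unlisted values pass through unchanged.

```python
# Hypothetical lookup table mirroring the three conversion:interpret blocks.
CODE_INTERPRETATIONS = {
    "S": "http://dbpedia.org/resource/United_States_Senate",
    "H": "http://dbpedia.org/resource/United_States_House_of_Representatives",
    "P": "http://en.wikipedia.org/wiki/President_of_the_United_States",
}


def interpret_code(value, table=CODE_INTERPRETATIONS):
    """Return the replacement for a listed symbol, else the value unchanged.

    Assumption: matching is exact and case-sensitive.
    """
    return table.get(value, value)
```

Under these assumptions, `interpret_code("S")` yields the Senate URI, while a value with no listed symbol is returned as-is.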

Interpret as null (in all columns)

The most common use of this enhancement is to omit empty-string values from table cells. The naive interpretation keeps them to stay as faithful to the original data as possible, but they often mean nothing and clutter the output.

Certain values are used to express that there is no value for a relationship. These can be ignored by setting the "interpret as null" enhancement parameter, so that the null values do not interfere with the actual values. Triples are not asserted for values that should be interpreted as null. The null value can be interpreted for all columns or for a specific column.

Note: this structure is also used in the #Codebook Resource Promotion parameter, where it is used by an enhancement rather than by the conversion process.

e.g., Dataset 1530

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "data-gov";
   conversion:dataset_identifier "1530";
   conversion:dataset_version    "2009-May-18";
   conversion:conversion_process [
      conversion:interpret [
         conversion:symbol "-", "- ";
         conversion:interpretation conversion:null;
      ];
   ];
.

@prefix raw: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/raw/> .

ds1530:thing_1    raw:organization "-" .
ds1530:thing_2538 raw:closed_date  "- " .

becomes

(no triple asserted)
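
As a rough illustration of "no triple asserted", the sketch below drops any cell whose value is one of the null symbols before producing triples. It is a Python sketch under stated assumptions, not the converter's code: `NULL_SYMBOLS` and `cells_to_triples` are hypothetical names, and `raw:label` with the value "Example Org" is invented purely for illustration.

```python
# Symbols treated as null in every column, mirroring conversion:symbol "-", "- ".
NULL_SYMBOLS = {"-", "- "}


def cells_to_triples(subject, cells, null_symbols=NULL_SYMBOLS):
    """Return (subject, predicate, value) triples, skipping null cells.

    `cells` maps a predicate name to the raw cell value.
    """
    return [
        (subject, predicate, value)
        for predicate, value in cells.items()
        if value not in null_symbols
    ]


# raw:label and "Example Org" are made-up illustration values.
triples = cells_to_triples(
    "ds1530:thing_1",
    {"raw:organization": "-", "raw:label": "Example Org"},
)
```

Here `triples` contains only the `raw:label` triple; the `raw:organization` cell produces nothing.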

Other datasets that benefit from this enhancement include Health Information National Trends Survey 2005 ("#NULL!"), Dataset 10030 (" - "), Dataset 1330 ("?? Total").

An interesting extension to this enhancement would be to add a pattern for what to interpret as null.
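
To make the idea concrete, here is one way such a pattern could behave, sketched in Python. This is not an existing feature of the converter; `NULL_PATTERN` and `matches_null` are hypothetical, and the regular expression is just one possible choice that would cover the "-", "- ", and " - " symbols mentioned above.

```python
import re

# Hypothetical pattern: a single hyphen with optional surrounding whitespace.
NULL_PATTERN = re.compile(r"\s*-\s*")


def matches_null(value, pattern=NULL_PATTERN):
    """Return True when the whole cell value matches the null pattern."""
    return pattern.fullmatch(value) is not None
```

A single pattern like this would replace several near-duplicate symbols, at the cost of being easier to over-match than an explicit list.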

Interpret as null (column-specific)

The above example showed how to interpret a symbol as null for all columns. This behavior can be limited to a specific column by moving the interpretation into that column's enhancement.

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "data-gov";
   conversion:dataset_identifier "1530";
   conversion:dataset_version    "2009-May-18";
   conversion:conversion_process [
      conversion:enhancement [
         ov:csvCol 1;
         conversion:interpret [
            conversion:symbol "-", "- ";
            conversion:interpretation conversion:null;
         ];
      ];
   ];
.

Other datasets that benefit from this enhancement include Dataset 1491.
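
The difference from the all-columns form can be sketched as a null-symbol set keyed by column number. This is a Python illustration only, not the converter's implementation; `COLUMN_NULL_SYMBOLS` and `is_null_in_column` are hypothetical names, and the sketch assumes that a column without an entry has no null symbols.

```python
# Null symbols scoped to column 1 only, mirroring ov:csvCol 1 above.
COLUMN_NULL_SYMBOLS = {1: {"-", "- "}}


def is_null_in_column(column, value, column_nulls=COLUMN_NULL_SYMBOLS):
    """Return True when `value` is a null symbol for this specific column."""
    return value in column_nulls.get(column, set())
```

With this scoping, "-" is dropped in column 1 but kept as a literal value in any other column.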

Cleaning up some values

A less elegant use of this enhancement is to replace a handful of values with other values.

For example, NITRD added some footnotes when mentioning some agencies. We can tidy them up. This only makes sense for a small number of values, and it calls for a more generic value-tweaking mechanism (we just haven't seen enough need for it yet).

  conversion:enhance [
     ov:csvCol         1;
     conversion:interpret [
        conversion:symbol "NIH 2";
        conversion:interpretation "NIH";
     ];
     conversion:interpret [
        conversion:symbol "DOE 2";
        conversion:interpretation "DOE";
     ];
  ];

Why does the output RDF have conversion:symbol and conversion:interpretation?

The conversion:symbol and conversion:interpretation triples in the output might look like a bug, but they are intended:

typed_agency:NIH 
    dcterms:identifier "NIH 2" ;
    a federal_research_and_development_budget_for_networking_and_information_technology_vocab:Agency ;
    conversion:symbol "NIH 2" ;
    conversion:interpretation "NIH" ;

They are kept so that later enhancements can reuse them:

  • Subsequent enhancement parameters can point to this dataset and get the symbol/interpretation pairings.
  • Subsequent enhancement parameters can also point to this dataset to get dcterms:identifiers during ObjectSameAsLinking.

$CSV2RDF4LOD_HOME/bin/util/symbol-interpretation.awk

Queries

(results):

PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
SELECT distinct *
WHERE {
  graph <http://purl.org/twc/vocab/conversion/ConversionProcess> {
    ?layer
       conversion:conversion_process [
          conversion:interpret [
            conversion:symbol         ?symbol;
            conversion:interpretation ?interp
          ];
       ]
    .
  }
}