Generating enhancement parameters

timrdf edited this page Jan 29, 2011 · 41 revisions

Enhancement parameters are automatically generated and placed in manual/ when the initial raw conversion is performed. As described in Directory Conventions, the manual/ directory holds all files that involved a human touch. Although the enhancement parameters are automatically generated, someone needs to tweak them by asserting more than the conversion:range todo:Literal that is provided by default.
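
As a rough aid for that manual pass, the sketch below lists which columns in a parameters file still carry the default conversion:range todo:Literal. It is not part of csv2rdf4lod-automation: the function name columns_still_todo and the sample text are hypothetical, and xsd:decimal merely stands in for whatever range a person might assert. It assumes ov:csvCol appears before conversion:range within each block, as in the generated template.

```python
import re


def columns_still_todo(turtle_text):
    """Return the ov:csvCol numbers whose range is still todo:Literal.

    Hypothetical helper; a line-based scan, not a real Turtle parser.
    """
    todo = []
    current_col = None
    for line in turtle_text.splitlines():
        col = re.search(r"ov:csvCol\s+(\d+)\s*;", line)
        if col:
            # Remember which column the following statements describe.
            current_col = int(col.group(1))
        elif current_col is not None and re.search(
            r"conversion:range\s+todo:Literal\s*;", line
        ):
            todo.append(current_col)
            current_col = None
    return todo


# Made-up sample: column 2 has been edited by hand, columns 1 and 3 have not.
SAMPLE_PARAMS = """
      conversion:enhance [
         ov:csvCol         1;
         conversion:range  todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol         2;
         conversion:range  xsd:decimal;
      ];
      conversion:enhance [
         ov:csvCol         3;
         conversion:range  todo:Literal;
      ];
"""

if __name__ == "__main__":
    print(columns_still_todo(SAMPLE_PARAMS))
```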

Implementation

NOTE: These implementation details are not necessary to use csv2rdf4lod-automation to convert data; they are provided here for informational purposes only.

java edu.rpi.tw.data.csv.impl.CSVHeaders <file> [headerLineNumber]

Prints the values in the first row of a CSV file, one per line. To print a different row, pass its row number as the second argument.

bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv
Reference:
FY 1999 Supplement to the President's Budget






bash-3.2$ 

The headers are actually on the fourth row of this CSV:

bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv 4
Agency

High End Computing and Computation
Large Scale Networking

High Confidence Systems
Human Centered Systems
Education, Training, & Human Resources
TOTAL
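
To make the behavior above concrete, here is a minimal Python sketch of the same idea: pick one row of a CSV by its 1-based row number and print its cells one per line. It is not the source of edu.rpi.tw.data.csv.impl.CSVHeaders; the function name csv_headers and the inline sample data are assumptions made for illustration.

```python
import csv
import io


def csv_headers(csv_text, header_line_number=1):
    """Return the cells of the given row (1-based) of CSV text.

    Hypothetical helper for illustration; not part of csv2rdf4lod.
    """
    reader = csv.reader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=1):
        if line_number == header_line_number:
            return row
    return []  # the requested row is past the end of the input


# Made-up sample: preamble rows first, real headers on the fourth row.
SAMPLE = (
    "Reference:,,\n"
    "Some title,,\n"
    ",,\n"
    "Agency,,Large Scale Networking\n"
    "X,,1\n"
)

if __name__ == "__main__":
    for cell in csv_headers(SAMPLE, 4):
        print(cell)  # empty cells print as blank lines
```

As in the transcripts above, empty cells show up as blank lines, which is why the output for this file contains gaps.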

The script $CSV2RDF4LOD_HOME/bin/util/header2params2.awk accepts these headers and produces a Turtle template for the enhancement parameters. It takes a handful of parameters for the source_identifier, dataset_identifier, etc.; see the script for details. An abridged excerpt of its output for the example above:

....
@prefix ov:         <http://open.vocab.org/terms/> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
....

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "nitrd-gov";
   conversion:dataset_identifier "federal_research_and_development_budget_for_networking_and_information_technology";
   conversion:dataset_version    "2011-Jan-27";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:subject_discriminator  "fy99_supp_cic_r&d_budget_cross_cut";
      conversion:enhance [      
         ov:csvRow 4;
         a conversion:HeaderRow;
      ];                        
      conversion:enhance [
         ov:csvCol         1;
         ov:csvHeader     "Agency";
         conversion:label "Agency";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol         2;
         ov:csvHeader     "";
         conversion:label "";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol         3;
         ov:csvHeader     "High End Computing and Computation";
         conversion:label "High End Computing and Computation";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
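
The per-column blocks in this excerpt follow a regular pattern: one conversion:enhance block per header, numbered from 1, with the header reused as the label and the range left at todo:Literal. The Python sketch below reproduces only that pattern. It is not header2params2.awk and does not emit the prefixes or the dataset description; the function names enhancement_stub and enhancement_stubs are hypothetical.

```python
def enhancement_stub(col, header):
    """Return one conversion:enhance block for a column (1-based index)."""
    # Escape backslashes and double quotes so the header is a valid Turtle string.
    escaped = header.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "      conversion:enhance [\n"
        f"         ov:csvCol         {col};\n"
        f'         ov:csvHeader     "{escaped}";\n'
        f'         conversion:label "{escaped}";\n'
        '         conversion:comment "";\n'
        "         conversion:range  todo:Literal;\n"
        "      ];\n"
    )


def enhancement_stubs(headers):
    """Concatenate one stub per header, numbering columns from 1."""
    return "".join(
        enhancement_stub(i, h) for i, h in enumerate(headers, start=1)
    )


if __name__ == "__main__":
    headers = ["Agency", "", "High End Computing and Computation"]
    print(enhancement_stubs(headers), end="")
```

Columns with an empty header still get a block, matching column 2 in the excerpt, so that a person can fill in the label by hand.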

What's next?

Reusing enhancement parameters for multiple versions or datasets
