
Generating enhancement parameters

timrdf edited this page Feb 6, 2011 · 41 revisions

Enhancement parameters are automatically generated and placed in manual/ when the initial raw conversion is performed. As described in Directory Conventions, the manual/ directory holds all files that involve a human's touch. Although the enhancement parameters are generated automatically, someone still needs to refine them by asserting more than just the default [ conversion:range todo:Literal; ].

Authorship of enhancement parameters

When the default enhancement parameters are created, the environment variables CSV2RDF4LOD_CONVERT_MACHINE_URI and CSV2RDF4LOD_CONVERT_PERSON_URI (see Controlling automation using CSV2RDF4LOD_ environment variables) are used to capture information about the person responsible. This information can then be used to acknowledge the person's effort and to calculate the impact their data curation has on subsequent data products and demonstrations. The output of the Unix command whoami is also used to describe who created the parameters.

<http://tw.rpi.edu/instances/TimLebo> foaf:holdsAccount <http://tw.rpi.edu/web/inside/machine/lebot_macbook#lebot> .
<http://tw.rpi.edu/web/inside/machine/lebot_macbook#lebot>
   a foaf:OnlineAccount;
   foaf:accountName "lebot";
   sioc:account_of <http://tw.rpi.edu/instances/TimLebo>;
.

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "nitrd-gov";
   conversion:dataset_identifier "federal_research_and_development_budget_for_networking_and_information_technology";
   conversion:version_identifier "2011-Jan-27";
   conversion:conversion_process [
      a conversion:EnhancementConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:subject_discriminator  "fy99_supp_cic_r&d_budget_cross_cut";

      dcterms:creator <http://tw.rpi.edu/web/inside/machine/lebot_macbook#lebot>;
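
The account description above can be sketched in a few lines of Python. This is only an illustration, not how csv2rdf4lod-automation itself produces the triples: the function names account_turtle and account_from_environment are hypothetical, and the rule that the account URI is the machine URI followed by "#" and the user name is an assumption inferred from the example above.

```python
import os


def account_turtle(person_uri, machine_uri, user):
    """Return Turtle linking a person to their account on a machine.

    Assumption: the account URI is <machine_uri>#<user>, as the
    example on this page suggests.
    """
    account_uri = f"{machine_uri}#{user}"
    return (
        f"<{person_uri}> foaf:holdsAccount <{account_uri}> .\n"
        f"<{account_uri}>\n"
        f"   a foaf:OnlineAccount;\n"
        f'   foaf:accountName "{user}";\n'
        f"   sioc:account_of <{person_uri}>;\n"
        f".\n"
    )


def account_from_environment(user, env=os.environ):
    """Build the description from the two environment variables.

    `user` stands in for the output of whoami. Returns None when
    either variable is unset or empty.
    """
    person = env.get("CSV2RDF4LOD_CONVERT_PERSON_URI")
    machine = env.get("CSV2RDF4LOD_CONVERT_MACHINE_URI")
    if not person or not machine:
        return None
    return account_turtle(person, machine, user)


# Values taken from the example on this page.
print(account_turtle(
    "http://tw.rpi.edu/instances/TimLebo",
    "http://tw.rpi.edu/web/inside/machine/lebot_macbook",
    "lebot",
))
```

Passing the environment in as a plain mapping keeps the sketch easy to try without exporting the variables first.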

Enhancement 1 parameters

./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh

Enhancement 2 parameters

./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh -e 2

Implementation

NOTE: These implementation details are not necessary to use csv2rdf4lod-automation to convert data; they are provided here for informational purposes only.

java edu.rpi.tw.data.csv.impl.CSVHeaders <file> [headerLineNumber]

Returns the values in the first row of a CSV file, one per line. Other rows can be returned by passing a row number as the optional second argument.

bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv
Reference:
FY 1999 Supplement to the President's Budget






bash-3.2$ 

The headers are actually on the fourth row of this CSV:

bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv 4
Agency

High End Computing and Computation
Large Scale Networking

High Confidence Systems
Human Centered Systems
Education, Training, & Human Resources
TOTAL
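
The behavior shown above (print the cells of one row, one per line) can be approximated with Python's standard csv module. This is a rough sketch for illustration, not the Java implementation of CSVHeaders: the function name header_values and the sample text are made up, with the sample loosely modeled on the output above.

```python
import csv
import io


def header_values(csv_text, header_line_number=1):
    """Return the cell values of the given 1-based CSV row.

    Rows are counted as parsed CSV records, so a quoted cell that
    contains commas stays a single value. Returns an empty list if
    the requested row does not exist.
    """
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        if row_number == header_line_number:
            return row
    return []


# Hypothetical, abbreviated sample with the headers on the fourth row.
sample = (
    "Reference:,FY 1999 Supplement to the President's Budget,,,\n"
    ",,,,\n"
    ",,,,\n"
    'Agency,,High End Computing and Computation,'
    '"Education, Training, & Human Resources",TOTAL\n'
)

# Print the header cells one per line; empty cells yield empty lines.
print("\n".join(header_values(sample, 4)))
```

Empty cells are kept so that each value's position still matches its column number, which matters when the headers are later turned into per-column enhancement parameters.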

The script $CSV2RDF4LOD_HOME/bin/util/header2params2.awk accepts these headers and produces a Turtle RDF file template for the enhancement parameters. It takes a handful of parameters for the source_identifier, dataset_identifier, etc.; see the script for details.

....
@prefix ov:         <http://open.vocab.org/terms/> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
....

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "nitrd-gov";
   conversion:dataset_identifier "federal_research_and_development_budget_for_networking_and_information_technology";
   conversion:dataset_version    "2011-Jan-27";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:subject_discriminator  "fy99_supp_cic_r&d_budget_cross_cut";
      conversion:enhance [      
         ov:csvRow 4;
         a conversion:HeaderRow;
      ];                        
      conversion:enhance [
         ov:csvCol         1;
         ov:csvHeader     "Agency";
         conversion:label "Agency";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol         2;
         ov:csvHeader     "";
         conversion:label "";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol         3;
         ov:csvHeader     "High End Computing and Computation";
         conversion:label "High End Computing and Computation";
         conversion:comment "";
         conversion:range  todo:Literal;
      ];
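
The per-column part of the template above follows a simple pattern: one conversion:enhance block per header, numbered from 1, with the header reused as the label and todo:Literal as the placeholder range. The Python sketch below reproduces only that pattern. It is not a port of header2params2.awk, the function name column_enhancements is hypothetical, and the quote escaping is an added assumption rather than something taken from the script.

```python
def column_enhancements(headers):
    """Return one conversion:enhance block per header, in column order."""
    blocks = []
    for col, header in enumerate(headers, start=1):
        # Escape backslashes and double quotes so the Turtle literal
        # stays valid (an assumption, not taken from the awk script).
        escaped = header.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append(
            "      conversion:enhance [\n"
            f"         ov:csvCol         {col};\n"
            f'         ov:csvHeader     "{escaped}";\n'
            f'         conversion:label "{escaped}";\n'
            '         conversion:comment "";\n'
            "         conversion:range  todo:Literal;\n"
            "      ];\n"
        )
    return "".join(blocks)


# The first three headers from the example on this page.
print(column_enhancements(
    ["Agency", "", "High End Computing and Computation"]
))
```

Every generated block still carries todo:Literal, which is the part a person is expected to replace when editing the file in manual/.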

What's next?

Reusing enhancement parameters for multiple versions or datasets
