Generating enhancement parameters
Enhancement parameters are automatically generated and placed in manual/ when the initial raw conversion is performed. As described in Directory Conventions, the purpose of the manual/ directory is to hold all files that involved a human touch. Although the enhancement parameters are automatically generated, someone needs to tweak them by asserting more than just the conversion:range todo:Literal that is provided by default.
NOTE: These implementation details are not necessary to use csv2rdf4lod-automation to convert data; they are provided here for informational purposes only.
java edu.rpi.tw.data.csv.impl.CSVHeaders <file> [headerLineNumber]
Returns the values in the first row of a CSV file, one per line. Other rows can be returned by passing a row number as the optional second argument.
bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv
Reference:
FY 1999 Supplement to the President's Budget
bash-3.2$
The headers are actually on the fourth row of this CSV:
bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv 4
Agency
High End Computing and Computation
Large Scale Networking
High Confidence Systems
Human Centered Systems
Education, Training, & Human Resources
TOTAL
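For readers without the Java class on their classpath, the behavior described above can be approximated in the shell. The following is only a rough sketch and is not the CSVHeaders implementation: the function name csv_row is hypothetical, and it assumes that fields never span multiple lines, that a doubled quote inside a quoted field stands for one literal quote, and that the file uses Unix line endings.

```shell
#!/bin/bash
# Hypothetical stand-in for CSVHeaders (illustration only, not the real tool).
# Usage: csv_row FILE [ROW]    ROW defaults to 1.
# Prints each field of the chosen CSV row on its own line.
csv_row() {
  awk -v row="${2:-1}" '
    NR == row {
      line = $0; field = ""; quoted = 0
      n = length(line)
      for (i = 1; i <= n; i++) {
        c = substr(line, i, 1)
        if (quoted) {
          if (c == "\"") {
            # A doubled quote inside a quoted field is one literal quote.
            if (substr(line, i + 1, 1) == "\"") { field = field "\""; i++ }
            else quoted = 0
          } else field = field c
        }
        else if (c == "\"") quoted = 1
        else if (c == ",") { print field; field = "" }   # unquoted comma ends a field
        else field = field c
      }
      print field   # last field on the row
      exit
    }' "$1"
}
```

Calling csv_row with a file name and 4 would print the fourth row, in the same spirit as the second CSVHeaders invocation above. Quoted headers such as "Education, Training, & Human Resources" stay on one line because commas inside quotes are not treated as separators.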
The script $CSV2RDF4LOD_HOME/bin/util/header2params2.awk can accept these headers and produce a Turtle RDF template file for the enhancement parameters. It takes a handful of parameters for the source_identifier, dataset_identifier, etc.; see the script for details.
....
@prefix ov: <http://open.vocab.org/terms/> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
....
:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "nitrd-gov";
   conversion:dataset_identifier "federal_research_and_development_budget_for_networking_and_information_technology";
   conversion:dataset_version    "2011-Jan-27";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:subject_discriminator  "fy99_supp_cic_r&d_budget_cross_cut";
      conversion:enhance [
         ov:csvRow 4;
         a conversion:HeaderRow;
      ];
      conversion:enhance [
         ov:csvCol          1;
         ov:csvHeader       "Agency";
         conversion:label   "Agency";
         conversion:comment "";
         conversion:range   todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol          2;
         ov:csvHeader       "";
         conversion:label   "";
         conversion:comment "";
         conversion:range   todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol          3;
         ov:csvHeader       "High End Computing and Computation";
         conversion:label   "High End Computing and Computation";
         conversion:comment "";
         conversion:range   todo:Literal;
      ];
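The per-column stubs in the template above follow a regular pattern: one conversion:enhance block per header, numbered from 1, with the header repeated as the label and todo:Literal as the default range. As an illustration of that pattern only, the sketch below generates such stubs from a list of headers read one per line. It is not header2params2.awk, whose actual parameters and output should be taken from the script itself. The function name headers_to_enhancements is hypothetical, and the sketch emits neither the prefixes nor the dataset-level properties nor the HeaderRow block.

```shell
#!/bin/bash
# Hypothetical helper (illustration only, NOT header2params2.awk).
# Reads one header per line on stdin and prints one default
# conversion:enhance stub per column, numbered from 1.
headers_to_enhancements() {
  awk '
    # Backslash-escape double quotes and backslashes so the header
    # can be written as a Turtle string literal.
    function turtle_escape(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\"" || c == "\\") out = out "\\"
        out = out c
      }
      return out
    }
    {
      header = turtle_escape($0)
      print "conversion:enhance ["
      print "   ov:csvCol " NR ";"
      print "   ov:csvHeader \"" header "\";"
      print "   conversion:label \"" header "\";"
      print "   conversion:comment \"\";"
      print "   conversion:range todo:Literal;"
      print "];"
    }'
}
```

Piping a header list into headers_to_enhancements yields one stub per input line, including a stub with an empty ov:csvHeader for a blank line, as with column 2 in the template. Each stub still carries conversion:range todo:Literal, which is the part a person is expected to edit by hand.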
Reusing enhancement parameters for multiple versions or datasets