Generating enhancement parameters
See Conversion process phase: pull conversion trigger, which is where enhancement parameters are generated for you.
Enhancement parameters are automatically generated and placed in manual/ when the initial raw conversion is performed. As described in Directory Conventions, the purpose of the manual/ directory is to hold all files that involve a human's touch. Although the enhancement parameters are automatically generated, someone needs to tweak them by asserting more than just the default [ conversion:range todo:Literal; ].
- ./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh writes automatic/*.raw.params.ttl every time it performs the verbatim conversion.
- ./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh writes manual/*.e1.params.ttl if they do not already exist.
- ./convert-federal_research_and_development_budget_for_networking_and_information_technology.sh -e 2 writes manual/*.e2.params.ttl if they do not already exist.
- Other values of N passed to -e behave the same as N = 2 above, with N in place of 2.
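The write rules above amount to "always regenerate the raw parameters, but never overwrite parameters a person may have edited". A minimal sketch of that guard in shell follows. The helper name write_params_if_absent, the file name, and the file contents are all hypothetical; this is not the actual logic of the generated convert-*.sh script.

```shell
# Hypothetical helper (not part of csv2rdf4lod-automation): write default
# content to a params file only when the file does not exist yet, so that
# manual edits are never clobbered.
write_params_if_absent() {
    local target="$1"    # e.g. manual/example.e1.params.ttl (illustrative name)
    local content="$2"   # placeholder for the generated template text
    if [ -e "$target" ]; then
        echo "keeping existing $target"
    else
        mkdir -p "$(dirname "$target")"
        printf '%s\n' "$content" > "$target"
        echo "wrote $target"
    fi
}
```

Calling the helper twice on the same path writes the file the first time and leaves it untouched the second time, which is the behavior described for the manual/ parameters.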
When the default enhancement parameters are created, the environment variables CSV2RDF4LOD_CONVERT_MACHINE_URI and CSV2RDF4LOD_CONVERT_PERSON_URI are used to capture information about the machine and person responsible. This can then be used to acknowledge the person's effort and calculate the impact their data curation has on subsequent data products and demonstrations. The output of the Unix command whoami is also used to describe who created the parameters.
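As a rough illustration, the two variables could be set in the shell before running the conversion. The URIs below are placeholders under example.org, not values the project prescribes; substitute identifiers that you control.

```shell
# Placeholder URIs (assumptions for illustration only).
export CSV2RDF4LOD_CONVERT_MACHINE_URI="http://example.org/id/machine/my-laptop"
export CSV2RDF4LOD_CONVERT_PERSON_URI="http://example.org/id/person/jane-doe"

# Show the values a conversion started from this shell would inherit,
# along with the local user name reported by whoami.
echo "machine: $CSV2RDF4LOD_CONVERT_MACHINE_URI"
echo "person:  $CSV2RDF4LOD_CONVERT_PERSON_URI"
echo "user:    $(whoami)"
```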
NOTE: These implementation details are not necessary to use csv2rdf4lod-automation to convert data; they are provided here for informational purposes only.
java edu.rpi.tw.data.csv.impl.CSVHeaders <file> [headerLineNumber]
Returns the values in the first row of a CSV file -- one per line. Other rows can be returned by indicating a row number.
bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv
Reference:
FY 1999 Supplement to the President's Budget
bash-3.2$
The headers are actually on the fourth row of this CSV:
bash-3.2$ java edu.rpi.tw.data.csv.impl.CSVHeaders manual/FY99_Supp_CIC_R\&D_Budget_Cross_Cut.csv 4
Agency
High End Computing and Computation
Large Scale Networking
High Confidence Systems
Human Centered Systems
Education, Training, & Human Resources
TOTAL
The script $CSV2RDF4LOD_HOME/bin/util/header2params2.awk can accept these headers and produce a Turtle RDF template file for the enhancement parameters. It takes a handful of parameters for the source_identifier, dataset_identifier, etc. -- see the script for details.
....
@prefix ov: <http://open.vocab.org/terms/> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
....
:dataset a void:Dataset;
   conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier "nitrd-gov";
   conversion:dataset_identifier "federal_research_and_development_budget_for_networking_and_information_technology";
   conversion:dataset_version "2011-Jan-27";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:subject_discriminator "fy99_supp_cic_r&d_budget_cross_cut";
      conversion:enhance [
         ov:csvRow 4;
         a conversion:HeaderRow;
      ];
      conversion:enhance [
         ov:csvCol 1;
         ov:csvHeader "Agency";
         conversion:label "Agency";
         conversion:comment "";
         conversion:range todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol 2;
         ov:csvHeader "";
         conversion:label "";
         conversion:comment "";
         conversion:range todo:Literal;
      ];
      conversion:enhance [
         ov:csvCol 3;
         ov:csvHeader "High End Computing and Computation";
         conversion:label "High End Computing and Computation";
         conversion:comment "";
         conversion:range todo:Literal;
      ];
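To make the shape of the template concrete, the sketch below turns a list of headers (one per line, as printed by CSVHeaders) into one conversion:enhance block per column. It is an illustrative stand-in, not the real header2params2.awk: the function name headers_to_enhance_blocks is hypothetical, it emits only the per-column blocks (no prefixes or dataset description), and it assumes headers contain no double quotes.

```shell
# Illustrative only: NOT the real header2params2.awk.
# Reads one header per line on stdin and prints one conversion:enhance
# block per input line. The column number is the input line number, so a
# blank line yields a column with an empty header and label.
headers_to_enhance_blocks() {
    awk '{
        printf "conversion:enhance [\n"
        printf "   ov:csvCol          %d;\n", NR
        printf "   ov:csvHeader       \"%s\";\n", $0
        printf "   conversion:label   \"%s\";\n", $0
        printf "   conversion:comment \"\";\n"
        printf "   conversion:range   todo:Literal;\n"
        printf "];\n"
    }'
}

# Example: two headers in, two blocks out.
printf '%s\n' "Agency" "High End Computing and Computation" | headers_to_enhance_blocks
```

Every column starts with the default conversion:range todo:Literal; which is the value a person is expected to refine by hand in the manual/ copy.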
- Conversion process phase: pull conversion trigger - when the parameters are created for you.
- Reusing enhancement parameters for multiple versions or datasets
- CSV2RDF4LOD_CONVERT_PERSON_URI - an environment variable used to augment the generated enhancement parameters with enough info to cite you.
- Characterizing table completeness