
Script: justify.sh

timrdf edited this page Mar 2, 2011 · 9 revisions

Example

http://logd.tw.rpi.edu/source/data-gov/dataset/1008 provides a zip file containing a PDF and a CSV.

Let's say we only want to work with the WATER subset of the CSV, which is 25 of its 46 lines:

bash-3.2$ wc -l source/STATE_SINGLE_PW.CSV 
      46 source/STATE_SINGLE_PW.CSV
bash-3.2$ cat source/STATE_SINGLE_PW.CSV | grep "WATER" | wc -l
      25

We write the subset to a new file in manual/ because it is a modified version of the original (from-government) file in source/:

bash-3.2$ cat source/STATE_SINGLE_PW.CSV | grep "WATER" > manual/STATE_SINGLE_PW.CSV
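A plain grep also drops any header row that does not contain the pattern. The following is a minimal sketch of a subset step that keeps the first line, assuming the CSV starts with a header row (the source does not confirm this). The function name keep_header_subset is hypothetical and is not part of csv2rdf4lod.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep_header_subset PATTERN INPUT OUTPUT
# Assumption: the first line of INPUT is a header row.
keep_header_subset() {
  local pattern="$1" input="$2" output="$3"
  mkdir -p "$(dirname "$output")"             # make sure manual/ (or similar) exists
  {
    head -n 1 "$input"                        # copy the header row unchanged
    tail -n +2 "$input" | grep -- "$pattern"  # keep only the matching data rows
  } > "$output"
}
```

Under those assumptions it could be called as keep_header_subset WATER source/STATE_SINGLE_PW.CSV manual/STATE_SINGLE_PW.CSV. If the file has no header row, the original grep command is the right tool.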

We want to associate this new file with the file it came from:

bash-3.2$ justify.sh source/STATE_SINGLE_PW.CSV manual/STATE_SINGLE_PW.CSV 
usage: justify.sh /path/to/source/a.xls /path/to/destination/a.xls.csv <engine-name>
   engine-name: (URI-friendly) e.g.:
      xls2csv,   tab2comma,     redelimit,            file_rename,   escaping_commas_redelimit
      duplicate, google_refine, serialization_change, parse_field,   tabulating_fixed_width
      html_tidy, pretty_print,  xsl_html_scrape,      manual_csvify, uncompress
      select_subset, etc.

Run without an engine name, justify.sh prints its usage along with suggested engine names that describe how one file was derived from another. We'll choose select_subset by appending it to the command and rerunning it:

bash-3.2$ justify.sh source/STATE_SINGLE_PW.CSV manual/STATE_SINGLE_PW.CSV select_subset

---------------------------------- justify ---------------------------------------
source/STATE_SINGLE_PW.CSV (a conv:Select_subset_Engine applying conv:select_subset_Method) -> manual/STATE_SINGLE_PW.CSV
manual/STATE_SINGLE_PW.CSV came from source/STATE_SINGLE_PW.CSV
source/STATE_SINGLE_PW.CSV -> manual/STATE_SINGLE_PW.CSV
--------------------------------------------------------------------------------
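When several files in manual/ were derived the same way, the call can be repeated in a loop. The sketch below assumes that each derived file has the same name as its source file (as in this example) and that one engine name applies to all of them. The function justify_missing and the JUSTIFY_CMD override are hypothetical; only the three-argument justify.sh call and the .pml.ttl suffix come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: justify_missing ENGINE [SOURCE_DIR] [MANUAL_DIR]
# JUSTIFY_CMD is an assumed override hook (useful for dry runs); it defaults to justify.sh.
justify_missing() {
  local engine="$1" src_dir="${2:-source}" out_dir="${3:-manual}"
  local justify="${JUSTIFY_CMD:-justify.sh}" out name
  for out in "$out_dir"/*; do
    [ -f "$out" ] || continue
    case "$out" in *.pml.ttl) continue ;; esac  # skip the provenance files themselves
    name="$(basename "$out")"
    [ -f "$src_dir/$name" ] || continue         # no same-named source file
    [ -e "$out.pml.ttl" ] && continue           # provenance already recorded
    "$justify" "$src_dir/$name" "$out" "$engine"
  done
}
```

For example, justify_missing select_subset would call justify.sh once for every file in manual/ that has a counterpart in source/ and no .pml.ttl file yet.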

The captured provenance is stored in a file named after the resulting file, with .pml.ttl appended:

bash-3.2$ cat manual/STATE_SINGLE_PW.CSV.pml.ttl

@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sioc:    <http://rdfs.org/sioc/ns#> .
@prefix pmlp:    <http://inference-web.org/2.0/pml-provenance.owl#> .
@prefix pmlj:    <http://inference-web.org/2.0/pml-justification.owl#> .
@prefix conv:    <http://purl.org/twc/vocab/conversion/> .

<STATE_SINGLE_PW.CSV>
   a pmlp:Information;
   pmlp:hasModificationDateTime "2011-03-02T13:31:53-05:00"^^xsd:dateTime;
.
<STATE_SINGLE_PW.CSV>
   a pmlp:Information;
   nfo:hasHash <md5_db0a34538e1441633ab05bd962af6d4c_time_1299090728>;
.
<md5_db0a34538e1441633ab05bd962af6d4c_time_1299090728>
   a nfo:FileHash; 
   dcterms:date "2011-03-02T13:32:08-05:00"^^xsd:dateTime;
   nfo:hashAlgorithm "md5";
   nfo:hashValue "db0a34538e1441633ab05bd962af6d4c";
.

<../source/STATE_SINGLE_PW.CSV>
   a pmlp:Information;
   pmlp:hasModificationDateTime "2011-03-02T13:31:51-05:00"^^xsd:dateTime;
.
<../source/STATE_SINGLE_PW.CSV>
   a pmlp:Information;
   nfo:hasHash <md5_2afc25f886dbe56fdd15007d47f0c4c5_time_1299090728>;
.
<md5_2afc25f886dbe56fdd15007d47f0c4c5_time_1299090728>
   a nfo:FileHash; 
   dcterms:date "2011-03-02T13:32:08-05:00"^^xsd:dateTime;
   nfo:hashAlgorithm "md5";
   nfo:hashValue "2afc25f886dbe56fdd15007d47f0c4c5";
.

<nodeSet_6111cfa9-0179-42c5-a03b-fa7be3fc92cc>
   a pmlj:NodeSet;
   pmlj:hasConclusion <STATE_SINGLE_PW.CSV>;
   pmlj:isConsequentOf [
      a pmlj:InferenceStep;
      pmlj:hasIndex 0;
      pmlj:hasAntecedentList ( <nodeSet_6111cfa9-0179-42c5-a03b-fa7be3fc92cc_antecedent> <nodeSet_6111cfa9-0179-42c5-a03b-fa7be3fc92cc_user> );
      pmlj:hasInferenceEngine <select_subset_6111cfa9-0179-42c5-a03b-fa7be3fc92cc>;
      pmlj:hasInferenceRule   conv:select_subset_Method;
   ];
.

<nodeSet_6111cfa9-0179-42c5-a03b-fa7be3fc92cc_antecedent>
   a pmlj:NodeSet;
   pmlj:hasConclusion <source/STATE_SINGLE_PW.CSV>;
.

<nodeSet_6111cfa9-0179-42c5-a03b-fa7be3fc92cc_user>
   a pmlj:NodeSet;
   pmlp:hasConclusion <user_6111cfa9-0179-42c5-a03b-fa7be3fc92cc>;
.

<user_6111cfa9-0179-42c5-a03b-fa7be3fc92cc>
   foaf:accountName "lebot";
.

<select_subset_6111cfa9-0179-42c5-a03b-fa7be3fc92cc>
   a pmlp:InferenceEngine, conv:Select_subset_Engine;
   dcterms:identifier "select_subset_6111cfa9-0179-42c5-a03b-fa7be3fc92cc";
.

conv:Select_subset_Engine rdfs:subClassOf pmlp:InferenceEngine .
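
Because the provenance records an MD5 for each file, it can later be used to check whether a file has changed since it was justified. The following is a rough sketch, not part of justify.sh. The function names file_md5 and verify_recorded_hash are hypothetical, and the check relies only on the nfo:hashValue lines having the layout shown above. It does not parse the Turtle, so a match against any recorded hash in the file counts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of justify.sh.

# Print the MD5 hex digest of a file with whichever tool is installed.
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{print $1}'   # GNU coreutils
  else
    md5 -q "$1"                      # macOS / BSD
  fi
}

# Succeed if the file's current MD5 appears as an nfo:hashValue
# in its provenance file (default: FILE.pml.ttl).
verify_recorded_hash() {
  local file="$1" pml="${2:-$1.pml.ttl}"
  local current
  current="$(file_md5 "$file")" || return 2
  grep -q "nfo:hashValue \"$current\"" "$pml"
}
```

For example, verify_recorded_hash manual/STATE_SINGLE_PW.CSV && echo unchanged prints "unchanged" only if the file still has the hash that was recorded.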