CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY

timrdf edited this page Apr 24, 2011 · 38 revisions

Background

Sample subsets are created every time a conversion is performed. These samples are helpful when exploring or prototyping large datasets.

Conversion output files with .sample in their names contain a subset of the corresponding files without it:

  • automatic/menu.csv.raw.sample.ttl
  • automatic/menu.csv.raw.ttl
  • automatic/menu.csv.e1.sample.ttl
  • automatic/menu.csv.e1.ttl

The files above get aggregated into files appropriate for publishing:

  • publish/dpdoughtroy-com-menu-2011-Apr-22.raw.sample.ttl
  • publish/dpdoughtroy-com-menu-2011-Apr-22.raw.ttl
  • publish/dpdoughtroy-com-menu-2011-Apr-22.e1.sample.ttl
  • publish/dpdoughtroy-com-menu-2011-Apr-22.e1.ttl
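
To see how much smaller each sample is than its full counterpart, the two files can be compared side by side. The sketch below is not part of csv2rdf4lod: sample_vs_full is a hypothetical helper, and it assumes the naming shown above (X.sample.ttl next to X.ttl). Line count is only a rough proxy for the number of triples in a Turtle file.

```shell
# Hypothetical helper (not part of csv2rdf4lod): print the line count of
# each *.sample.ttl file in a directory, followed by its full counterpart.
sample_vs_full() {
  dir="$1"
  for sample in "$dir"/*.sample.ttl; do
    [ -f "$sample" ] || continue            # glob matched nothing
    full="${sample%.sample.ttl}.ttl"        # X.sample.ttl -> X.ttl
    [ -f "$full" ] || continue              # no full conversion present
    printf '%s %s\n' "$(wc -l < "$sample" | tr -d ' ')" "$sample"
    printf '%s %s\n' "$(wc -l < "$full" | tr -d ' ')" "$full"
  done
}

# Run from the conversion directory that holds automatic/ and publish/.
sample_vs_full automatic
sample_vs_full publish
```

Pairs without a full counterpart are skipped, so the helper prints nothing for them.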

The size of the sample is controlled by specifying the number of data rows to process with the CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS shell environment variable, whose value can be seen with cr-vars.sh:

bash-3.2$ cr-vars.sh 
--
CSV2RDF4LOD_HOME                                         /Users/timrdf/csv2rdf4lod
...
...
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS                  2
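
A minimal sketch of changing the sample size for the current shell session. The value 10 is an arbitrary illustration, not a recommended setting, and the assumption is that the variable is read the next time a conversion runs.

```shell
# Process 10 data rows when building the .sample files
# (10 is an arbitrary example value).
export CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS=10

# Confirm the setting in the current shell.
echo "CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS=${CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS}"
```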

Converting only the sample subset (i.e., preventing conversion of the full dataset)

When developing enhancement parameters for a large dataset, it helps to avoid converting the full dataset, because only a portion of the output will be inspected before updating the parameters and rerunning the conversion. Since the sample subset is produced by every conversion anyway, we can simply specify NOT to convert the full dataset by setting the CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY shell environment variable.

bash-3.2$ export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=true
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY                  false
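
The development loop described above might look like the following sketch. The conversion command itself is left out because it depends on the dataset. Setting the variable back to false to restore full conversion is an assumption, based on false being the value reported by cr-vars.sh.

```shell
# While iterating on enhancement parameters: convert only the sample subset.
export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=true
echo "sample only: ${CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY}"

# ... edit the enhancement parameters, rerun the conversion, and inspect
#     the .sample.ttl output (commands not shown) ...

# Once the parameters are settled: allow the full dataset to be converted
# again (assumes false re-enables full conversion).
export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=false
echo "sample only: ${CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY}"
```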

See also
