CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY
Sample subsets are created every time a conversion is performed. These samples are helpful when exploring or prototyping large datasets.
The conversion output files named with .sample
contain a subset of their larger counterparts:
automatic/menu.csv.raw.sample.ttl
automatic/menu.csv.raw.ttl
automatic/menu.csv.e1.sample.ttl
automatic/menu.csv.e1.ttl
The files above get aggregated into files appropriate for publishing:
publish/dpdoughtroy-com-menu-2011-Apr-22.raw.sample.ttl
publish/dpdoughtroy-com-menu-2011-Apr-22.raw.ttl
publish/dpdoughtroy-com-menu-2011-Apr-22.e1.sample.ttl
publish/dpdoughtroy-com-menu-2011-Apr-22.e1.ttl
The size of the sample is controlled by specifying the number of data rows to process with the CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS
shell environment variable, whose value can be seen with cr-vars.sh:
bash-3.2$ cr-vars.sh
--
CSV2RDF4LOD_HOME /Users/timrdf/csv2rdf4lod
...
...
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS 2
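The sample size shown above can presumably be changed the same way the other variables on this page are set. A minimal sketch, assuming the conversion scripts read CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS from the environment at conversion time; the value 10 is arbitrary and chosen only for illustration:

```shell
# Assumption: the conversion scripts read this variable from the environment.
# The value 10 is arbitrary, chosen only for illustration.
export CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS=10

# Confirm the value that child processes (such as the conversion scripts) will see.
printenv CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS
```

Running cr-vars.sh afterwards should report the new value in place of 2.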
When developing enhancement parameters for a large dataset, it is helpful to avoid converting the full dataset, because only a portion of the output will be inspected before updating the parameters and rerunning the conversion. In this situation, since the sample subset is already produced on every conversion, we can simply specify NOT to convert the full dataset using the CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY
shell environment variable.
bash-3.2$ export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=true
The effect can be seen in the output of cr-vars.sh:
bash-3.2$ cr-vars.sh
--
CSV2RDF4LOD_HOME /Users/timrdf/csv2rdf4lod
...
...
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY true
and in automatic/
when running the conversions (the full conversion output file automatic/menu.csv.e1.ttl
is not created):
automatic/menu.csv.raw.params.ttl
automatic/menu.csv.raw.sample.ttl
automatic/menu.csv.raw.void.ttl
automatic/menu.csv.raw.ttl
automatic/menu.csv.e1.sample.ttl
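To check which full outputs were skipped, the listing above can be reproduced with plain shell. The function below is a hypothetical helper, not part of csv2rdf4lod; it relies only on the naming convention shown on this page, where X.sample.ttl is the sample of X.ttl:

```shell
# Hypothetical helper (not part of csv2rdf4lod): for each sample file in a
# directory, report whether its full counterpart exists.
report_full_counterparts() {
  local dir="${1:-automatic}"
  local sample full
  for sample in "$dir"/*.sample.ttl; do
    [ -e "$sample" ] || continue        # the glob matched nothing
    full="${sample%.sample.ttl}.ttl"    # menu.csv.e1.sample.ttl -> menu.csv.e1.ttl
    if [ -e "$full" ]; then
      echo "full    $full"
    else
      echo "missing $full"
    fi
  done
}
```

Called as `report_full_counterparts automatic` from the directory that contains automatic/, it prints one line per sample file. With the files listed above, the raw layer would be reported as full and the e1 layer as missing.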
If time and care have been spent to create useful enhancement parameters, the raw layer is likely to be of little use. If this is the case, it can be omitted using the CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER
shell environment variable:
bash-3.2$ export CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER="true"
The effect can be seen in the output of cr-vars.sh:
bash-3.2$ cr-vars.sh
--
CSV2RDF4LOD_HOME /Users/timrdf/csv2rdf4lod
...
...
--
CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER true
and in automatic/
when running the conversions (the raw conversion output file automatic/menu.csv.raw.ttl
is not created):
menu.csv.raw.params.ttl
menu.csv.e1.sample.ttl
menu.csv.e1.void.ttl
menu.csv.e1.ttl
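The two variables on this page can be combined during development, and it is easy to forget them before a final full conversion. The sketch below sets both and defines a reminder function; `warn_if_development_settings` is a hypothetical helper, not part of csv2rdf4lod, and it assumes (as the transcripts above suggest) that the value "true" is what enables each behavior:

```shell
# Development session: convert only the sample and skip the raw layer.
export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=true
export CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER=true

# Hypothetical helper (not part of csv2rdf4lod): print a warning for each of
# the two variables that is still exported as "true".
warn_if_development_settings() {
  local name value
  for name in CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY \
              CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER; do
    value=$(printenv "$name" || true)   # empty when the variable is unset
    if [ "$value" = "true" ]; then
      echo "warning: $name is still true"
    fi
  done
}
```

Calling `warn_if_development_settings` before a final run prints one warning per variable that is still enabled, and prints nothing once both have been unset or changed.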
- Conversion process phase: retrieve, for examples and discussion of the kinds of files that are created in automatic/.
- Examples versus Samples, for the difference that emerged between these two terms; the former comes from VoID and the latter is discussed on this page.
- Generating a sample conversion using only a subset of data