Script: pvload.sh

What is first

vload - a "provenance-free" shell script wrapper to Virtuoso's isql-vt.
Naming sparql service description's sd:NamedGraph, so we can name a SPARQL endpoint's named graph.
Named graphs that know where they came from, which discusses provenance modeling of named graphs.
The shell environment variable CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE

What we will cover

This page describes how to use pvload.sh to load a URL into a named graph of a SPARQL triple store and capture the provenance of that load.

Let's get to it!

Usage

$ pvload.sh --help
usage: pvload.sh [--help] [-n] <url> [-ng <graph-name>] [--separate-provenance [--into (<prov-graph> | 'one')]]
  -n                    : dry run - do not download or load into named graph.
  <url>                 : the URL to retrieve and load into a named graph.
  -ng <graph-name>      : the name of the graph to place <url>. (if not provided, -ng == <url>).
  --separate-provenance [ --into <prov-graph> ] :
                          store the provenance of loading <url> in a separate named graph, not in '-ng'.
                          if <prov-graph> is the value 'one', choose a global graph name.

  (Setting envvar CSV2RDF4LOD_CONVERT_DEBUG_LEVEL=finest will leave temporary files after invocation.)
  (See https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-pvload.sh)

Environment variables that matter

CSV2RDF4LOD_BASE_URI is used to create URIs for instances of provenance.
- In the examples below, this is set to http://provenanceweb.org.
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT is the forward-facing URL for the SPARQL endpoint,
- e.g. http://provenanceweb.org/sparql.
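
A minimal sketch of setting these two variables before calling pvload.sh, using the example values from this page. Where you set them (an interactive shell, a sourced file) is an assumption; adjust to your own setup.

```shell
# Example values taken from this page; replace with your own base URI and endpoint.
export CSV2RDF4LOD_BASE_URI='http://provenanceweb.org'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT="${CSV2RDF4LOD_BASE_URI}/sparql"

# Confirm the values that pvload.sh will see.
echo "base URI: $CSV2RDF4LOD_BASE_URI"
echo "endpoint: $CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT"
```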

Loading a URL into a graph with the same name

When http://provenanceweb.org/source/same.ttl contains one triple,

$ pvload.sh http://provenanceweb.org/source/same.ttl
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://provenanceweb.org/source/same.ttl
                   --> (PROV Graph)  http://provenanceweb.org/source/same.ttl

results in 130 triples (the one loaded triple plus 129 provenance triples) from:

select distinct count(*)
where { 
  graph <http://provenanceweb.org/source/same.ttl> {?s ?p ?o}
}
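
The count query above can also be sent to the endpoint from the command line. The following is a rough sketch, not part of pvload.sh: the helper names count_query and count_triples are hypothetical, and it assumes the endpoint in CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT accepts the query as a GET parameter named "query" and can return CSV.

```shell
# Build the count query used on this page for a given named graph.
count_query() {
  printf 'select distinct count(*) where { graph <%s> {?s ?p ?o} }' "$1"
}

# Hypothetical helper: send the query to the endpoint with curl and print the response.
# Assumes the endpoint accepts the query as a GET parameter named "query".
count_triples() {
  curl --silent --get \
       --header 'Accept: text/csv' \
       --data-urlencode "query=$(count_query "$1")" \
       "$CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT"
}

# Usage (needs network access to the endpoint):
#   count_triples http://provenanceweb.org/source/same.ttl
```

Keeping the query builder separate from the curl call makes it easy to reuse the same count for each graph name in the sections that follow.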

Loading a URL into a graph with a different, custom name

When http://provenanceweb.org/source/same.ttl contains one triple, it can be loaded into a graph named http://example.org/pvload-test-2:

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-2
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-2
                   --> (PROV Graph)  http://example.org/pvload-test-2

results in 130 triples from:

select distinct count(*)
where { 
  graph <http://example.org/pvload-test-2> {?s ?p ?o}
}

Loading the provenance of the load into a separate named graph, specific to the graph loaded

When http://provenanceweb.org/source/same.ttl contains one triple,

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-3 --separate-provenance
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-3
                   --> (PROV Graph)  http://provenanceweb.org/graph-prov/example.org/pvload-test-3

results in one triple from:

select distinct count(*)
where { 
  graph <http://example.org/pvload-test-3> {?s ?p ?o}
}

and 129 triples from:

select distinct count(*)
where { 
  graph <http://provenanceweb.org/graph-prov/example.org/pvload-test-3> {?s ?p ?o}
}

The --separate-provenance option is used when CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE is true. Otherwise, the provenance goes into the same named graph as the data (and clutters it).
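
To illustrate how the environment variable relates to the option, here is a small sketch of a wrapper that adds --separate-provenance only when the variable is set to true. The function pvload_command is hypothetical and only prints the command line it would run; it is not how csv2rdf4lod's own scripts are necessarily written.

```shell
# Hypothetical helper: print the pvload.sh command line for a URL and graph name,
# adding --separate-provenance only when the environment variable is 'true'.
pvload_command() {
  local url="$1" graph="$2"
  local cmd="pvload.sh $url -ng $graph"
  if [ "${CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE:-}" = 'true' ]; then
    cmd="$cmd --separate-provenance"
  fi
  echo "$cmd"
}

# Usage:
#   export CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE=true
#   pvload_command http://provenanceweb.org/source/same.ttl http://example.org/pvload-test-3
```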

Loading the provenance of the load into a separate named graph, with a different name

Adding the --into <prov-graph> argument lets you choose the graph that receives the provenance.

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-4 --separate-provenance --into http://example.org/put-my-provenance-here
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-4
                   --> (PROV Graph)  http://example.org/put-my-provenance-here

results in one triple from:

select distinct count(*)
where { 
  graph <http://example.org/pvload-test-4> {?s ?p ?o}
}

and 129 triples from:

select distinct count(*)
where { 
  graph <http://example.org/put-my-provenance-here> {?s ?p ?o}
}

Loading the provenance of the load into a separate, shared, named graph

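If you don't want to name the separate provenance graph yourself, pass the keyword 'one' to --into. The provenance then goes into a single shared graph at the path /graph-prov (http://provenanceweb.org/graph-prov in this example).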

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-5 --separate-provenance --into one
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-5
                   --> (PROV Graph)  http://provenanceweb.org/graph-prov

results in one triple from:

select distinct count(*)
where { 
  graph <http://example.org/pvload-test-5> {?s ?p ?o}
}

and 129 triples from:

select distinct count(*)
where { 
  graph <http://provenanceweb.org/graph-prov> {?s ?p ?o}
}
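
The three --separate-provenance examples above suggest a simple naming pattern for the PROV graph. The sketch below reconstructs that pattern from the INFO output shown on this page; it is an inference, not taken from pvload.sh's source, and the function name prov_graph_name is hypothetical. Without --separate-provenance, the PROV graph is simply the named graph itself.

```shell
# Hypothetical reconstruction of the PROV graph names shown in the examples;
# inferred from the INFO output on this page, not from pvload.sh's source.
prov_graph_name() {
  local graph="$1" into="${2:-}"
  local base="${CSV2RDF4LOD_BASE_URI:-http://provenanceweb.org}"
  case "$into" in
    '')  echo "$base/graph-prov/${graph#*://}" ;;  # per-graph: drop the scheme
    one) echo "$base/graph-prov" ;;                # shared graph
    *)   echo "$into" ;;                           # caller-chosen graph
  esac
}

# Usage:
#   prov_graph_name http://example.org/pvload-test-3
#   prov_graph_name http://example.org/pvload-test-4 http://example.org/put-my-provenance-here
#   prov_graph_name http://example.org/pvload-test-5 one
```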

What is next

Script: cache-queries.sh can be used to capture the provenance of querying a SPARQL endpoint.
