Script: pvload.sh
- vload - a "provenance-free" shell script wrapper for Virtuoso's isql-vt.
- Naming sparql service description's sd:NamedGraph - so that we can name a SPARQL endpoint's named graphs.
- Named graphs that know where they came from - discusses provenance modeling of named graphs.
- The shell environment variable CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE
This page describes how to use pvload.sh to capture the provenance of loading a URL into a named graph of a SPARQL triple store.
$ pvload.sh --help
usage: pvload.sh [--help] [-n] <url> [-ng <graph-name>] [--separate-provenance [--into (<prov-graph> | 'one')]]
-n : dry run - do not download or load into named graph.
<url> : the URL to retrieve and load into a named graph.
-ng <graph-name> : the name of the graph to place <url>. (if not provided, -ng == <url>).
--separate-provenance [ --into <prov-graph> ] :
store the provenance of loading <url> in a separate named graph, not in '-ng'.
if <prov-graph> is the value 'one', choose a global graph name.
(Setting envvar CSV2RDF4LOD_CONVERT_DEBUG_LEVEL=finest will leave temporary files after invocation.)
(See https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-pvload.sh)
- CSV2RDF4LOD_BASE_URI is used to create URIs for instances of provenance.
  - In the examples below, this is set to http://provenanceweb.org.
- CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT is the forward-facing URL for the SPARQL endpoint.
  - In the examples below, this is set to
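A minimal environment for reproducing the examples below might look like this. The base URI is the value used on this page; the endpoint URL is a hypothetical placeholder (Virtuoso's default HTTP port 8890 with its /sparql path), so substitute your own endpoint.

```shell
# Base for the URIs that pvload.sh mints for provenance instances
# (the value used in the examples on this page).
export CSV2RDF4LOD_BASE_URI='http://provenanceweb.org'

# Forward-facing SPARQL endpoint URL. HYPOTHETICAL placeholder:
# Virtuoso listens on port 8890 by default and serves SPARQL at /sparql.
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT='http://localhost:8890/sparql'
```

With these set, pvload.sh -n <url> performs a dry run that neither downloads nor loads anything.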
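When http://provenanceweb.org/source/same.ttl contains one triple, loading it without -ng places both the data and its provenance in a named graph with the same name as the URL: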
$ pvload.sh http://provenanceweb.org/source/same.ttl
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://provenanceweb.org/source/same.ttl
--> (PROV Graph) http://provenanceweb.org/source/same.ttl
results in 130 triples (the one data triple plus 129 provenance triples) from:
select (count(*) as ?triples)
where {
  graph <http://provenanceweb.org/source/same.ttl> {?s ?p ?o}
}
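To run counts like this one from the command line, the query can be sent over HTTP. The sketch below assumes curl is installed and that the endpoint implements the SPARQL 1.1 Protocol (query in the "query" parameter of a GET request) and can return CSV results; the function names count_query and count_triples are hypothetical.

```shell
# HYPOTHETICAL helper: print a query that counts the triples in one named
# graph, like the query above, in the SPARQL 1.1 (count(*) as ?triples) form.
count_query() {
  printf 'select (count(*) as ?triples) where { graph <%s> { ?s ?p ?o } }\n' "$1"
}

# HYPOTHETICAL helper: send that query to an endpoint with HTTP GET,
# asking for SPARQL CSV results.
#   $1 = SPARQL endpoint URL, $2 = named graph to count
count_triples() {
  endpoint=$1
  graph=$2
  curl -s -G -H 'Accept: text/csv' \
       --data-urlencode "query=$(count_query "$graph")" \
       "$endpoint"
}
```

After the load above, count_triples "$CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT" http://provenanceweb.org/source/same.ttl would be expected to report 130 under a "triples" header.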
When http://provenanceweb.org/source/same.ttl contains one triple, it can be loaded into a graph named http://example.org/pvload-test-2:
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-2
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-2
--> (PROV Graph) http://example.org/pvload-test-2
results in 130 triples from:
select (count(*) as ?triples)
where {
  graph <http://example.org/pvload-test-2> {?s ?p ?o}
}
When http://provenanceweb.org/source/same.ttl contains one triple, loading it with --separate-provenance keeps the provenance out of the data graph:
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-3 --separate-provenance
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-3
--> (PROV Graph) http://provenanceweb.org/graph-prov/example.org/pvload-test-3
results in one triple from:
select (count(*) as ?triples)
where {
  graph <http://example.org/pvload-test-3> {?s ?p ?o}
}
and 129 triples from:
select (count(*) as ?triples)
where {
  graph <http://provenanceweb.org/graph-prov/example.org/pvload-test-3> {?s ?p ?o}
}
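In this example the default PROV graph name appears to be CSV2RDF4LOD_BASE_URI, then /graph-prov/, then the data graph's name with its scheme (http://) removed. The sketch below reproduces that pattern; it is inferred from the output above rather than taken from pvload.sh's source, and the function name default_prov_graph is hypothetical.

```shell
# HYPOTHETICAL helper reconstructing the PROV graph name reported above
# for --separate-provenance without --into.
#   $1 = CSV2RDF4LOD_BASE_URI, $2 = data named graph (-ng)
default_prov_graph() {
  base=${1%/}        # drop one trailing slash from the base URI, if present
  graph=${2#*://}    # strip the scheme, e.g. "http://"
  printf '%s/graph-prov/%s\n' "$base" "$graph"
}
```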
The --separate-provenance option is used when CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE is true. Otherwise, the provenance goes into the same named graph as the data, where it clutters the data.
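A calling script could turn that environment variable into pvload.sh arguments roughly as follows. This is a hypothetical sketch, not csv2rdf4lod's code: the function name pvload_args and the exact-match test against the string true are assumptions, and the function only prints the arguments it would pass.

```shell
# HYPOTHETICAL sketch: print pvload.sh arguments for loading URL $1 into
# named graph $2, adding --separate-provenance only when
# CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE is exactly "true".
pvload_args() {
  url=$1
  graph=$2
  if [ "${CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT_SEPARATE_NG_PROVENANCE:-false}" = true ]; then
    printf '%s -ng %s --separate-provenance\n' "$url" "$graph"
  else
    printf '%s -ng %s\n' "$url" "$graph"
  fi
}
```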
Adding the --into <prov-graph> argument lets you control which graph the provenance goes into.
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-4 --separate-provenance --into http://example.org/put-my-provenance-here
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-4
--> (PROV Graph) http://example.org/put-my-provenance-here
results in one triple from:
select (count(*) as ?triples)
where {
  graph <http://example.org/pvload-test-4> {?s ?p ?o}
}
and 129 triples from:
select (count(*) as ?triples)
where {
  graph <http://example.org/put-my-provenance-here> {?s ?p ?o}
}
If you don't want to name the separate provenance graph yourself, use the keyword one, and a single global graph at the path /graph-prov under CSV2RDF4LOD_BASE_URI (here, http://provenanceweb.org/graph-prov) will be used.
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-5 --separate-provenance --into one
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-5
--> (PROV Graph) http://provenanceweb.org/graph-prov
results in one triple from:
select (count(*) as ?triples)
where {
  graph <http://example.org/pvload-test-5> {?s ?p ?o}
}
and 129 triples from:
select (count(*) as ?triples)
where {
  graph <http://provenanceweb.org/graph-prov> {?s ?p ?o}
}
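Taken together, the examples on this page show four naming rules for the PROV graph: by default it is the data graph itself; with --separate-provenance it is a per-graph name under CSV2RDF4LOD_BASE_URI/graph-prov/; with --into <prov-graph> it is the given graph; and with --into one it is CSV2RDF4LOD_BASE_URI/graph-prov. The hypothetical function below encodes those observed rules so they can be checked against the output shown. It is a reading of the examples, not pvload.sh's implementation.

```shell
# HYPOTHETICAL: resolve the PROV graph name as the examples report it.
#   $1 = base URI (CSV2RDF4LOD_BASE_URI)
#   $2 = URL to load
#   $3 = -ng value, or "" to default to the URL
#   $4 = "yes" if --separate-provenance was given
#   $5 = --into value, or "" if absent
resolve_prov_graph() {
  base=${1%/}
  graph=${3:-$2}                      # -ng defaults to <url>
  if [ "$4" != yes ]; then
    printf '%s\n' "$graph"            # provenance shares the data graph
  elif [ "$5" = one ]; then
    printf '%s/graph-prov\n' "$base"  # one global provenance graph
  elif [ -n "$5" ]; then
    printf '%s\n' "$5"                # explicit --into <prov-graph>
  else
    printf '%s/graph-prov/%s\n' "$base" "${graph#*://}"  # per-graph default
  fi
}
```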
- Script: cache-queries.sh can be used to capture the provenance of querying a SPARQL endpoint.