-
Notifications
You must be signed in to change notification settings - Fork 36
Publishing conversion results with a Virtuoso triplestore
This page describes how to configure and use csv2rdf4lod-automation to publish its RDF conversion results into a Virtuoso triple store.
We assume that the Virtuoso triple store is already installed. If not, see:
- http://tw.rpi.edu/web/inside/endpoints/installing-virtuoso by TWC
- Setting up a local DBpedia mirror with Virtuoso by Jörn
Note that our follow-on project Prizms handles Virtuoso installation and configuration automatically now.
(update: virtuoso-install-info.sh can help provide the locations of the essential virtuoso installation components described here.)
When someone says, "Hey! I just finished installing Virtuoso!", ask them the locations of the following:
-
-
/opt/virtuoso/
- the directory created when installing Virtuoso. It containsbin/ lib/ share/ var/
.
-
-
-
/opt/virtuoso/bin/isql-v
- the command line client to the Virtuoso database.
-
-
-
/opt/virtuoso/var/lib/virtuoso/db/virtuoso.ini
- the configuration file for the OpenLink Virtuoso VDBMS Server. The two settings of interest are:
ServerPort = 1111
DirsAllowed =
-
-
-
http://sparql.tw.rpi.edu/virtuoso/sparql
- the web-accessible SPARQL endpoint for the Virtuoso triplestore. When you (or csv2rdf4lod-automation) load something with/opt/virtuoso/bin/isql-v
, the data is queryable from this URL.
-
Before your Virtuoso guru goes out for coffee, make sure you find out the following as well:
- The port - often
1111
(found in virtuoso.ini'sServerPort
variable) - The username - often
dba
- The password - often
dba
-
/opt/csv2rdf4lod-automation/
- the directory created when installing; get it from:echo $CSV2RDF4LOD_HOME
. We'll need to give Virtuoso access to/opt/csv2rdf4lod-automation/tmp/
so it can load files from there. -
/projects/twc-lobd/data/source/
- the [directory holding the retrieved and converted data](csv2rdf4lod automation data root). This can be anywhere (and there can be many!), but it must be namedsource/
and everything within it must follow the "source - dataset - version" directory naming conventions.
Run cr-vars.sh to see the initial values, so we know what we are working from.
CSV2RDF4LOD_CONVERT_DATA_ROOT /Users/you/projects/data/source
...
CSV2RDF4LOD_PUBLISH_VIRTUOSO false
CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME (will default to: /opt/virtuoso)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH (will default to: /opt/virtuoso/bin/isql)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT (will default to: 1111)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SCRIPT_PATH (DEPRECATED. will default to: /opt/virtuoso/scripts/vload)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT (will fail to describe provenance in pvload.sh)
Set the following environment variables in csv2rdf4lod-automation's source-me.sh (created when installing and make sure root
sources this when starting a new shell):
export CSV2RDF4LOD_CONVERT_DATA_ROOT="/projects/twc-lobd/data/source"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO="true"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME="/opt/virtuoso"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH="/opt/virtuoso/bin/isql"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT="1111"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME="dba"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD="dba"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT="http://sparql.tw.rpi.edu/virtuoso/sparql"
-
CSV2RDF4LOD_CONVERT_DATA_ROOT
should point to the csv2rdf4lod automation data root (the directory that holds your data and conforms to the Directory Conventions). Setting this is not required but it is used to avoid an unnecessary file copy when loading data (since Virtuoso daemon is given permissions to this directory withDirsAllowed
in the.ini
). Seeing a message sayingCOPYING
from any of the publish scripts indicates thatCSV2RDF4LOD_CONVERT_DATA_ROOT
should be changed to avoid copying load the file from where it resides. (two steps: set the variable and add the value of the variable to the.ini
).
Give Virtuoso permission to read your converted data by editing /opt/virtuoso/var/lib/virtuoso/db/virtuoso.ini
:
DirsAllowed = ... , /opt/csv2rdf4lod-automation/tmp, /projects/twc-lobd/data/source/
Next, fire up Virtuoso (as root
):
root# virtuoso start
Starting OpenLink Virtuoso Open-Source Edition.
Server successfully started.
root# virtuoso status
Virtuoso server is running.
virtuoso
is a shell script wrapper at $CSV2RDF4LOD_HOME/bin/util/virtuoso/virtuoso
tail -f /opt/virtuoso/var/lib/virtuoso/db/virtuoso.log
As mentioned in 1, 2, and 3, the vload, vdelete, and virtuoso scripts were adopted from Shangguan's creation and care. The older scripts are still hanging around at the following locations:
- http://data-gov-wiki.googlecode.com/svn/trunk/web/logd.tw.rpi.edu/2010/virtuoso/
- https://scm.escience.rpi.edu/svn/public/logd-virtuoso-scripts/
See our documentation for the latest vload.
version of server: virtuoso-t -?
15:09:49 OpenLink Virtuoso Universal Server
15:09:49 Version 06.01.3126-pthreads for Linux as of Feb 19 2010
15:09:49 uses parts of OpenSSL, PCRE, Html Tidy
15:09:49 Database version 3126
15:09:50 SQL Optimizer enabled (max 1000 layouts)
15:09:51 Compiler unit is timed at 0.000803 msec
15:09:52 Roll forward started
15:09:52 2 transactions, 87 bytes replayed (100 %)
15:09:52 Roll forward complete
15:09:52 Checkpoint started
15:09:52 Checkpoint finished, log reused
15:10:13 Can listen but can't connect to self
15:10:13 Server exiting
-
ServerPort to the VM's IP?
-
An artifact of using our VMs. What's needed is to set the localhost IP to be the external IP and not 127.0.0.1. That IP is shared with the host server, and so everything connects to that rather than the VM.
-
Changing /etc/hosts for "localhost"...
-
This stackoverflow points
ResultSetMaxRows
,MaxQueryCostEstimationTime
,MaxQueryExecutionTime
, andDefaultQuery
as useful parameters to set in the .ini.
(Jul 2013, Hugh Williams) You can post questions on the support forum or in the Virtuoso open source mailing list at http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSMailingLists
(Dec 2013) mail archives: http://sourceforge.net/mailarchive/forum.php?forum_name=virtuoso-users
As described here, Virtuoso can employ RDFS reasoning during query execution. The SPARQL query defines a special value input:inference
in the query header to indicate which ontology should be applied during query execution. The steps to take:
- Load an ontology into a named graph (say,
http://example.com/2/owl
) the normal way (i.e., using vload or pvload.sh) - Log in with isql-v and call the
rdfs_rule_set ('sample', 'rule_graph');
function to associate the graph named (e.g.)http://example.com/2/owl
with a "rule set logical name", which can just be the same string (e.g.)http://example.com/2/owl
. - When a query wants to use reasoning, it specifies the rule set logical name in the query:
DEFINE input:inference <http://example.com/2/owl>
(or,define input:inference "sample"
)
For a real example (for ichoose, which uses trust_ontology.owl loaded into graph name http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest):
$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-"1111"} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-"dba"} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-"dba"}
Connected to OpenLink Virtuoso
Driver: 06.01.3127 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> rdfs_rule_set ('http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest', 'http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest');
Done. -- 13 msec.
Load an example instance (vload ttl manual/one-instance.ttl http://ichoose.tw.rpi.edu/fake_instance -v
):
<http://ichoose.tw.rpi.edu/fake_instance>
a <http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest/trust_ontology#TrustworthinessGradeC> .
Then, submit SPARQL query to get results that includes the extra type for their owl:Restriction.
DEFINE input:inference <http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest>
DESCRIBE <http://ichoose.tw.rpi.edu/fake_instance>
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ns1: <http://ichoose.tw.rpi.edu/> .
ns1:fake_instance rdf:type _:vb89901 .
@prefix ns2: <http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest/trust_ontology#> .
ns1:fake_instance rdf:type ns2:TrustworthinessGradeC .
How to exploit RDF Schema and OWL Inference Rules with minimal effort?
This Aug 2013 thread indicates that property chains are not supported.
Virtuoso returns the RDF in graph named http://localhost:8890/sparql
when the endpoint URL (e.g. http://ieeevis.tw.rpi.edu/sparql) is dereferenced. So, update that named graph to change what RDF is returned when the endpoint URL is dereferenced. See this email.
Fixable using one of the provided Secondary Derivative Datasets: cr-sparql-sd.
http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFDumpNQuad
http://answers.semanticweb.com/questions/11843/does-virtuoso-versions-60-support-sparql-11
- http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSIndex gives an overview of versions
- http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/UpgradingToVOS610
Moving from an aptitude-installed to installed-from-source:
sudo aptitude remove virtuoso-opensource
sudo apt-get remove virtuoso-opensource
sudo apt-get autoremove
http://www.linuxfromscratch.org/blfs/view/svn/server/virtuoso.html looks very nice.
(NOT successful)
First, shut down apache so that virtuoso doesn't get any requests:
sudo service apache2 stop
Next, make some backups following http://virtuoso.openlinksw.com:8889/doc/html/backup.html
(got Error 42001: [Virtuoso Driver][Virtuoso Server]SR185: Undefined procedure DB.DBA.online_backup.
)
isql-v 1111 dba PASSWORD
SQL> checkpoint_interval (-1);
SQL> checkpoint;
SQL> online_backup ('ieeevis-virtuoso-backup-6.1.3-2014-Jul-02_', 500);
Next, shut it down:
sudo /etc/init.d/virtuoso-opensource stop
sudo find / -name "*virtuoso*" -print | grep -v lebot | grep -v prizms | grep -v xgmatwc
- http://docs.openlinksw.com/virtuoso/newadminui.html
- http://symbolicdata.org/wiki/LocalSparqlEndpoint
/usr/local/bin/isql-v 1111 dba dba
set password dba SOMEOTHERPASSWORD;
exit;
Virtuoso S1T00 Error SR171: Transaction timed out
- http://boards.openlinksw.com/phpBB3/viewtopic.php?f=12&t=436
-
http://docs.openlinksw.com/virtuoso/anytimequeries.html
- "A request to SPARQL web service endpoint may specify the timeout as an additoional &timeout= parameter. The parameter value should be equal to or greater than 1000, otherwise it will be ignored. If SPARQL Virtuoso configuration file contains MaxQueryExecutionTime parameter and its value is greater than or equal to 1000 then the actual "anytime" timeout is the minimum of the requested value and the value in the configuration file."
- http://docs.openlinksw.com/virtuoso/dbadm.html#ini_SPARQL
virtuoso-install-info.sh --help
...
method: upstream-source
virtuoso_t: /usr/local/bin/virtuoso-t
isql: /usr/local/bin/isql-v
ini: /usr/local/var/lib/virtuoso/db/virtuoso.ini
log: /usr/local/var/lib/virtuoso/db/virtuoso.log
init_d: /etc/init.d/virtuoso-opensource
sudo vi /usr/local/var/lib/virtuoso/db/virtuoso.ini
MaxQueryExecutionTime = 400 ; in seconds
sudo /etc/init.d/virtuoso-opensource stop
sudo /etc/init.d/virtuoso-opensource start
cache-queries.sh could be modified to set &timeout=<milliseconds>
parameter
Setting up proxy mapping for virtuoso 7 appears to be a little
different than 6. Instead of setting the output filter to just proxy-html
, you have to set
it to INFLATE;DEFLATE;proxy-html
. Changed the HTMLURLMap as well. See http://docs.openlinksw.com/virtuoso/webserver.html (and our internal ticket and Hugh's logging suggestion).
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideSPARQLEndpoints
install.sh does not know to install the mac-specific version of virtuoso, so say NO to it's offer to install it for you. Then do the following.
(Aug 2014) Follow https://github.com/openlink/virtuoso-opensource/blob/develop/7/README
(Mar 2015) Two source code changes before compiling https://gist.github.com/timrdf/7737dd03833e9be2b372
(Oct 2015) Released the scripts at bin/util/virtuoso/mac; run install-virtuoso-on-mac-dependencies.sh first b/c install-virtuoso-on-mac.sh doesn't check and offer to do it for you.
If you're in a hurry, blast through these 16 GRANTS and move on with your day. Below, I try to tease out which GRANT does what with more detail than anybody (including myself) should care to explore.
Running all 16 GRANTS listed by Joao Rocha gets it to "work", but I kinda feel like we brought a 52 cal to an arm wrestling match...
GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO "SPARQL"
GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO SPARQL_UPDATE
GRANT execute ON DB.DBA.SPARQL_MODIFY_BY_DICT_CONTENTS TO "SPARQL"
GRANT execute ON DB.DBA.SPARQL_MODIFY_BY_DICT_CONTENTS TO SPARQL_UPDATE
GRANT execute ON DB.DBA.SPARQL_DELETE_DICT_CONTENT TO "SPARQL"
GRANT execute ON DB.DBA.SPARQL_DELETE_DICT_CONTENT TO SPARQL_UPDATE
GRANT execute ON DB.DBA.SPARUL_RUN TO "SPARQL"
GRANT execute ON DB.DBA.SPARUL_RUN TO SPARQL_UPDATE
GRANT execute ON SPARUL_CLEAR TO "SPARQL"
GRANT execute ON SPARUL_CLEAR TO SPARQL_UPDATE
GRANT DELETE ON RDF_QUAD TO "SPARQL"
GRANT DELETE ON RDF_QUAD TO SPARQL_UPDATE
GRANT execute ON DB.DBA.RDF_OBJ_ADD_KEYWORD_FOR_GRAPH TO "SPARQL"
GRANT execute ON DB.DBA.RDF_OBJ_ADD_KEYWORD_FOR_GRAPH TO SPARQL_UPDATE
GRANT execute ON DB.DBA.L_O_LOOK TO "SPARQL"
GRANT execute ON DDB.DBA.L_O_LOOK TO SPARQL_UPDATE
Starting with a fresh Virtuoso 7.2, running:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
INSERT DATA
{ GRAPH <http://example/bookStore> { <http://example/book1> ns:price 42 } }
Leads to:
Virtuoso 42000 Error SR186: No permission to execute procedure DB.DBA.SPARQL_INSERT_DICT_CONTENT with user ID 10x, group ID 10x
Running:
$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO "SPARQL";
Done. -- 1 msec.
The INSERT DATA query works; reporting:
Insert into <http://example/bookStore>, 1 (or less) triples -- done
Starting with a fresh Virtuoso 7.2 (+DB.DBA.SPARQL_INSERT_DICT_CONTENT
), running:
CLEAR GRAPH <http://example/bookStore>
Virtuoso 42000 Error SR186: No permission to execute procedure DB.DBA.SPARUL_CLEAR with user ID 10x, group ID 10x
$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT execute ON SPARUL_CLEAR TO "SPARQL";
Done. -- 1 msec.
SQL>
Leads to:
Virtuoso 42000 Error SQ110: Permission denied for delete from DB.DBA.RDF_QUAD (user ID = 10x)
Chasing after it with:
isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT DELETE ON RDF_QUAD TO "SPARQL";
Done. -- 0 msec.
SQL> exit;
gives success:
Clear graph <http://example/bookStore> -- done
The following INSERT query works with just the above three (execute ON SPARQL_INSERT_DICT_CONTENT
, execute ON SPARUL_CLEAR
, and DELETE ON RDF_QUAD
) enabled.
PREFIX ns: <http://example.org/ns#>
INSERT {
GRAPH <http://example/bookStore2> {
?book ns:price ?bank
}
}
WHERE {
GRAPH <http://example/bookStore> {
?book ns:price ?price
}
BIND ((2 * ?price) AS ?bank)
}
insert data {
graph <http://my.name/id/md5/6a1ad551f95c1b618f6283058fe8e52b> {
<http://my.name/source/some-com/dataset/some-database/version/YYYY-MMM-dd/42>
...
Gives error:
org.openrdf.query.UpdateExecutionException: Virtuoso 42000 Error SR186: No permission to execute dpipe DB.DBA.L_O_LOOK with user ID 10x, group ID 10x
Running:
$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT execute ON DB.DBA.L_O_LOOK TO "SPARQL";
Done. -- 1 msec.
SQL> exit;
Not bothering to enable it, but here's the error:
DELETE { GRAPH ?g { <http://my.name/id/md5/b76a28c5da3bbdb208f4251eefdd4a> ?p ?o } } WHERE { graph ?g {<http://my.name/id/md5/b76a28c5da3bbdb208f4251eefdd4a> ?p ?o}}
results in:
Virtuoso 42000 Error SR186: No permission to execute procedure DB.DBA.SPARQL_DELETE_QUAD_DICT_CONTENT with user ID 10x, group ID 10x
prefix prov: <http://www.w3.org/ns/prov#>
insert {
graph <http://...selected> {
<http://...selected>
prov:hadMember ?b_1
.
}
}
using <http://.../dataset/1>
where {
service <http://some.example:8890/sparql> {
values ?b_1 {
<http://...Requester>
}
{
?b_1
?p []
.
}
union
{
[]
?d ?b_1
.
}
}
}
results in:
Virtuoso 42000 Error SQ200: Must have select privileges on view DB.DBA.SPARQL_SINV_2
running:
bash-3.2$ isql-v 1111 dba dba
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";
Done. -- 4 msec.
SQL> exit;
Submitting SPARQL queries with SERVICE keywords requires additional permissions to actually execute.
As indicated at http://boards.openlinksw.com/phpBB3/viewtopic.php?f=14&t=5735
bash$ isql-v 1111 dba password
Connected to OpenLink Virtuoso
Driver: 07.10.3207 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";
Done. -- 0 msec.
SQL> grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";
Done. -- 1 msec.
SQL> exit;
http://sourceforge.net/p/virtuoso/mailman/message/34111851/
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtDeployingLinkedDataGuide
Very long "roll forward" when starting virtuoso-t
"If you create a new user with the "Primary Role" assignment of "dba" from the drop down listbox available in the Conductor then they will be able to perform operations under the "Linked Data" and other tabs." -Hugh