Skip to content

Publishing conversion results with a Virtuoso triplestore

Tim L edited this page May 14, 2015 · 169 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

This page describes how to configure and use csv2rdf4lod-automation to publish its RDF conversion results into a Virtuoso triple store.

What is first: Virtuoso installed

We assume that the Virtuoso triple store is already installed. If not, see:

Note that our follow-on project Prizms handles Virtuoso installation and configuration automatically now.

Finding out what you need to know

Paths you need to know about an installed instance of Virtuoso

(update: virtuoso-install-info.sh can help provide the locations of the essential virtuoso installation components described here.)

When someone says, "Hey! I just finished installing Virtuoso!", ask them the locations of the following:

    1. /opt/virtuoso/ - the directory created when installing Virtuoso. It contains bin/ lib/ share/ var/.
    1. /opt/virtuoso/bin/isql-v - the command line client to the Virtuoso database.
    1. /opt/virtuoso/var/lib/virtuoso/db/virtuoso.ini - the configuration file for the OpenLink Virtuoso VDBMS Server. The two settings of interest are:
    • ServerPort = 1111
    • DirsAllowed =
    1. http://sparql.tw.rpi.edu/virtuoso/sparql - the web-accessible SPARQL endpoint for the Virtuoso triplestore. When you (or csv2rdf4lod-automation) load something with /opt/virtuoso/bin/isql-v, the data is queryable from this URL.

Miscellaneous bits you need to know about a Virtuoso installation.

Before your Virtuoso guru goes out for coffee, make sure you find out the following as well:

  • The port - often 1111 (found in virtuoso.ini's ServerPort variable)
  • The username - often dba
  • The password - often dba

Paths you need to know about a csv2rdf4lod-automation installation.

  • /opt/csv2rdf4lod-automation/ - the directory created when installing; get it from: echo $CSV2RDF4LOD_HOME. We'll need to give Virtuoso access to /opt/csv2rdf4lod-automation/tmp/ so it can load files from there.
  • /projects/twc-lobd/data/source/ - the [directory holding the retrieved and converted data](csv2rdf4lod automation data root). This can be anywhere (and there can be many!), but it must be named source/ and everything within it must follow the "source - dataset - version" directory naming conventions.

Hook it all up.

Run cr-vars.sh to see the initial values, so we know what we are working from.

CSV2RDF4LOD_CONVERT_DATA_ROOT                      /Users/you/projects/data/source
...
CSV2RDF4LOD_PUBLISH_VIRTUOSO                       false
CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME                  (will default to: /opt/virtuoso)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH             (will default to: /opt/virtuoso/bin/isql)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT                  (will default to: 1111)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME              (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD              (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SCRIPT_PATH           (DEPRECATED. will default to: /opt/virtuoso/scripts/vload)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT       (will fail to describe provenance in pvload.sh)

Set the following environment variables in csv2rdf4lod-automation's source-me.sh (created when installing and make sure root sources this when starting a new shell):

export CSV2RDF4LOD_CONVERT_DATA_ROOT="/projects/twc-lobd/data/source"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO="true"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME="/opt/virtuoso"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH="/opt/virtuoso/bin/isql"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT="1111"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME="dba"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD="dba"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT="http://sparql.tw.rpi.edu/virtuoso/sparql"
  • CSV2RDF4LOD_CONVERT_DATA_ROOT should point to the csv2rdf4lod automation data root (the directory that holds your data and conforms to the Directory Conventions). Setting this is not required but it is used to avoid an unnecessary file copy when loading data (since Virtuoso daemon is given permissions to this directory with DirsAllowed in the .ini). Seeing a message saying COPYING from any of the publish scripts indicates that CSV2RDF4LOD_CONVERT_DATA_ROOT should be changed to avoid copying load the file from where it resides. (two steps: set the variable and add the value of the variable to the .ini).

Give Virtuoso permission to read your converted data by editing /opt/virtuoso/var/lib/virtuoso/db/virtuoso.ini:

DirsAllowed = ... , /opt/csv2rdf4lod-automation/tmp, /projects/twc-lobd/data/source/

Next, fire up Virtuoso (as root):

root# virtuoso start
Starting OpenLink Virtuoso Open-Source Edition.
Server successfully started.

root# virtuoso status
Virtuoso server is running.

virtuoso is a shell script wrapper at $CSV2RDF4LOD_HOME/bin/util/virtuoso/virtuoso

tail -f /opt/virtuoso/var/lib/virtuoso/db/virtuoso.log

Special thanks to Shangguan

As mentioned in 1, 2, and 3, the vload, vdelete, and virtuoso scripts were adopted from Shangguan's creation and care. The older scripts are still hanging around at the following locations:

See our documentation for the latest vload.

Debugging notes

version of server: virtuoso-t -?

15:09:49 OpenLink Virtuoso Universal Server
15:09:49 Version 06.01.3126-pthreads for Linux as of Feb 19 2010
15:09:49 uses parts of OpenSSL, PCRE, Html Tidy
15:09:49 Database version 3126
15:09:50 SQL Optimizer enabled (max 1000 layouts)
15:09:51 Compiler unit is timed at 0.000803 msec
15:09:52 Roll forward started
15:09:52     2 transactions, 87 bytes replayed (100 %)
15:09:52 Roll forward complete
15:09:52 Checkpoint started
15:09:52 Checkpoint finished, log reused
15:10:13 Can listen but can't connect to self
15:10:13 Server exiting
  • ServerPort to the VM's IP?

  • An artifact of using our VMs. What's needed is to set the localhost IP to be the external IP and not 127.0.0.1. That IP is shared with the host server, and so everything connects to that rather than the VM.

  • Changing /etc/hosts for "localhost"...

  • This stackoverflow points ResultSetMaxRows, MaxQueryCostEstimationTime, MaxQueryExecutionTime, and DefaultQuery as useful parameters to set in the .ini.

Virtuoso support

(Jul 2013, Hugh Williams) You can post questions on the support forum or in the Virtuoso open source mailing list at http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSMailingLists

(Dec 2013) mail archives: http://sourceforge.net/mailarchive/forum.php?forum_name=virtuoso-users

Enabling reasoning in Virtuoso

As described here, Virtuoso can employ RDFS reasoning during query execution. The SPARQL query defines a special value input:inference in the query header to indicate which ontology should be applied during query execution. The steps to take:

  • Load an ontology into a named graph (say, http://example.com/2/owl) the normal way (i.e., using vload or pvload.sh)
  • Log in with isql-v and call the rdfs_rule_set ('sample', 'rule_graph'); function to associate the graph named (e.g.) http://example.com/2/owl with a "rule set logical name", which can just be the same string (e.g.) http://example.com/2/owl.
  • When a query wants to use reasoning, it specifies the rule set logical name in the query: DEFINE input:inference <http://example.com/2/owl> (or, define input:inference "sample")

For a real example (for ichoose, which uses trust_ontology.owl loaded into graph name http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest):

$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-"1111"} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-"dba"} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-"dba"}
Connected to OpenLink Virtuoso
Driver: 06.01.3127 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> rdfs_rule_set ('http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest', 'http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest');

Done. -- 13 msec.

Load an example instance (vload ttl manual/one-instance.ttl http://ichoose.tw.rpi.edu/fake_instance -v):

<http://ichoose.tw.rpi.edu/fake_instance> 
 a <http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest/trust_ontology#TrustworthinessGradeC> .

Then, submit SPARQL query to get results that includes the extra type for their owl:Restriction.

DEFINE input:inference <http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest>
DESCRIBE  <http://ichoose.tw.rpi.edu/fake_instance>
@prefix rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ns1:	<http://ichoose.tw.rpi.edu/> .
ns1:fake_instance	rdf:type	_:vb89901 .
@prefix ns2:	<http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest/trust_ontology#> .
ns1:fake_instance	rdf:type	ns2:TrustworthinessGradeC .

How to exploit RDF Schema and OWL Inference Rules with minimal effort?

This Aug 2013 thread indicates that property chains are not supported.

Modifying the SPARQL Service Description

Virtuoso returns the RDF in graph named http://localhost:8890/sparql when the endpoint URL (e.g. http://ieeevis.tw.rpi.edu/sparql) is dereferenced. So, update that named graph to change what RDF is returned when the endpoint URL is dereferenced. See this email.

Fixable using one of the provided Secondary Derivative Datasets: cr-sparql-sd.

Dumping N-Quads

http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFDumpNQuad

SPARQL 1.1 (and 1.1ish)

http://answers.semanticweb.com/questions/11843/does-virtuoso-versions-60-support-sparql-11

Installing Virtuoso

Moving from an aptitude-installed to installed-from-source:

  • sudo aptitude remove virtuoso-opensource
  • sudo apt-get remove virtuoso-opensource
  • sudo apt-get autoremove

http://www.linuxfromscratch.org/blfs/view/svn/server/virtuoso.html looks very nice.

Upgrading VOS 6.1.6 to 7.1

(NOT successful)

First, shut down apache so that virtuoso doesn't get any requests:

sudo service apache2 stop

Next, make some backups following http://virtuoso.openlinksw.com:8889/doc/html/backup.html (got Error 42001: [Virtuoso Driver][Virtuoso Server]SR185: Undefined procedure DB.DBA.online_backup.)

isql-v 1111 dba PASSWORD

SQL> checkpoint_interval (-1);
SQL> checkpoint;
SQL> online_backup ('ieeevis-virtuoso-backup-6.1.3-2014-Jul-02_', 500);

Next, shut it down:

sudo /etc/init.d/virtuoso-opensource stop
sudo find / -name "*virtuoso*" -print | grep -v lebot | grep -v prizms | grep -v xgmatwc

Changing the dba password through isql-v

/usr/local/bin/isql-v 1111 dba dba
set password dba SOMEOTHERPASSWORD;
exit;

Timeouts

Virtuoso S1T00 Error SR171: Transaction timed out

virtuoso-install-info.sh

virtuoso-install-info.sh --help
...
method: upstream-source
virtuoso_t: /usr/local/bin/virtuoso-t
isql:       /usr/local/bin/isql-v
ini:        /usr/local/var/lib/virtuoso/db/virtuoso.ini
log:        /usr/local/var/lib/virtuoso/db/virtuoso.log
init_d:     /etc/init.d/virtuoso-opensource

sudo vi /usr/local/var/lib/virtuoso/db/virtuoso.ini
     MaxQueryExecutionTime       = 400   ; in seconds

sudo /etc/init.d/virtuoso-opensource stop
sudo /etc/init.d/virtuoso-opensource start

cache-queries.sh could be modified to set &timeout=<milliseconds> parameter

Proxying to Virtuoso 7.1

Setting up proxy mapping for virtuoso 7 appears to be a little different than 6. Instead of setting the output filter to just proxy-html, you have to set it to INFLATE;DEFLATE;proxy-html. Changed the HTMLURLMap as well. See http://docs.openlinksw.com/virtuoso/webserver.html (and our internal ticket and Hugh's logging suggestion).

Security SPARQL endpoints

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideSPARQLEndpoints

Installing Virtuoso on Mac

(Aug 2014) Follow https://github.com/openlink/virtuoso-opensource/blob/develop/7/README

(Mar 2015) Two source code changes before compiling https://gist.github.com/timrdf/7737dd03833e9be2b372

Enabling SPARQL Update

Virtuoso 42000 Error SR186: No permission to execute procedure DB.DBA.SPARQL_INSERT_DICT_CONTENT

Running all 16 GRANTS listed by Joao Rocha gets it to "work", but I kinda feel like we brought a 52 cal to an arm wrestling match...

GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO "SPARQL"
GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO SPARQL_UPDATE

GRANT execute ON DB.DBA.SPARQL_MODIFY_BY_DICT_CONTENTS TO "SPARQL"
GRANT execute ON DB.DBA.SPARQL_MODIFY_BY_DICT_CONTENTS TO SPARQL_UPDATE

GRANT execute ON DB.DBA.SPARQL_DELETE_DICT_CONTENT TO "SPARQL"
GRANT execute ON DB.DBA.SPARQL_DELETE_DICT_CONTENT TO SPARQL_UPDATE

GRANT execute ON DB.DBA.SPARUL_RUN TO "SPARQL"
GRANT execute ON DB.DBA.SPARUL_RUN TO SPARQL_UPDATE

GRANT execute ON SPARUL_CLEAR TO "SPARQL"
GRANT execute ON SPARUL_CLEAR TO SPARQL_UPDATE
GRANT DELETE ON RDF_QUAD TO "SPARQL"
GRANT DELETE ON RDF_QUAD TO SPARQL_UPDATE

GRANT execute ON DB.DBA.RDF_OBJ_ADD_KEYWORD_FOR_GRAPH TO "SPARQL"
GRANT execute ON DB.DBA.RDF_OBJ_ADD_KEYWORD_FOR_GRAPH TO SPARQL_UPDATE
GRANT execute ON DB.DBA.L_O_LOOK TO "SPARQL"
GRANT execute ON DDB.DBA.L_O_LOOK TO SPARQL_UPDATE

Upgrading from v6 to v7

http://sourceforge.net/p/virtuoso/mailman/message/34111851/

Using Virtuoso to deploy Linked Data

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtDeployingLinkedDataGuide

Clone this wiki locally