Skip to content

Publishing conversion results with a Virtuoso triplestore

Tim L edited this page Dec 22, 2015 · 169 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

This page describes how to configure and use csv2rdf4lod-automation to publish its RDF conversion results into a Virtuoso triple store.

What is first: Virtuoso installed

We assume that the Virtuoso triple store is already installed. If not, see:

Note that our follow-on project Prizms handles Virtuoso installation and configuration automatically now.

Finding out what you need to know

Paths you need to know about an installed instance of Virtuoso

(update: virtuoso-install-info.sh can help provide the locations of the essential virtuoso installation components described here.)

When someone says, "Hey! I just finished installing Virtuoso!", ask them the locations of the following:

    1. /opt/virtuoso/ - the directory created when installing Virtuoso. It contains bin/ lib/ share/ var/.
    1. /opt/virtuoso/bin/isql-v - the command line client to the Virtuoso database.
    1. /opt/virtuoso/var/lib/virtuoso/db/virtuoso.ini - the configuration file for the OpenLink Virtuoso VDBMS Server. The two settings of interest are:
    • ServerPort = 1111
    • DirsAllowed =
    1. http://sparql.tw.rpi.edu/virtuoso/sparql - the web-accessible SPARQL endpoint for the Virtuoso triplestore. When you (or csv2rdf4lod-automation) load something with /opt/virtuoso/bin/isql-v, the data is queryable from this URL.

Miscellaneous bits you need to know about a Virtuoso installation.

Before your Virtuoso guru goes out for coffee, make sure you find out the following as well:

  • The port - often 1111 (found in virtuoso.ini's ServerPort variable)
  • The username - often dba
  • The password - often dba

Paths you need to know about a csv2rdf4lod-automation installation.

  • /opt/csv2rdf4lod-automation/ - the directory created when installing; get it from: echo $CSV2RDF4LOD_HOME. We'll need to give Virtuoso access to /opt/csv2rdf4lod-automation/tmp/ so it can load files from there.
  • /projects/twc-lobd/data/source/ - the [directory holding the retrieved and converted data](csv2rdf4lod automation data root). This can be anywhere (and there can be many!), but it must be named source/ and everything within it must follow the "source - dataset - version" directory naming conventions.

Hook it all up.

Run cr-vars.sh to see the initial values, so we know what we are working from.

CSV2RDF4LOD_CONVERT_DATA_ROOT                      /Users/you/projects/data/source
...
CSV2RDF4LOD_PUBLISH_VIRTUOSO                       false
CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME                  (will default to: /opt/virtuoso)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH             (will default to: /opt/virtuoso/bin/isql)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT                  (will default to: 1111)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME              (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD              (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SCRIPT_PATH           (DEPRECATED. will default to: /opt/virtuoso/scripts/vload)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT       (will fail to describe provenance in pvload.sh)

Set the following environment variables in csv2rdf4lod-automation's source-me.sh (created when installing and make sure root sources this when starting a new shell):

export CSV2RDF4LOD_CONVERT_DATA_ROOT="/projects/twc-lobd/data/source"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO="true"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME="/opt/virtuoso"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH="/opt/virtuoso/bin/isql"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT="1111"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME="dba"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD="dba"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT="http://sparql.tw.rpi.edu/virtuoso/sparql"
  • CSV2RDF4LOD_CONVERT_DATA_ROOT should point to the csv2rdf4lod automation data root (the directory that holds your data and conforms to the Directory Conventions). Setting this is not required but it is used to avoid an unnecessary file copy when loading data (since Virtuoso daemon is given permissions to this directory with DirsAllowed in the .ini). Seeing a message saying COPYING from any of the publish scripts indicates that CSV2RDF4LOD_CONVERT_DATA_ROOT should be changed to avoid copying load the file from where it resides. (two steps: set the variable and add the value of the variable to the .ini).

Give Virtuoso permission to read your converted data by editing /opt/virtuoso/var/lib/virtuoso/db/virtuoso.ini:

DirsAllowed = ... , /opt/csv2rdf4lod-automation/tmp, /projects/twc-lobd/data/source/

Next, fire up Virtuoso (as root):

root# virtuoso start
Starting OpenLink Virtuoso Open-Source Edition.
Server successfully started.

root# virtuoso status
Virtuoso server is running.

virtuoso is a shell script wrapper at $CSV2RDF4LOD_HOME/bin/util/virtuoso/virtuoso

tail -f /opt/virtuoso/var/lib/virtuoso/db/virtuoso.log

Special thanks to Shangguan

As mentioned in 1, 2, and 3, the vload, vdelete, and virtuoso scripts were adopted from Shangguan's creation and care. The older scripts are still hanging around at the following locations:

See our documentation for the latest vload.

Debugging notes

version of server: virtuoso-t -?

15:09:49 OpenLink Virtuoso Universal Server
15:09:49 Version 06.01.3126-pthreads for Linux as of Feb 19 2010
15:09:49 uses parts of OpenSSL, PCRE, Html Tidy
15:09:49 Database version 3126
15:09:50 SQL Optimizer enabled (max 1000 layouts)
15:09:51 Compiler unit is timed at 0.000803 msec
15:09:52 Roll forward started
15:09:52     2 transactions, 87 bytes replayed (100 %)
15:09:52 Roll forward complete
15:09:52 Checkpoint started
15:09:52 Checkpoint finished, log reused
15:10:13 Can listen but can't connect to self
15:10:13 Server exiting
  • ServerPort to the VM's IP?

  • An artifact of using our VMs. What's needed is to set the localhost IP to be the external IP and not 127.0.0.1. That IP is shared with the host server, and so everything connects to that rather than the VM.

  • Changing /etc/hosts for "localhost"...

  • This stackoverflow points ResultSetMaxRows, MaxQueryCostEstimationTime, MaxQueryExecutionTime, and DefaultQuery as useful parameters to set in the .ini.

Virtuoso support

(Jul 2013, Hugh Williams) You can post questions on the support forum or in the Virtuoso open source mailing list at http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSMailingLists

(Dec 2013) mail archives: http://sourceforge.net/mailarchive/forum.php?forum_name=virtuoso-users

Enabling reasoning in Virtuoso

As described here, Virtuoso can employ RDFS reasoning during query execution. The SPARQL query defines a special value input:inference in the query header to indicate which ontology should be applied during query execution. The steps to take:

  • Load an ontology into a named graph (say, http://example.com/2/owl) the normal way (i.e., using vload or pvload.sh)
  • Log in with isql-v and call the rdfs_rule_set ('sample', 'rule_graph'); function to associate the graph named (e.g.) http://example.com/2/owl with a "rule set logical name", which can just be the same string (e.g.) http://example.com/2/owl.
  • When a query wants to use reasoning, it specifies the rule set logical name in the query: DEFINE input:inference <http://example.com/2/owl> (or, define input:inference "sample")

For a real example (for ichoose, which uses trust_ontology.owl loaded into graph name http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest):

$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-"1111"} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-"dba"} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-"dba"}
Connected to OpenLink Virtuoso
Driver: 06.01.3127 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> rdfs_rule_set ('http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest', 'http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest');

Done. -- 13 msec.

Load an example instance (vload ttl manual/one-instance.ttl http://ichoose.tw.rpi.edu/fake_instance -v):

<http://ichoose.tw.rpi.edu/fake_instance> 
 a <http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest/trust_ontology#TrustworthinessGradeC> .

Then, submit SPARQL query to get results that includes the extra type for their owl:Restriction.

DEFINE input:inference <http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest>
DESCRIBE  <http://ichoose.tw.rpi.edu/fake_instance>
@prefix rdf:	<http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ns1:	<http://ichoose.tw.rpi.edu/> .
ns1:fake_instance	rdf:type	_:vb89901 .
@prefix ns2:	<http://ichoose.tw.rpi.edu/source/ctg-albany-edu/dataset/ontologies/version/latest/trust_ontology#> .
ns1:fake_instance	rdf:type	ns2:TrustworthinessGradeC .

How to exploit RDF Schema and OWL Inference Rules with minimal effort?

This Aug 2013 thread indicates that property chains are not supported.

Modifying the SPARQL Service Description

Virtuoso returns the RDF in graph named http://localhost:8890/sparql when the endpoint URL (e.g. http://ieeevis.tw.rpi.edu/sparql) is dereferenced. So, update that named graph to change what RDF is returned when the endpoint URL is dereferenced. See this email.

Fixable using one of the provided Secondary Derivative Datasets: cr-sparql-sd.

Dumping N-Quads

http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFDumpNQuad

SPARQL 1.1 (and 1.1ish)

http://answers.semanticweb.com/questions/11843/does-virtuoso-versions-60-support-sparql-11

Installing Virtuoso

Moving from an aptitude-installed to installed-from-source:

  • sudo aptitude remove virtuoso-opensource
  • sudo apt-get remove virtuoso-opensource
  • sudo apt-get autoremove

http://www.linuxfromscratch.org/blfs/view/svn/server/virtuoso.html looks very nice.

Upgrading VOS 6.1.6 to 7.1

(NOT successful)

First, shut down apache so that virtuoso doesn't get any requests:

sudo service apache2 stop

Next, make some backups following http://virtuoso.openlinksw.com:8889/doc/html/backup.html (got Error 42001: [Virtuoso Driver][Virtuoso Server]SR185: Undefined procedure DB.DBA.online_backup.)

isql-v 1111 dba PASSWORD

SQL> checkpoint_interval (-1);
SQL> checkpoint;
SQL> online_backup ('ieeevis-virtuoso-backup-6.1.3-2014-Jul-02_', 500);

Next, shut it down:

sudo /etc/init.d/virtuoso-opensource stop
sudo find / -name "*virtuoso*" -print | grep -v lebot | grep -v prizms | grep -v xgmatwc

Changing the dba password through isql-v

/usr/local/bin/isql-v 1111 dba dba
set password dba SOMEOTHERPASSWORD;
exit;

Timeouts

Virtuoso S1T00 Error SR171: Transaction timed out

virtuoso-install-info.sh

virtuoso-install-info.sh --help
...
method: upstream-source
virtuoso_t: /usr/local/bin/virtuoso-t
isql:       /usr/local/bin/isql-v
ini:        /usr/local/var/lib/virtuoso/db/virtuoso.ini
log:        /usr/local/var/lib/virtuoso/db/virtuoso.log
init_d:     /etc/init.d/virtuoso-opensource

sudo vi /usr/local/var/lib/virtuoso/db/virtuoso.ini
     MaxQueryExecutionTime       = 400   ; in seconds

sudo /etc/init.d/virtuoso-opensource stop
sudo /etc/init.d/virtuoso-opensource start

cache-queries.sh could be modified to set &timeout=<milliseconds> parameter

Proxying to Virtuoso 7.1

Setting up proxy mapping for virtuoso 7 appears to be a little different than 6. Instead of setting the output filter to just proxy-html, you have to set it to INFLATE;DEFLATE;proxy-html. Changed the HTMLURLMap as well. See http://docs.openlinksw.com/virtuoso/webserver.html (and our internal ticket and Hugh's logging suggestion).

Security SPARQL endpoints

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideSPARQLEndpoints

Installing Virtuoso on Mac

install.sh does not know to install the mac-specific version of virtuoso, so say NO to it's offer to install it for you. Then do the following.

(Aug 2014) Follow https://github.com/openlink/virtuoso-opensource/blob/develop/7/README

(Mar 2015) Two source code changes before compiling https://gist.github.com/timrdf/7737dd03833e9be2b372

(Oct 2015) Released the scripts at bin/util/virtuoso/mac; run install-virtuoso-on-mac-dependencies.sh first b/c install-virtuoso-on-mac.sh doesn't check and offer to do it for you.

Enabling SPARQL Update

All 16 GRANTs

If you're in a hurry, blast through these 16 GRANTS and move on with your day. Below, I try to tease out which GRANT does what with more detail than anybody (including myself) should care to explore.

Running all 16 GRANTS listed by Joao Rocha gets it to "work", but I kinda feel like we brought a 52 cal to an arm wrestling match...

GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO "SPARQL"
GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO SPARQL_UPDATE

GRANT execute ON DB.DBA.SPARQL_MODIFY_BY_DICT_CONTENTS TO "SPARQL"
GRANT execute ON DB.DBA.SPARQL_MODIFY_BY_DICT_CONTENTS TO SPARQL_UPDATE

GRANT execute ON DB.DBA.SPARQL_DELETE_DICT_CONTENT TO "SPARQL"
GRANT execute ON DB.DBA.SPARQL_DELETE_DICT_CONTENT TO SPARQL_UPDATE

GRANT execute ON DB.DBA.SPARUL_RUN TO "SPARQL"
GRANT execute ON DB.DBA.SPARUL_RUN TO SPARQL_UPDATE

GRANT execute ON SPARUL_CLEAR TO "SPARQL"
GRANT execute ON SPARUL_CLEAR TO SPARQL_UPDATE
GRANT DELETE ON RDF_QUAD TO "SPARQL"
GRANT DELETE ON RDF_QUAD TO SPARQL_UPDATE

GRANT execute ON DB.DBA.RDF_OBJ_ADD_KEYWORD_FOR_GRAPH TO "SPARQL"
GRANT execute ON DB.DBA.RDF_OBJ_ADD_KEYWORD_FOR_GRAPH TO SPARQL_UPDATE
GRANT execute ON DB.DBA.L_O_LOOK TO "SPARQL"
GRANT execute ON DDB.DBA.L_O_LOOK TO SPARQL_UPDATE

execute ON SPARQL_INSERT_DICT_CONTENT enables INSERT DATA

Starting with a fresh Virtuoso 7.2, running:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
INSERT DATA
{ GRAPH <http://example/bookStore> { <http://example/book1>  ns:price  42 } }

Leads to:

Virtuoso 42000 Error SR186: No permission to execute procedure DB.DBA.SPARQL_INSERT_DICT_CONTENT with user ID 10x, group ID 10x

Running:

$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT execute ON SPARQL_INSERT_DICT_CONTENT TO "SPARQL";

Done. -- 1 msec.

The INSERT DATA query works; reporting:

Insert into <http://example/bookStore>, 1 (or less) triples -- done

execute ON SPARUL_CLEAR and DELETE ON RDF_QUAD enables CLEAR GRAPH

Starting with a fresh Virtuoso 7.2 (+DB.DBA.SPARQL_INSERT_DICT_CONTENT), running:

CLEAR GRAPH <http://example/bookStore>
Virtuoso 42000 Error SR186: No permission to execute procedure DB.DBA.SPARUL_CLEAR with user ID 10x, group ID 10x
$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT execute ON SPARUL_CLEAR TO "SPARQL";

Done. -- 1 msec.
SQL> 

Leads to:

Virtuoso 42000 Error SQ110: Permission denied for delete from DB.DBA.RDF_QUAD (user ID = 10x)

Chasing after it with:

isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT DELETE ON RDF_QUAD TO "SPARQL";

Done. -- 0 msec.
SQL> exit;

gives success:

Clear graph <http://example/bookStore> -- done

INSERT WHERE

The following INSERT query works with just the above three (execute ON SPARQL_INSERT_DICT_CONTENT, execute ON SPARUL_CLEAR, and DELETE ON RDF_QUAD) enabled.

PREFIX ns: <http://example.org/ns#>
INSERT {
  GRAPH <http://example/bookStore2> {
   ?book ns:price ?bank
  }
}
WHERE { 
  GRAPH <http://example/bookStore> { 
    ?book ns:price ?price 
  } 
  BIND ((2 * ?price) AS ?bank)
}

execute ON DB.DBA.L_O_LOOK enables INSERT DATA (from Sesame)

insert data {
    graph <http://my.name/id/md5/6a1ad551f95c1b618f6283058fe8e52b> {
      <http://my.name/source/some-com/dataset/some-database/version/YYYY-MMM-dd/42>
    ...

Gives error:

org.openrdf.query.UpdateExecutionException: Virtuoso 42000 Error SR186: No permission to execute dpipe DB.DBA.L_O_LOOK with user ID 10x, group ID 10x

Running:

$ isql-v ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME:-dba} ${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}
Connected to OpenLink Virtuoso
Driver: 07.20.3212 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> GRANT execute ON DB.DBA.L_O_LOOK TO "SPARQL";

Done. -- 1 msec.
SQL> exit;

?? DB.DBA.SPARQL_DELETE_QUAD_DICT_CONTENT enables DELETE

Not bothering to enable it, but here's the error:

DELETE { GRAPH ?g { <http://my.name/id/md5/b76a28c5da3bbdb208f4251eefdd4a> ?p ?o } } WHERE { graph ?g {<http://my.name/id/md5/b76a28c5da3bbdb208f4251eefdd4a> ?p ?o}}

results in:

Virtuoso 42000 Error SR186: No permission to execute procedure DB.DBA.SPARQL_DELETE_QUAD_DICT_CONTENT with user ID 10x, group ID 10x

?? DB.DBA.??? enables ??

           prefix prov: <http://www.w3.org/ns/prov#>
       
           insert {
               graph <http://...selected> {
                 <http://...selected>
                   prov:hadMember ?b_1
                 .
               }
           }
           using <http://.../dataset/1>
           where {
             service <http://some.example:8890/sparql> {
             values ?b_1 {
               <http://...Requester>
             }
             {
               ?b_1
                 ?p []
               .
             }
             union
             {
               []
                 ?d ?b_1
               .
             }
             }
           }

results in:

Virtuoso 42000 Error SQ200: Must have select privileges on view DB.DBA.SPARQL_SINV_2

Enabling Federated Query

Submitting SPARQL queries with SERVICE keywords requires additional permissions to actually execute.

As indicated at http://boards.openlinksw.com/phpBB3/viewtopic.php?f=14&t=5735

bash$ isql-v 1111 dba linkeddatav
Connected to OpenLink Virtuoso
Driver: 07.10.3207 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";

Done. -- 0 msec.
SQL> grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";

Done. -- 1 msec.
SQL> exit;

Upgrading from v6 to v7

http://sourceforge.net/p/virtuoso/mailman/message/34111851/

Using Virtuoso to deploy Linked Data

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtDeployingLinkedDataGuide

Very long roll forwards on start up

Very long "roll forward" when starting virtuoso-t

Avoiding DBA

"If you create a new user with the "Primary Role" assignment of "dba" from the drop down listbox available in the Conductor then they will be able to perform operations under the "Linked Data" and other tabs." -Hugh

32 bit / 64 bit with DB.DBA.RDF_64BIT_UPGRADE

Dan Davis' question

Clone this wiki locally