Installing
These are instructions to get the SemTK services and UI up and running. They assume Linux, but SemTK can run on Windows as well. An easy way to do this on Windows is to run the commands shown on this page inside a bash shell such as Git Bash, which is included in the Git for Windows distribution.
Install a triple store such as Virtuoso or Fuseki.
Fuseki is the recommended triplestore for SemTK. The latest distribution is at https://jena.apache.org/download/index.cgi.
Startup instructions are at https://jena.apache.org/documentation/fuseki2/fuseki-quick-start.html.
Create a dataset (e.g. named "SemTK") that persists across Fuseki restarts.
Virtuoso is available through OpenLink Software. Installation instructions are at http://virtuoso.openlinksw.com/howto/
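For the Fuseki option, a persistent dataset can also be created from the command line through Fuseki's HTTP administration API (a POST to /$/datasets with dbName and dbType parameters). This is a sketch that assumes Fuseki runs on its default port 3030 and accepts admin requests from this machine (a default install usually restricts the admin API to localhost); the function name and the CURL override used for dry runs are hypothetical.

```bash
# Hypothetical helper: create a persistent (TDB2) Fuseki dataset named SemTK.
# $1 = Fuseki base URL (default http://localhost:3030), $2 = dataset name.
# Set CURL=echo to print the request instead of sending it.
create_semtk_dataset() {
  fuseki_url=${1:-http://localhost:3030}
  dataset=${2:-SemTK}
  # dbType=tdb2 stores the data on disk, so it survives Fuseki restarts;
  # dbType=mem would not.
  ${CURL:-curl} -s -f -X POST \
    --data "dbName=$dataset" \
    --data "dbType=tdb2" \
    "$fuseki_url/\$/datasets"
}

# Usage (Fuseki must already be running): create_semtk_dataset
```

Once created, the dataset is served at http://localhost:3030/SemTK, the same URL used for SERVICES_DATASET_SERVER_URL in the Fuseki settings further down.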
Install a web server such as Apache Tomcat or Apache HTTP Server (httpd).
Create a directory (referred to below as WEBAPPS) within your web server for the SemTK web app.
- Example for Tomcat:
/no_backup/tomcat/apache-tomcat-8.0.18/webapps/semtk
- Example for httpd:
/var/www/html
Create a directory (referred to below as SEMTK) for SemTK.
If updating/replacing an existing SemTK installation, be sure to save the existing ENV_OVERRIDE file.
If you need to install Git, this might work for you:
$ sudo yum install git
$ git config --global user.name "Your Name"
$ git config --global user.email "You@email.com"
If you need to install Maven, this might work for you:
$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
$ sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
$ sudo yum install -y apache-maven
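Before running the yum commands, you can check whether Git and Maven (and the Java JDK that Maven needs) are already installed. A minimal sketch; the function name is hypothetical:

```bash
# Hypothetical helper: report which of the named commands are missing.
# Returns 0 if every command is on PATH, 1 otherwise.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_prereqs git mvn java
```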
Clone and build SemTK:
$ cd SEMTK
$ git clone https://github.com/ge-semtk/semtk.git
$ cd semtk
$ mvn clean install -DskipTests
- Download the binary distribution file (e.g. semtk-opensource-*-dist.tar.gz) from GitHub Releases to the SEMTK directory
- Unzip/untar the binary distribution file, which will create SEMTK/semtk-opensource
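The unpack step can be scripted roughly as follows. This is a sketch: the function name is hypothetical, it assumes the tarball naming shown above, and it assumes ENV_OVERRIDE sits in the semtk-opensource directory (any existing one is copied aside first, in line with the advice to save ENV_OVERRIDE when replacing an installation).

```bash
# Hypothetical helper: unpack a SemTK binary distribution into SEMTK ($1).
# Assumes the tarball matches semtk-opensource-*-dist.tar.gz and that an
# existing ENV_OVERRIDE, if any, lives in SEMTK/semtk-opensource.
unpack_semtk() {
  semtk_dir=$1
  # Keep a copy of any existing ENV_OVERRIDE before touching the install.
  if [ -f "$semtk_dir/semtk-opensource/ENV_OVERRIDE" ]; then
    cp "$semtk_dir/semtk-opensource/ENV_OVERRIDE" "$semtk_dir/ENV_OVERRIDE.saved" || return 1
  fi
  # Use the first tarball matching the glob (alphabetical order).
  set -- "$semtk_dir"/semtk-opensource-*-dist.tar.gz
  if [ ! -f "$1" ]; then
    echo "no semtk-opensource-*-dist.tar.gz found in $semtk_dir" >&2
    return 1
  fi
  tar -xzf "$1" -C "$semtk_dir" || return 1
  # Succeed only if the expected directory now exists.
  [ -d "$semtk_dir/semtk-opensource" ]
}

# Usage: unpack_semtk SEMTK
```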
A default configuration file (.env) can be found in the top-level semtk-opensource directory. Typically, some of the settings in this file will need to be overridden for the local environment. This should be done by creating a file called ENV_OVERRIDE (do not change the .env file). Some common ENV_OVERRIDE entries are as follows:
To start only a subset of the SemTK services (this example represents the 10 core SemTK services):
export ENABLED_SERVICES="nodeGroupExecutionService nodeGroupService nodeGroupStoreService ontologyInfoService sparqlExtDispatchService sparqlGraphIngestionService sparqlGraphResultsService sparqlGraphStatusService sparqlQueryService utilityService"
To change the temporary results file directory:
export resultsFileLocation=/directory12345/semtk-results
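The services may not create this directory for you (the AWS notes below record having to create it by hand), so it can help to create it up front. A minimal sketch; the function name is hypothetical:

```bash
# Hypothetical helper: make sure the results directory exists and is writable.
# Reads resultsFileLocation from the environment.
ensure_results_dir() {
  if [ -z "$resultsFileLocation" ]; then
    echo "resultsFileLocation is not set" >&2
    return 1
  fi
  mkdir -p "$resultsFileLocation" || return 1
  [ -w "$resultsFileLocation" ]
}

# Usage: export resultsFileLocation=/directory12345/semtk-results; ensure_results_dir
```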
If you are using Fuseki, your ENV_OVERRIDE should contain these settings:
export SERVICES_DATASET_SERVER_URL=http://localhost:3030/SemTK
export SERVICES_DATASET_ENDPOINT_TYPE=fuseki
Note: the ENV_OVERRIDE file will not be changed if the SemTK code is updated from Git (e.g. with a git pull).
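As a sketch of the override mechanism, the following appends the Fuseki settings to ENV_OVERRIDE without touching .env, then reports the value a setting ends up with. It assumes ENV_OVERRIDE lives next to .env and that the startup scripts source .env first and ENV_OVERRIDE second, so later values win; both function names are hypothetical.

```bash
# Hypothetical helper: append Fuseki settings to ENV_OVERRIDE.
# $1 = semtk-opensource directory, $2 = dataset URL.
write_fuseki_override() {
  dir=$1
  url=${2:-http://localhost:3030/SemTK}
  # Append so existing overrides are kept; .env itself is never modified.
  cat >> "$dir/ENV_OVERRIDE" <<EOF
export SERVICES_DATASET_SERVER_URL=$url
export SERVICES_DATASET_ENDPOINT_TYPE=fuseki
EOF
}

# Hypothetical helper: print the value variable $2 ends up with when .env is
# sourced first and ENV_OVERRIDE second (assumed to match the startup scripts).
effective_setting() {
  dir=$1
  var=$2
  (
    . "$dir/.env"
    if [ -f "$dir/ENV_OVERRIDE" ]; then . "$dir/ENV_OVERRIDE"; fi
    eval "printf '%s\n' \"\${$var}\""
  )
}

# Usage: write_fuseki_override SEMTK/semtk-opensource
#        effective_setting SEMTK/semtk-opensource SERVICES_DATASET_ENDPOINT_TYPE
```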
Start the SemTK services:
$ ./startServices.sh
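Services can take a little while to come up after ./startServices.sh. A minimal bash sketch that polls a TCP port until something is listening; it relies on bash's /dev/tcp redirection, the function name is hypothetical, and the ports in the usage line (12050 for SPARQL query, 12091 for ingestion) are taken from the reverse proxy example further down this page, so adjust them if your configuration differs.

```bash
# Hypothetical helper (bash only): wait until host:port accepts TCP connections.
# $1 = host, $2 = port, $3 = number of one-second attempts (default 30).
wait_for_port() {
  host=$1
  port=$2
  tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection;
    # the subshell exits non-zero if the connection fails.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "nothing listening on $host:$port" >&2
  return 1
}

# Usage: wait_for_port 127.0.0.1 12050 && wait_for_port 127.0.0.1 12091
```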
Install the SemTK UI to your web server with the following command, where WEBAPPS is the web server directory described above:
$ ./updateWebapps.sh WEBAPPS
Test that the UI is working by browsing to my.machine.com/sparqlGraph/index.html (replace my.machine.com with your web server's host name).
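If the page does not load, a quick first check is whether updateWebapps.sh put the file in place. This sketch assumes the page is served from WEBAPPS/sparqlGraph/index.html, matching the URL above; the function name is hypothetical.

```bash
# Hypothetical helper: check that the SPARQLgraph UI was copied into WEBAPPS.
# $1 = WEBAPPS directory.
check_ui_installed() {
  webapps=$1
  if [ -f "$webapps/sparqlGraph/index.html" ]; then
    echo "found $webapps/sparqlGraph/index.html"
  else
    echo "missing $webapps/sparqlGraph/index.html" >&2
    return 1
  fi
}

# Usage: check_ui_installed /var/www/html
```

If the file is present, curl -fsS -o /dev/null with the page URL tells you whether the web server itself is serving it.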
Optionally try the "Hello World" demo.
If your web server machine can only be reached on standard ports such as 80, 8080, or 443, you’ll need to use a reverse proxy.
There are many ways to do this, but here are some example lines for a reverse proxy .conf file (e.g. /etc/httpd/conf.d/default-site.conf):
ProxyPass /sparqlquery http://127.0.0.1:12050/
ProxyPassReverse /sparqlquery http://127.0.0.1:12050/
ProxyPass /ingestion http://127.0.0.1:12091/
ProxyPassReverse /ingestion http://127.0.0.1:12091/
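With more than a couple of services, these pairs can be generated rather than typed. A minimal sketch; the function name and the path=port argument format are hypothetical, and the printed lines still need to be pasted into the .conf file:

```bash
# Hypothetical helper: print ProxyPass/ProxyPassReverse pairs for SemTK services.
# $1 = backend host; remaining arguments are path=port mappings.
proxy_lines() {
  backend=$1
  shift
  for mapping in "$@"; do
    path=${mapping%%=*}
    port=${mapping#*=}
    printf 'ProxyPass /%s http://%s:%s/\n' "$path" "$backend" "$port"
    printf 'ProxyPassReverse /%s http://%s:%s/\n' "$path" "$backend" "$port"
  done
}

# Usage (prints the four lines above):
# proxy_lines 127.0.0.1 sparqlquery=12050 ingestion=12091
```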
In this case, the services are running on the same machine as the web server. If they are running somewhere else, use that URL or IP instead of 127.0.0.1. Your configuration file will already have a line for:
ProxyPass / http://127.0.0.1:8080/
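(though it might not point to port 8080). In any event, make sure the new lines are inserted into the reverse proxy config file before this default line: Apache checks ProxyPass rules in the order they appear and uses the first match, so a catch-all / rule placed first would capture every request.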
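Because Apache evaluates ProxyPass rules in configuration order, a quick check that no service rule follows the catch-all can save some debugging. A sketch using awk; it only inspects lines whose first field is exactly ProxyPass (leading indentation is fine), and the function name is hypothetical:

```bash
# Hypothetical helper: fail if any ProxyPass rule appears after "ProxyPass /".
# $1 = reverse proxy .conf file.
check_proxy_order() {
  awk '
    $1 == "ProxyPass" && $2 == "/" { if (!catchall) catchall = NR; next }
    $1 == "ProxyPass" && catchall {
      print "line " NR " comes after the catch-all on line " catchall ": " $0 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Usage: check_proxy_order /etc/httpd/conf.d/default-site.conf
```

apachectl configtest checks the file's syntax, but not this ordering.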
When using a reverse proxy, the URLs in the “Configuration” step above change to use the new paths instead of the port numbers:
url : "http://my.machine.ge.com/ingestion/ingestion/",
url : "http://my.machine.ge.com/sparqlquery/sparqlQueryService/",
Set up an Apache web server (httpd); see the AWS documentation for details.
- Start httpd: sudo service httpd start
- Find the root directory for the server: grep DocumentRoot /etc/httpd/conf/httpd.conf. We'll refer to this directory (e.g. /var/www/html) as WEBAPPS.
- Set up /etc/httpd/conf.d/default-site.conf as outlined above in the "proxy" section.
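The grep step above can be narrowed to print just the path, ready to use as WEBAPPS. A sketch that assumes a single uncommented DocumentRoot directive with no spaces in the path; the function name is hypothetical:

```bash
# Hypothetical helper: print httpd's DocumentRoot (the WEBAPPS directory).
# $1 = httpd.conf path (default /etc/httpd/conf/httpd.conf).
document_root() {
  # Match only active directives, strip the surrounding quotes, stop at the first.
  awk '$1 == "DocumentRoot" { gsub(/"/, "", $2); print $2; exit }' \
    "${1:-/etc/httpd/conf/httpd.conf}"
}

# Usage: WEBAPPS=$(document_root)
```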
Download the binary distribution file, move it to the AWS EC2 instance, and unzip it (all as described above).
The ENV_OVERRIDE file should look something like this:
# I needed to copy this whole folder to the node
export storeTemplateLocation=/run/semtk/semtk-opensource/sparqlGraphLibrary/src/main/resources/nodegroups/store.json
# TODO: I needed to create this folder
export resultsFileLocation=/tmp/DISPATCH_RESULTS
# TODO: on this host I can't find a name that works
export HOST_IP=10.200.100.200
export WEB_INGESTION_HOST=${HOST_IP}
export WEB_SPARQL_QUERY_HOST=${HOST_IP}
export WEB_STATUS_HOST=${HOST_IP}
export WEB_RESULTS_HOST=${HOST_IP}
export WEB_DISPATCH_HOST=${HOST_IP}
export WEB_HIVE_HOST=${HOST_IP}
export WEB_NODEGROUPSTORE_HOST=${HOST_IP}
export WEB_ONTOLOGYINFO_HOST=${HOST_IP}
export WEB_NODEGROUPEXECUTION_HOST=${HOST_IP}
export WEB_NODEGROUP_HOST=${HOST_IP}
# set the ports to use a proxy
export WEB_NODEGROUP_PORT=80/nodegroup
export WEB_INGESTION_PORT=80/ingestion
export WEB_SPARQL_QUERY_PORT=80/sparqlquery
export WEB_STATUS_PORT=80/status
export WEB_RESULTS_PORT=80/results
export WEB_HIVE_PORT=80/hive
export WEB_DISPATCH_PORT=80/dispatch
export WEB_NODEGROUPSTORE_PORT=80/nodegroupstore
export WEB_ONTOLOGYINFO_PORT=80/ontologyinfo
export WEB_NODEGROUPEXECUTION_PORT=80/nodegroupexec
# this is the only way to load a GE-specific variable that is needed in semtk-oss
export DISPATCHER_CLASS_NAME=com.ge.research.semtk.sparqlX.dispatch.EdcDispatcher
# set this to FQDN in order to get maximum speed to DGX within GE network
export resultsBaseURL=http://10.200.100.200/${PORT_SPARQLGRAPH_RESULTS_SERVICE}
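The repetitive WEB_*_HOST and WEB_*_PORT exports above can also be written as a loop. This is only a sketch: it assumes ENV_OVERRIDE is sourced by a POSIX-compatible shell (as its export syntax suggests), and it produces the same values as the explicit lines, which remain the clearer choice if you prefer to see every setting.

```bash
# Sketch: generate the WEB_*_HOST / WEB_*_PORT settings from the list above.
# Each entry is SERVICE:proxy-path, copied from the explicit exports.
export HOST_IP=10.200.100.200   # example address from the file above
for entry in NODEGROUP:nodegroup INGESTION:ingestion SPARQL_QUERY:sparqlquery \
             STATUS:status RESULTS:results HIVE:hive DISPATCH:dispatch \
             NODEGROUPSTORE:nodegroupstore ONTOLOGYINFO:ontologyinfo \
             NODEGROUPEXECUTION:nodegroupexec; do
  svc=${entry%%:*}
  path=${entry#*:}
  export "WEB_${svc}_HOST=${HOST_IP}"
  export "WEB_${svc}_PORT=80/${path}"
done
# Remove the loop's helper variables so they don't leak into the environment.
unset entry svc path
```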
TODO: I needed to edit the .fun file in semtk-opensource because its host function didn't work:
function sethostname
{
export HOST_NAME=$(hostname)
}
Install the SemTK UI, as described above.
Start the SemTK Services, as described above.
Here are instructions to build and run a SemTK Docker image: https://github.com/ge-semtk/semtk/blob/master/deploy/README.md
To optionally attach Google Analytics to your SemTK UI (SPARQLgraph) installation:
- Create a Google Analytics account
- Get your Tracking ID from Google
- Download googleAnalyticsLogger.js
- In googleAnalyticsLogger.js, replace YOUR_GOOGLE_ANALYTICS_TRACKING_ID with your Tracking ID from Google
- Copy the file to your WEBAPPS directory, overwriting sparqlForm/main-oss/KDLEasyLoggerConfig.js
- Copy the file to your WEBAPPS directory, overwriting sparqlGraph/main-oss/KDLEasyLoggerConfigOss.js
Reload your web page and Google Analytics will begin flowing.
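The replace-and-copy steps above can be scripted as below. A sketch only: the function name is hypothetical, and it assumes the two target directories listed above already exist under WEBAPPS.

```bash
# Hypothetical helper: fill in the tracking ID and install the logger config.
# $1 = Google Analytics tracking ID, $2 = WEBAPPS directory,
# $3 = downloaded googleAnalyticsLogger.js (default: current directory).
install_ga_logger() {
  tracking_id=$1
  webapps=$2
  src=${3:-googleAnalyticsLogger.js}
  tmp=$(mktemp) || return 1
  # The ID is used as a sed replacement, so it must not contain "/", "&" or "\".
  sed "s/YOUR_GOOGLE_ANALYTICS_TRACKING_ID/$tracking_id/g" "$src" > "$tmp" || { rm -f "$tmp"; return 1; }
  cp "$tmp" "$webapps/sparqlForm/main-oss/KDLEasyLoggerConfig.js" &&
    cp "$tmp" "$webapps/sparqlGraph/main-oss/KDLEasyLoggerConfigOss.js"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage: install_ga_logger YOUR-TRACKING-ID /var/www/html
```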