Queries Advanced Topics
- SubClasses and SubProperties
- Unions
- Construct Queries
- Delete Queries
- Optimizations Internal
- Circular Graphs
When a class is added to the query canvas, any queries generated will search for that class or any of its sub-classes.
Likewise, when a property is used (an Object Property edge or a Data Property), any queries generated will search for that property or any of its sub-properties. For a sub-property to be considered, its Domain must also be a super-class* or sub-class* of the subject class.
To query specific sets of sub-properties or sub-classes, use the union query.
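The sub-class matching described above amounts to a transitive closure over rdfs:subClassOf. The following is a minimal Python sketch of that idea, not SemTK's implementation; the class hierarchy and the `subclasses_of` helper are hypothetical and exist only for illustration.

```python
# Hypothetical child -> parent subClassOf edges (illustrative names only,
# not taken from the battery ontology).
SUBCLASS_OF = {
    "RechargeableBattery": "Battery",
    "LithiumBattery": "RechargeableBattery",
    "Cell": "Thing",
}


def subclasses_of(cls, edges=SUBCLASS_OF):
    """Return cls plus every class that reaches it through subClassOf edges."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in edges.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


# A node of class Battery on the canvas would match instances of any of these.
print(sorted(subclasses_of("Battery")))
```

A class with no sub-classes matches only itself, which is why a union is needed to pick out specific subsets.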
SPARQLgraph can be used to generate union queries. In general, a union query is an "OR" where expressions share a set of sparqlIds. Creation of union queries in SPARQLgraph is based on the following:
- create branch points for the union in a property or class; these will be marked with a colored Union symbol U
- subgraphs beyond the branch point will be shown in the matching color
- inside unions, sparqlId restrictions are loosened so that items from each branch can refer to the same return value
- unions may be nested
The creation of union queries will be demonstrated with a brief tutorial using an ontology with a simple battery containing multiple colored cells. Consider these five batteries, each with up to four cells:
description | batt_ID | cell1_date | cell1_ID | cell1_color | cell2_ID | cell2_color | cell3_ID | cell3_color | cell4_ID | cell4_color |
---|---|---|---|---|---|---|---|---|---|---|
normal battery | battAA | 2017-03-23T10:23:00 | A | red | B | blue | C | white | D | white |
normal battery | battAB | 2017-03-23T10:24:00 | E | red | F | blue | G | white | H | white |
no date | battAC | | I | red | J | blue | K | white | L | white |
no colors on cells | battX | 2017-03-23T10:26:00 | M | | N | | O | | P | |
two cells | battY | 2017-03-23T10:27:00 | Q | blue | R | blue | | | | |
Consider the following query:
Find all cells with color "red" OR with no color at all.
Return the cellId, along with the battery id and name.
In SPARQLgraph, such a query is built with the following steps:
- create a Battery, and set ?id and ?name to be returned
- add a Cell, and set ?cellId to be returned
- add a color, and constrain it to "red". Using the "suggest values" button is helpful here. Also, remember to uncheck the "return" box so the color is not returned.
- create a union by selecting the "cell" arc and choosing "new union" off the "opt/minus/union" menu. At this point your subgraph will be rendered with a unique color, and the arc will be marked with a U
- add another Cell to the Battery.
- add to the union by selecting the new "cell" arc and choosing the "cell" union off the "opt/minus/union" menu. Now that this subgraph is added to the union, go back to the new Cell and return ?cellId, making sure to use the same "cellId" sparqlId as the first Cell.
- add a color to the new cell, and select the new color arc and choose "minus" off the "opt/minus/union" menu.
You now have a union with two subgraphs. The top subgraph matches all cells with color red. The bottom subgraph matches all cells with no color. The "?cellId" sparqlId is shared between the branches. To make it easy to inspect results, order by "cellId".
The following SPARQL is generated:
prefix ...
select distinct ?id ?name ?cellId
FROM <http://your/graph>
where {
    ?Battery a ?Battery_type .
    ?Battery_type rdfs:subClassOf* batterydemo:Battery.
    ?Battery batterydemo:id ?id .
    ?Battery batterydemo:name ?name .
    {
        ?Battery batterydemo:cell ?Cell_1 .
        BIND(?Cell_1 as ?Cell) .
        ?Cell_1 batterydemo:cellId ?cellId1 .
        BIND(?cellId1 as ?cellId) .
        ?Cell_1 batterydemo:color ?Color_1 .
        FILTER ( ?Color_1 IN (<http://kdl.ge.com/batterydemo#red> ) ) .
    }
    UNION
    {
        ?Battery batterydemo:cell ?Cell .
        ?Cell batterydemo:cellId ?cellId .
        minus {
            ?Cell batterydemo:color ?Color .
        }
    }
}
ORDER BY ?cellId
Note that under the hood, each item in the graph has a unique identifier. BIND statements are used to match ?cellId between the two subgraphs in the UNION.
This query returns all the red and colorless cells:
id | name | cellId |
---|---|---|
battAA | normal battery | A |
battAB | normal battery | E |
battAC | no date | I |
battX | no colors on cells | M |
battX | no colors on cells | N |
battX | no colors on cells | O |
battX | no colors on cells | P |
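As a cross-check on the result above, the two UNION branches can be imitated in plain Python over the battery table. This is only a sketch of the query's logic, not of how SPARQLgraph or the triplestore evaluates it; the `BATTERIES` structure and `red_or_colorless` function are hypothetical names, and a missing color is modeled as `None`.

```python
# Data copied from the battery table: (name, id, [(cellId, color or None)]).
BATTERIES = [
    ("normal battery", "battAA", [("A", "red"), ("B", "blue"), ("C", "white"), ("D", "white")]),
    ("normal battery", "battAB", [("E", "red"), ("F", "blue"), ("G", "white"), ("H", "white")]),
    ("no date", "battAC", [("I", "red"), ("J", "blue"), ("K", "white"), ("L", "white")]),
    ("no colors on cells", "battX", [("M", None), ("N", None), ("O", None), ("P", None)]),
    ("two cells", "battY", [("Q", "blue"), ("R", "blue")]),
]


def red_or_colorless(batteries):
    """Return (id, name, cellId) rows for cells that are red OR have no color."""
    rows = set()  # a set plays the role of SELECT DISTINCT
    for name, batt_id, cells in batteries:
        for cell_id, color in cells:
            in_red_branch = color == "red"    # first UNION branch
            in_minus_branch = color is None   # second branch: MINUS any color
            if in_red_branch or in_minus_branch:
                rows.add((batt_id, name, cell_id))
    return sorted(rows, key=lambda row: row[2])  # ORDER BY ?cellId


for row in red_or_colorless(BATTERIES):
    print(" | ".join(row))
```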
For the sake of illustration, consider this query:
Find all batteries with the letter 'y' in the id or in the name.
Return the batteries' ids and names.
In SPARQLgraph, such a query is built with the following steps:
- Add the Battery node to the nodegroup
- Select id:
  - choose 'new union' from the menu
  - apply the filter: FILTER regex(?id, "[Yy]")
- Select name:
  - choose 'id' union from the menu
  - apply the filter: FILTER regex(?name, "[Yy]")
You now have a union query that will return names and ids of all batteries that have the letter 'y' in the name or id.
The query will look like this:
prefix ...
select distinct ?id ?name
FROM <http://your/graph>
where {
    ?Battery a ?Battery_type .
    ?Battery_type rdfs:subClassOf* batterydemo:Battery.
    {
        ?Battery batterydemo:id ?id .
        FILTER regex(?id, "[Yy]") .
    }
    UNION
    {
        ?Battery batterydemo:name ?name .
        FILTER regex(?name, "[Yy]") .
    }
}
and, given the data shown in the table above, will return the following results. Each row has a value only for the variable bound by the branch that matched:
id | name |
---|---|
 | normal battery |
battY | |
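Because each UNION branch binds only its own variable, a result row carries a value for ?id or for ?name but not both. The Python sketch below imitates that behavior over the battery ids and names from the table; it illustrates the semantics only, `y_in_id_or_name` is a hypothetical helper, and `None` stands in for an unbound variable.

```python
import re

# (id, name) pairs copied from the battery table.
BATTERY_NAMES = [
    ("battAA", "normal battery"),
    ("battAB", "normal battery"),
    ("battAC", "no date"),
    ("battX", "no colors on cells"),
    ("battY", "two cells"),
]


def y_in_id_or_name(batteries):
    """Return distinct (id, name) rows; None marks an unbound variable."""
    rows = set()  # a set plays the role of SELECT DISTINCT
    for batt_id, name in batteries:
        if re.search("[Yy]", batt_id):
            rows.add((batt_id, None))  # first branch binds only ?id
        if re.search("[Yy]", name):
            rows.add((None, name))     # second branch binds only ?name
    return rows


print(y_in_id_or_name(BATTERY_NAMES))
```

The two "normal battery" matches collapse into a single row because the query selects distinct results.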
Now consider this query:
Find the id that belongs to any battery OR any blue cell
This query is the union of two disconnected subgraphs, and is built with the following steps:
- Add the Battery node to the nodegroup
- return the ?id
- open the class URI, select "new union", and de-select "return"
- drag a Cell node, such that it is disconnected
- return the cellId as ?id
- open the class URI and select the "?Battery" union, and de-select "return"
- Add a Color to the Cell, and constrain it to "blue", de-selecting "return"
This results in a query that is the union of the two subgraphs, each of which returns something for ?id.
The query looks like this:
prefix ...
select distinct ?id
FROM <http://your/graph>
where {
    {
        ?Cell a batterydemo:Cell .
        ?Cell batterydemo:cellId ?id_0 .
        BIND(?id_0 as ?id) .
        ?Cell batterydemo:color ?Color .
        FILTER ( ?Color IN (<http://kdl.ge.com/batterydemo#blue> ) ) .
    }
    UNION
    {
        ?Battery a ?Battery_type .
        ?Battery_type rdfs:subClassOf* batterydemo:Battery.
        ?Battery batterydemo:id ?id .
    }
}
and it returns the id of every battery and every blue cell:
id |
---|
F |
B |
J |
R |
Q |
battAB |
battAA |
battX |
battAC |
battY |
Consider the query "Cat named fluffy OR Cat does not have a kitty".
It may be tempting to create a Cat and do a UNION on FILTER regex(?name, "fluffy") and MINUS hasKitty. That is, a union of a data property and a MINUS on an object property.
SemTK would create SPARQL like this:
?Cat a namespace:Cat
{
    ?Cat namespace:name ?name.
    FILTER regex (?name, "fluffy").
} UNION {
    MINUS { ?Cat namespace:hasKitty ?Kitty }
}
Under the W3C SPARQL recommendation, the MINUS clause in the second branch has an empty left-hand side, so it removes nothing and that branch always succeeds. This query will return all cats.
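The reason is the MINUS rule itself: a left-hand solution is removed only when some right-hand solution is compatible with it and shares at least one variable. The sketch below implements that rule over Python dictionaries; it is an illustration of the semantics under those assumptions, not SemTK or triplestore code, and the cat and kitty names are hypothetical.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[key] == b[key] for key in a.keys() & b.keys())


def minus(left, right):
    """SPARQL-style MINUS: drop a left solution only if some right solution
    shares at least one variable with it and is compatible with it."""
    return [
        mu for mu in left
        if not any(mu.keys() & other.keys() and compatible(mu, other) for other in right)
    ]


# Hypothetical data: only "tom" has a kitty.
cats_with_kitties = [{"Cat": "tom", "Kitty": "k1"}]

# Inside the UNION branch nothing precedes MINUS, so the left-hand side is a
# single solution with no bindings. It shares no variable with the right side.
empty_group = [{}]
print(minus(empty_group, cats_with_kitties))

# When ?Cat is bound inside the same group, MINUS removes cats with kitties.
all_cats = [{"Cat": "tom"}, {"Cat": "fluffy"}]
print(minus(all_cats, cats_with_kitties))
```

The first call leaves the empty solution in place, which then joins with every ?Cat outside the union; the second call shows the behavior the query author intended.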
Instead, build the union on two separate subgraphs, each with its own Cat node. Once both Cat nodes are added to the union, both can be named ?Cat and both names can be named ?name. The Cat node that stands alone holds the ?name with the FILTER regex(?name, "fluffy"); the other Cat node carries the MINUS on its kitties.
This will generate SPARQL like this:
{
    ?Cat a AnimalSubProps:Cat .
    ?Cat AnimalSubProps:name ?name .
    minus {
        ?Cat AnimalSubProps:hasKitties ?Kitty .
    }
} UNION {
    ?Cat a AnimalSubProps:Cat .
    ?Cat AnimalSubProps:name ?name .
    FILTER regex(?name, "fluffy") .
}
And this will return all cats named "fluffy" plus all cats which do not have kitties.
CONSTRUCT queries return results in graph form instead of as a table, thus taking full advantage of the semantic web stack. This type of query is accessed by setting the query dropdown (highlighted below in yellow) to construct.
Rules for building CONSTRUCT queries:
- any node and edge shown on the canvas are constructed
- any data properties selected for return are constructed
- any constraints are applied in the query WHERE clause
Hovering the mouse over a node will show:
- the URI of any class node
- the type of any data
A download link, "results.json", downloads the results as a file in JSON-LD format.
Note that different triplestores have been observed to interpret the JSON-LD format differently:
- a link to another object may be of the form { "@id": "ID123" } or just the string "ID123"
- data properties may be typed { @value: "35", @type: integer } or may be strings "35"
- types and URIs may be prefixed in full "uri://my/prefix#uri123" or abbreviated based on query prefixes "prefix:uri123"
The SPARQLgraph interface attempts to resolve these differences and show a standard network format.
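The three differences listed above can each be smoothed over with a small normalization step. The following Python sketch shows one way to do that for values taken from a parsed JSON-LD file; it is not the SPARQLgraph code, and the helper names and the `PREFIXES` mapping are assumptions made for illustration.

```python
# Hypothetical prefix map, as it might be collected from the query prefixes.
PREFIXES = {"prefix": "uri://my/prefix#"}


def normalize_link(value):
    """Return the target id whether the link is {"@id": ...} or a bare string."""
    if isinstance(value, dict):
        return value["@id"]
    return value


def normalize_literal(value):
    """Return the lexical value whether it is typed ({"@value": ...}) or a bare string."""
    if isinstance(value, dict):
        return value["@value"]
    return value


def expand_uri(uri, prefixes=PREFIXES):
    """Expand an abbreviated "prefix:local" name; leave full URIs unchanged."""
    prefix, sep, local = uri.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return uri


print(normalize_link({"@id": "ID123"}), normalize_link("ID123"))
print(normalize_literal({"@value": "35", "@type": "integer"}), normalize_literal("35"))
print(expand_uri("prefix:uri123"), expand_uri("uri://my/prefix#uri123"))
```

After normalization, both forms in each pair compare equal, which is what allows a single network view regardless of the triplestore.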
DELETE queries work differently from CONSTRUCT queries, in that
- any node and edge shown on the canvas are added to the WHERE clause
- any data properties selected for return are added to the WHERE clause
- items to be deleted must be explicitly specified
Data property and object property edge dialogs have "select for delete" check boxes.
Node dialogs (accessed by clicking on the class name) contain a menu with a choice of delete modes:
- NO_DELETE
- TYPE_INFO_ONLY - only delete type triples with this node's matching URIs as the subject
- FULL_DELETE - delete all triples with this node's matching URIs in the subject or object
- LIMITED_TO_MODEL - like FULL_DELETE, but limited to relationships specified in the model
- LIMITED_TO_NODEGROUP - like FULL_DELETE but limited only to relationships in the nodegroup
FULL_DELETE on nodes is by far the most common type of delete query.
SemTK attempts to optimize queries based on performance testing of different triplestores.
Constraining a variable to a list of values is used in ingestion URILookups, which can run several queries per row of ingestion data, so the choice of clause can have a very large performance impact.
- FILTER IN is preferred for AWS Neptune
- other triple stores are more performant with a VALUES clause
Matching a class together with its sub-classes is a very common query clause, since a node in a nodegroup typically matches all subclasses.
- Blazegraph performs best with rdfs:subClassOf*
- other triple stores are more performant with a list of classes in a VALUES clause
For compatibility purposes, Virtuoso VALUES clauses will contain one typed and one untyped version of each string and numeric constant.
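To make the FILTER IN versus VALUES choice concrete, the sketch below emits one clause or the other for a list of URIs. It is a hypothetical illustration rather than SemTK's query generator: the `value_constraint` function and the `"neptune"` store label are assumptions introduced here.

```python
def value_constraint(var, uris, triplestore):
    """Build a clause restricting ?var to the given URIs.

    FILTER IN is emitted for the (hypothetical) store label "neptune";
    every other label gets a VALUES clause.
    """
    if triplestore == "neptune":
        in_list = ", ".join(f"<{uri}>" for uri in uris)
        return f"FILTER ( ?{var} IN ({in_list}) ) ."
    values = " ".join(f"<{uri}>" for uri in uris)
    return f"VALUES ?{var} {{ {values} }} ."


colors = [
    "http://kdl.ge.com/batterydemo#red",
    "http://kdl.ge.com/batterydemo#blue",
]
print(value_constraint("Color", colors, "neptune"))
print(value_constraint("Color", colors, "virtuoso"))
```

Both clauses select the same solutions; only the form handed to the triplestore differs.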
In some situations it is meaningful to create queries that have multiple connections to the same node. In instances where this would create circularity in the nodegroup, SPARQLgraph is not currently able to show this graphically. A work-around is available.
Consider the case where data has been incorrectly ingested such that a dog's puppy and its parent are the same. A query to find this bad data would logically seem to be two nodes where ?Dog_Parent hasPuppy ?Dog_Child and ?Dog_Child hasPuppy ?Dog_Parent, forming a circle which will not execute properly.
Instead, build a three-node query and set ?Dog_Child equal to ?Dog_Parent behind the scenes: click on ?Dog_Child to open its dialog, and set ?Dog_Child equal to ?Dog_Parent.
The resulting query will now find instances where a dog (incorrectly) has the same child and parent.
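The effect of the work-around can be sketched in Python over a toy list of hasPuppy edges: walk the three-node chain parent, dog, child, and keep only the chains where the child equals the parent. The dog names, the name of the middle node, and the `same_child_and_parent` helper are all hypothetical; this illustrates the pattern, not the SPARQL that SemTK generates.

```python
# Hypothetical (dog, puppy) edges; rex and spot were ingested incorrectly.
HAS_PUPPY = [("rex", "spot"), ("spot", "rex"), ("lady", "scamp")]


def same_child_and_parent(has_puppy):
    """Return dogs whose parent and puppy are the same individual.

    Pattern: ?Dog_Parent hasPuppy ?Dog . ?Dog hasPuppy ?Dog_Child,
    with ?Dog_Child constrained to equal ?Dog_Parent.
    """
    edges = set(has_puppy)
    return sorted(
        dog
        for parent, dog in edges
        for middle, child in edges
        if middle == dog and child == parent
    )


print(same_child_and_parent(HAS_PUPPY))
```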