Queries Advanced Topics

Paul Cuddihy GE Research edited this page Jul 7, 2022 · 25 revisions

Optionals: Thinking Semantic Web Instead of Relational

Beware the Union Minus

SubClasses and SubProperties

When a class is added to the query canvas, any queries generated will search for that class or any of its sub-classes.

Likewise, when a property is used (an edge Object Property or a Data Property), any queries generated will search for that property or any sub-properties. For a sub-property to be considered, it must also have a Domain of a super-class* or sub-class* of the subject class.

To query specific sets of sub-properties or sub-classes, use the union query.
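
As a rough illustration of the matching described above, the hand-written sketch below searches for a class or any of its sub-classes, and for a property or any of its sub-properties. It is not SemTK's generated output: the `?cell_pred` variable is a hypothetical name, the batterydemo namespace URI is inferred from URIs that appear elsewhere on this page, and the Domain check on sub-properties is omitted.

```sparql
# Assumed prefixes; the batterydemo namespace is inferred from URIs used on this page.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>

SELECT DISTINCT ?Battery ?Cell
WHERE {
  # Match instances of Battery or of any of its sub-classes.
  ?Battery a ?Battery_type .
  ?Battery_type rdfs:subClassOf* batterydemo:Battery .

  # Match the cell property or any of its sub-properties.
  # ?cell_pred is a hypothetical variable name used only in this sketch.
  ?Battery ?cell_pred ?Cell .
  ?cell_pred rdfs:subPropertyOf* batterydemo:cell .
}
```

The `*` path operator includes the zero-length case, so the named class and property match as well as their descendants.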

Unions

SPARQLgraph can be used to generate union queries. In general, a union query is an "OR" of expressions that share a set of sparqlIds. Creating union queries in SPARQLgraph is based on the following:

  • create branch points for the union in a property or class; these will be marked with a colored Union symbol U
  • subgraphs beyond the branch point will be shown in the matching color
  • inside unions, sparqlId restrictions are loosened so that items from each branch can refer to the same return value
  • unions may be nested

The creation of union queries will be demonstrated with a brief tutorial using an ontology with a simple battery containing multiple colored cells. Consider these five batteries, each with up to four cells:

| description | batt_ID | cell1_date | cell1_ID | cell1_color | cell2_ID | cell2_color | cell3_ID | cell3_color | cell4_ID | cell4_color |
|---|---|---|---|---|---|---|---|---|---|---|
| normal battery | battAA | 2017-03-23T10:23:00 | A | red | B | blue | C | white | D | white |
| normal battery | battAB | 2017-03-23T10:24:00 | E | red | F | blue | G | white | H | white |
| no date | battAC | | I | red | J | blue | K | white | L | white |
| no colors on cells | battX | 2017-03-23T10:26:00 | M | | N | | O | | P | |
| two cells | battY | 2017-03-23T10:27:00 | Q | blue | R | blue | | | | |

Union on object properties

Consider the following query:

Find all cells with color "red" OR with no color at all.
Return the cellId, along with the battery id and name.

Such a query is built in SPARQLgraph with the following steps:

  1. create a Battery, and set ?id and ?name to be returned
  2. add a Cell, and set ?cellId to be returned
  3. add a color, and constrain it to "red". The "suggest values" button is helpful here. Also remember to uncheck the "return" box so the color is not returned.
  4. create a union by selecting the "cell" arc and choosing "new union" from the "opt/minus/union" menu. At this point the subgraph will be rendered in a unique color, and the arc will be marked with a U.
  5. add another Cell to the Battery.
  6. add it to the union by selecting the new "cell" arc and choosing the "cell" union from the "opt/minus/union" menu. Now that this subgraph is part of the union, go back to the new Cell and return ?cellId, making sure to use the same "cellId" sparqlId as the Cell in step 2.
  7. add a color to the new Cell, then select the new color arc and choose "minus" from the "opt/minus/union" menu.

You now have a union with two subgraphs. The top subgraph matches all cells with color red. The bottom subgraph matches all cells with no color. The "?cellId" sparqlId is shared between the branches. To make it easy to inspect results, order by "cellId".

The following SPARQL is generated:

prefix ...
select distinct ?id ?name ?cellId
		FROM <http://your/graph>
 where {
	?Battery a ?Battery_type .
	?Battery_type  rdfs:subClassOf* batterydemo:Battery.
	?Battery batterydemo:id ?id .
	?Battery batterydemo:name ?name .
	{
		?Battery batterydemo:cell ?Cell_1 .
			BIND(?Cell_1 as ?Cell) .
			?Cell_1 batterydemo:cellId ?cellId1 .
			BIND(?cellId1 as ?cellId) .
			?Cell_1 batterydemo:color ?Color_1 .
				FILTER ( ?Color_1 IN (<http://kdl.ge.com/batterydemo#red> ) ) . 
	}
	 UNION 
	{
		?Battery batterydemo:cell ?Cell .
			?Cell batterydemo:cellId ?cellId .
			minus {
				?Cell batterydemo:color ?Color .
			}
	}
}
ORDER BY ?cellId

Note that under the hood, each item in the graph has a unique identifier. BIND statements are used to match ?cellId between the two subgraphs in the UNION.

This query returns all the red cells and all the colorless cells:

| id | name | cellId |
|---|---|---|
| battAA | normal battery | A |
| battAB | normal battery | E |
| battAC | no date | I |
| battX | no colors on cells | M |
| battX | no colors on cells | N |
| battX | no colors on cells | O |
| battX | no colors on cells | P |

Union on two data properties

For the sake of illustration, consider this query:

Find all batteries with the letter 'y' in the id or in the name.
Return the batteries' ids and names.

Such a query is built in SPARQLgraph with the following steps:

  • Add the Battery node to the nodegroup
  • Select id:
    • choose 'new union' from the menu
    • apply the filter FILTER regex(?id, "[Yy]")
  • Select name:
    • choose 'id' union from the menu
    • apply the filter FILTER regex(?name, "[Yy]")

You now have a union query that will return names and ids of all batteries that have the letter 'y' in the name or id.

The query will look like this:

prefix ...
select distinct ?id ?name
		FROM <http://your/graph>
 where {
	?Battery a ?Battery_type .
	?Battery_type  rdfs:subClassOf* batterydemo:Battery.
	{
		?Battery batterydemo:id ?id .
			FILTER regex(?id, "[Yy]")   .
	}
	 UNION 
	{
		?Battery batterydemo:name ?name .
			FILTER regex(?name, "[Yy]") .
	}
}

and, given the data shown in the table above, will return the results:

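| id | name |
|---|---|
| | normal battery |
| battY | |

Each branch of the union binds only its own variable, so a row matched on ?name has no ?id and a row matched on ?id has no ?name.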

Union on two separate subgraphs

Now consider this query:

Find the id that belongs to any battery OR any blue cell

This query is the union of two disconnected subgraphs. It is built with the following steps:

  • Add the Battery node to the nodegroup
    • return the ?id
    • open the class URI, select "new union", and de-select "return"
  • Drag in a Cell node such that it is disconnected
    • return the cellId as ?id
    • open the class URI, select the "?Battery" union, and de-select "return"
  • Add a Color to the Cell, constrain it to "blue", and de-select "return"

This results in a query that is the union of the two subgraphs, each of which returns something for ?id.

The query looks like this:

prefix ...
select distinct ?id
		FROM <http://your/graph>
 where {
	{
		?Cell a batterydemo:Cell .
		?Cell batterydemo:cellId ?id_0 .
		BIND(?id_0 as ?id) .
		?Cell batterydemo:color ?Color .
			FILTER ( ?Color IN (<http://kdl.ge.com/batterydemo#blue> ) ) . 
	}
	 UNION 
	{
		?Battery a ?Battery_type .
		?Battery_type  rdfs:subClassOf* batterydemo:Battery.
		?Battery batterydemo:id ?id .
	}
}

and it returns the id of every battery and every blue cell:

id
F
B
J
R
Q
battAB
battAA
battX
battAC
battY

Combining UNION with MINUS

Consider the query "Cat named fluffy OR Cat does not have a kitty".

It may be tempting to create a single Cat node and build a UNION of FILTER regex(?name, "fluffy") and MINUS hasKitty; that is, a union of a data property and a MINUS on an object property.

SemTK would create SPARQL like this:

?Cat a namespace:Cat
{
   ?Cat namespace:name ?name.
   FILTER regex (?name, "fluffy").
} UNION {
   MINUS { ?Cat namespace:hasKitty ?Kitty  }
}

Under the W3C SPARQL recommendation, a MINUS clause with nothing on its left-hand side removes nothing, so the second branch always succeeds. This query will return all cats.
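
To make the pitfall easier to see, here is a hand-written sketch that isolates the second branch of the union. The namespace URI is a hypothetical placeholder. In SPARQL 1.1, MINUS only removes a left-hand solution when it is compatible with a right-hand solution and the two share at least one variable; the left-hand side here is the empty solution inside the braces, which shares no variables with anything.

```sparql
PREFIX namespace: <http://example.org/namespace#>  # hypothetical namespace URI

SELECT ?Cat
WHERE {
  ?Cat a namespace:Cat .
  {
    # Nothing precedes MINUS inside this group, so its left-hand side is the
    # single empty solution. That solution shares no variable with the MINUS
    # pattern, so it is never removed and the group always succeeds.
    MINUS { ?Cat namespace:hasKitty ?Kitty }
  }
}
```

Because the inner group always yields the empty solution, joining it with the outer pattern leaves every cat in the result.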

Instead, build a "Union on two separate subgraphs" as described above. Once both Cat nodes are added to the union, both can use the sparqlId ?Cat and both names can use ?name. One Cat node carries the MINUS on hasKitties; the other is a lone node whose ?name carries the FILTER regex(?name, "fluffy").

This will generate SPARQL like this:

{
    ?Cat a AnimalSubProps:Cat .
    ?Cat AnimalSubProps:name ?name .
    minus {
        ?Cat AnimalSubProps:hasKitties ?Kitty .
    }
} UNION {
    ?Cat a AnimalSubProps:Cat .
    ?Cat AnimalSubProps:name ?name .
    FILTER regex(?name, "fluffy") .
}

And this will return all cats named "fluffy" plus all cats which do not have kitties.

Construct Queries

CONSTRUCT queries return results as a graph rather than as a table, thus taking full advantage of the semantic web stack. This type of query is accessed by setting the query dropdown (highlighted below in yellow) to construct.

Rules for building CONSTRUCT queries:

  • any node and edge shown on the canvas are constructed
  • any data properties selected for return are constructed
  • any constraints are applied in the query WHERE clause
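
As an illustration of these rules, the hand-written sketch below shows the general shape of a CONSTRUCT query for a Battery connected to a red Cell, with the battery ?id selected for return. It is an assumption about the overall shape, not SemTK's exact generated output; in particular, whether type triples appear in the template is a guess, and the batterydemo namespace URI is inferred from URIs used elsewhere on this page.

```sparql
# Assumed prefixes; the batterydemo namespace is inferred from URIs used on this page.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>

CONSTRUCT {
  # Nodes and edges shown on the canvas.
  ?Battery a ?Battery_type .
  ?Battery batterydemo:cell ?Cell .
  ?Cell batterydemo:color ?Color .
  # Data property selected for return.
  ?Battery batterydemo:id ?id .
}
WHERE {
  ?Battery a ?Battery_type .
  ?Battery_type rdfs:subClassOf* batterydemo:Battery .
  ?Battery batterydemo:id ?id .
  ?Battery batterydemo:cell ?Cell .
  ?Cell batterydemo:color ?Color .
  # Constraints only appear in the WHERE clause.
  FILTER ( ?Color IN ( batterydemo:red ) ) .
}
```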

Graphical results

Hovering the mouse over a node will show:

  • the URI of any class node
  • the type of any data

Results are interactive:

  • Double-clicking a node, or selecting it and hitting the Expand button, will add all one-hop connections to the display
  • Selecting a node and hitting the Remove button will remove a node from the display only (this is NOT a delete query!)

JSON-LD results

A download link, "results.json", downloads the results as a file in JSON-LD format.

Note that different triplestores have been observed to interpret the JSON-LD format differently:

  • a link to another object may be of the form { "@id": "ID123" } or just the string "ID123"
  • data properties may be typed, as in { "@value": "35", "@type": "integer" }, or may be plain strings such as "35"
  • types and URIs may be written in full, "uri://my/prefix#uri123", or abbreviated based on query prefixes, "prefix:uri123"

The SPARQLgraph interface attempts to resolve these differences and show a standard network format.

Recursive Subtree Query

A recursive construct query can be used to query a particular sub-tree of instance data. This query demonstrates how to construct the tree of the cat "grannymom" (shown above) and all of her descendants.

Start with a construct query that has three Cat nodes connected by the hasKitties predicate:

  • the target: grannymom
  • a generic parent ?Cat_Parent
  • a generic child ?Cat_child

The target Cat node should be set up so that it is not constructed and so that its name matches "grannymom". To keep it from being constructed, select the ?Cat field and uncheck the "construct" box in the dialog:

Then click on ?name, select it for return, and set the filter to "grannymom":

The target node now matches the correct parent, and it will not be constructed. Now complete the following steps:

  • set the Cat's outgoing hasKitties to have the qualifier *. This ensures that any ?Cat_Parent that has 0 or more hasKitties relationships back to "grannymom" will be constructed
  • select ?Cat_Parent's name field to be returned/constructed by choosing the "select" checkbox
  • set ?Cat_Parent's outgoing hasKitties to optional. This ensures that a descendant of "grannymom" will be constructed even if it has no hasKitties
  • set ?Cat_child name field to be returned/constructed

The resulting query will construct a tree of "grannymom," all her descendants, and their names.
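
The steps above can be sketched as the following hand-written query. It is an approximation of the idea, not SemTK's generated SPARQL: the AnimalSubProps namespace URI is a hypothetical placeholder, and the variable names ?parent_name and ?child_name are assumptions made for this sketch.

```sparql
PREFIX AnimalSubProps: <http://example.org/AnimalSubProps#>  # hypothetical namespace URI

CONSTRUCT {
  # The target ?Cat itself is not in the template, so it is only constructed
  # when it also matches ?Cat_Parent through the zero-length path below.
  ?Cat_Parent AnimalSubProps:name ?parent_name .
  ?Cat_Parent AnimalSubProps:hasKitties ?Cat_child .
  ?Cat_child AnimalSubProps:name ?child_name .
}
WHERE {
  # The target: the cat named "grannymom".
  ?Cat AnimalSubProps:name ?name .
  FILTER ( str(?name) = "grannymom" ) .

  # Zero or more hasKitties hops from the target reach every descendant.
  ?Cat AnimalSubProps:hasKitties* ?Cat_Parent .
  ?Cat_Parent AnimalSubProps:name ?parent_name .

  # Optional, so descendants without kitties are still constructed.
  OPTIONAL {
    ?Cat_Parent AnimalSubProps:hasKitties ?Cat_child .
    ?Cat_child AnimalSubProps:name ?child_name .
  }
}
```

Template triples that contain an unbound variable are skipped for that solution, which is why leaf cats contribute only their name triple.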

Delete Queries

DELETE queries work differently from CONSTRUCT queries, in that:

  • any node and edge shown on the canvas are added to the WHERE clause
  • any data properties selected for return are added to the WHERE clause
  • items to be deleted must be explicitly specified

Specifying items to delete

Data property and object property edge dialogs have "select for delete" check boxes.

Node dialogs (accessed by clicking on the class name) contain a menu with a choice of delete modes:

  • NO_DELETE
  • TYPE_INFO_ONLY - only delete type triples with this node's matching URIs as the subject
  • FULL_DELETE - delete all triples with this node's matching URIs in the subject or object
  • LIMITED_TO_MODEL - like FULL_DELETE, but limited to relationships specified in the model
  • LIMITED_TO_NODEGROUP - like FULL_DELETE but limited only to relationships in the nodegroup

FULL_DELETE on nodes is by far the most common type of delete query.
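
For orientation, the hand-written sketch below shows what a FULL_DELETE on a Cell node might amount to in SPARQL Update: every triple with a matching URI as subject or as object is removed. This is an assumption about the general shape, not SemTK's generated query; the variable names ?p_out, ?o, ?s and ?p_in are hypothetical, and the batterydemo namespace URI is inferred from URIs used elsewhere on this page.

```sparql
# Assumed prefix; the batterydemo namespace is inferred from URIs used on this page.
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>

DELETE {
  ?Cell ?p_out ?o .   # triples with the node as subject
  ?s ?p_in ?Cell .    # triples with the node as object
}
WHERE {
  # The nodegroup: which Cell URIs to match.
  ?Cell a batterydemo:Cell .
  ?Cell batterydemo:cellId ?cellId .
  FILTER ( str(?cellId) = "A" ) .

  # Collect every triple that mentions a matching URI, in either position.
  { ?Cell ?p_out ?o . } UNION { ?s ?p_in ?Cell . }
}
```

Each solution binds only one side of the UNION, and template triples with unbound variables are skipped, so each solution deletes exactly one triple.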

Optimizations Internal

SemTK attempts to optimize queries based on performance testing of different triplestores.

VALUES clauses vs FILTER IN

This choice arises in ingestion URILookups, which can issue several queries per row of ingestion data, so it can have a very large performance impact.

  • FILTER IN is preferred for AWS Neptune
  • other triple stores are more performant with a VALUES clause
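
The two forms being compared look roughly like the hand-written sketch below, which looks up Cell URIs by a set of cellId values. These are illustrative queries, not SemTK's actual lookup queries; the batterydemo namespace URI is inferred from URIs used elsewhere on this page, and the block contains two separate queries to be run one at a time.

```sparql
# Query 1: FILTER IN form.
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>

SELECT ?Cell ?cellId
WHERE {
  ?Cell batterydemo:cellId ?cellId .
  FILTER ( ?cellId IN ( "A", "B", "C" ) ) .
}

# Query 2: VALUES form (a separate query).
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>

SELECT ?Cell ?cellId
WHERE {
  VALUES ?cellId { "A" "B" "C" }
  ?Cell batterydemo:cellId ?cellId .
}
```

The two forms are close but not strictly identical: IN compares with the `=` operator, while VALUES joins on the listed terms, so results can differ when the stored literals carry a different datatype than the listed ones.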

rdfs:subClassOf*

This is a very common query clause since a node in a nodegroup typically matches all subclasses.

  • Blazegraph performs best with rdfs:subClassOf*
  • other triple stores are more performant with a list of classes in a VALUES clause
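
The two alternatives can be sketched as follows. This is a hand-written illustration rather than SemTK's generated output: batterydemo:RechargeableBattery is a hypothetical sub-class invented for the example, and the batterydemo namespace URI is inferred from URIs used elsewhere on this page. The block contains two separate queries.

```sparql
# Query 1: property path form.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>

SELECT ?Battery
WHERE {
  ?Battery a ?Battery_type .
  ?Battery_type rdfs:subClassOf* batterydemo:Battery .
}

# Query 2: explicit class list in a VALUES clause (a separate query).
# batterydemo:RechargeableBattery is a hypothetical sub-class.
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>

SELECT ?Battery
WHERE {
  VALUES ?Battery_type { batterydemo:Battery batterydemo:RechargeableBattery }
  ?Battery a ?Battery_type .
}
```

The VALUES form requires the list of sub-classes to be known when the query is built, for example from the loaded ontology.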