Stored procedures to create and manage data quality flags in Neo4j.
- Copy the jar into Neo4j's "plugins" directory : $NEO4J_HOME/plugins
- For security reasons, procedures that use internal APIs are disabled by default. They can be enabled by specifying config in $NEO4J_HOME/conf/neo4j.conf, e.g. dbms.security.procedures.unrestricted=neo4j.dq.*
- Restart Neo4j
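To confirm that the plugin was picked up after the restart, one option is to list the registered procedures. This is a minimal sketch that assumes Neo4j 3.x or 4.x, where the built-in dbms.procedures() is available; more recent versions replace it with SHOW PROCEDURES.

```cypher
// List the procedures registered under the neo4j.dq namespace.
// Assumes Neo4j 3.x/4.x; dbms.procedures() was removed in 5.0.
CALL dbms.procedures() YIELD name, signature
WHERE name STARTS WITH 'neo4j.dq.'
RETURN name, signature
ORDER BY name
```

If the list comes back empty, the jar is probably not in the plugins directory or the server was not restarted.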
- flag : modelled as a DQ_Flag node, linked to a data node with a HAS_DQ_FLAG relationship, representing a data quality issue affecting that node.
- class : to help organize flags, each flag is given a class, which is part of a class hierarchy. The flag class is modelled as an extra node label on the flag, as well as a separate DQ_Class node linked to the flag with a HAS_DQ_CLASS relationship. Classes in the hierarchy are linked to their children/parent classes with a HAS_DQ_CLASS relationship.
- attachment : to provide more context to a DQ flag, one can attach other nodes to it (beyond the node it already links to). For example, a flag could represent a data mismatch between 2 nodes, in which case it can be useful to attach the second node to the flag.
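The three concepts can be read back with plain Cypher. The sketch below uses only names documented in this README (the class property of DQ_Class nodes and the HAS_ATTACHMENT relationship are described under createFlag and attachToFlag); the variable names are illustrative.

```cypher
// Walk from each flagged data node to its flag, the flag's class,
// and any extra nodes attached to the flag.
MATCH (n)-[:HAS_DQ_FLAG]->(flag:DQ_Flag)-[:HAS_DQ_CLASS]->(c:DQ_Class)
OPTIONAL MATCH (flag)-[:HAS_ATTACHMENT]->(attached)
RETURN n, flag, c.class AS flagClass, collect(attached) AS attachments
```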
The following procedures are exposed :
- neo4j.dq.createFlag
- neo4j.dq.attachToFlag
- neo4j.dq.deleteFlag
- neo4j.dq.deleteNodeFlags
- neo4j.dq.listFlags
- neo4j.dq.listClasses
- neo4j.dq.createClass
- neo4j.dq.deleteClass
- neo4j.dq.statistics
Creates a "data quality flag" node for the given data node.
CALL neo4j.dq.createFlag(node, label, description)
- node (Node|id) : the data node to flag.
- label (String) : label for the flag node, used to categorise the type of data quality issue. Also used for the "DQ_Class" node. Optional (defaults to "Generic_Flag").
- description (String) : property of the flag node. Optional (defaults to "").
- Creates a (flag:DQ_Flag:_label_) node, with the provided description=_description_ as property, and the relationship (_node_)-[:HAS_DQ_FLAG]->(flag).
- Creates, if it doesn't exist already, a (class:DQ_Class) node, with class=_label_ as property, and links the flag to it with the relationship (flag)-[:HAS_DQ_CLASS]->(class).
- If the class node doesn't exist, it is created as a child of the root class node : (class)-[:HAS_DQ_CLASS]->(root).
- Returns the created flag node.
Flag all Node nodes that are missing a "state" property :
MATCH (n:Node) WHERE NOT EXISTS(n.state)
CALL neo4j.dq.createFlag(n, 'MissingState', 'nodes of type Node should have a state property') YIELD flag
RETURN flag
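Since the node parameter is documented as Node|id and the other two parameters are optional, a shorter call should also be possible. The sketch below assumes that trailing optional arguments can simply be left out, which is how Neo4j usually handles procedure defaults; it has not been checked against this plugin.

```cypher
// Flag by node id, relying on the documented defaults
// (label "Generic_Flag", empty description).
// Assumption: trailing optional arguments can be omitted.
MATCH (n:Node) WHERE n.state IS NULL   // same test as NOT EXISTS(n.state)
CALL neo4j.dq.createFlag(id(n)) YIELD flag
RETURN flag
```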
Attaches a data node to a flag with a HAS_ATTACHMENT relationship.
CALL neo4j.dq.attachToFlag(flagNode, node| list of nodes)
- flag (Node|id) : the flag node to attach to.
- attachmentNode (Node|id) : the data node to attach.
- description (String) : property of the HAS_ATTACHMENT relationship. Optional (defaults to "").
- Creates a relationship (_flag_)-[:HAS_ATTACHMENT]->(_attachmentNode_), with description=_description_ as property of the relationship.
- Returns the HAS_ATTACHMENT relationship created.
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE NOT EXISTS(r.roles)
CALL neo4j.dq.createFlag(p, 'MissingRoles', 'Missing : ' + p.name + ' in ' + m.title) YIELD flag
CALL neo4j.dq.attachToFlag(flag, m) YIELD attachment
RETURN flag, attachment
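The optional description argument can record why a node was attached. As a sketch of the "mismatch between 2 nodes" case, the query below flags one Person of each pair sharing a name and attaches the other; the DuplicateName class and both description strings are made up for illustration.

```cypher
// Find pairs of Person nodes with the same name; id(a) < id(b)
// keeps each pair only once.
MATCH (a:Person), (b:Person)
WHERE a.name = b.name AND id(a) < id(b)
// Flag the first node of the pair (hypothetical class name)...
CALL neo4j.dq.createFlag(a, 'DuplicateName', 'Same name as another Person') YIELD flag
// ...and attach the second one, with a description on the relationship.
CALL neo4j.dq.attachToFlag(flag, b, 'Person with the same name') YIELD attachment
RETURN flag, attachment
```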
Deletes flag nodes.
CALL neo4j.dq.deleteFlags(flags, batchSize)
- flags (ANY: Node|[Node]|id|[ids]) : Node or list of nodes (or its/their ids) of the flag(s) to delete. Any node passed in that's not a DQ_Flag will be ignored.
- batchSize (Long) : Size of transaction batches for deletions. Optional (defaults to 1).
- Performs a DETACH DELETE of the provided flag nodes, batched in several transactions.
- Returns the number of deleted flags.
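Because the deletion runs in its own batched transactions, a simple way to use it is as a standalone call fed by a parameter. The two statements below are a sketch: $flagIds is a hypothetical parameter that the client fills with the result of the first query, and the MissingState class comes from the createFlag example.

```cypher
// Step 1: collect the ids of the flags to remove
// (here, every flag of the MissingState class).
MATCH (f:DQ_Flag:MissingState)
RETURN collect(id(f)) AS flagIds;

// Step 2: pass those ids back as $flagIds and delete them
// 100 per transaction instead of the default of 1.
CALL neo4j.dq.deleteFlags($flagIds, 100);
```

A larger batch size generally means fewer transactions at the cost of more memory per transaction.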
Deletes all flag nodes linked to a data node.
CALL neo4j.dq.deleteNodeFlags(nodes, batchSize)
- nodes (ANY: Node|[Node]|id|[ids]) : Node or list of nodes (or its/their ids) whose linked flags must be deleted.
- batchSize (Long) : Size of transaction batches for deletions. Optional (defaults to 1).
- Performs a DETACH DELETE of the flag nodes, batched in several transactions.
- Returns the number of deleted flags.
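A typical use is clearing the flags of a node once its data has been fixed. In this sketch $nodeId is a hypothetical parameter holding the id of that data node, and the second statement is only a plain-Cypher check that nothing is left.

```cypher
// Delete every flag linked to the data node; batchSize is omitted,
// so the documented default of 1 applies.
CALL neo4j.dq.deleteNodeFlags($nodeId);

// Check: the node should have no DQ_Flag left.
MATCH (n)-[:HAS_DQ_FLAG]->(f:DQ_Flag)
WHERE id(n) = $nodeId
RETURN count(f) AS remainingFlags;
```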
Lists flag nodes.
CALL neo4j.dq.listFlags(filter)
- filter (String) : flag class for filtering results. Optional (defaults to returning all flags).
Returns the flag nodes.
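For instance, assuming flags of the MissingState class exist (as in the createFlag example) and that the optional filter can be left out:

```cypher
// All flags, relying on the default (no filter).
CALL neo4j.dq.listFlags();

// Only flags of the MissingState class.
CALL neo4j.dq.listFlags('MissingState');
```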
Lists DQ classes.
CALL neo4j.dq.listClasses(filter)
- filter (String) : flag class for filtering results. Optional (defaults to returning all classes).
Returns the DQ_Class nodes.
Creates a new DQ class.
CALL neo4j.dq.createClass(class, parentClass, alertTriggerLimit, description)
- class (String) : Name of the DQ class.
- parentClass (String) : Parent DQ_Class. Optional (defaults to "all", the root of the class hierarchy).
- alertTriggerLimit (Long) : Threshold above which the number of child flags should trigger an alert (not yet implemented). Optional (defaults to -1).
- description (String) : property of the class node. Optional (defaults to "").
Returns the created class node.
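As a sketch of building a small hierarchy, the statements below create a hypothetical Completeness class under the root and a MissingState class under it, then read the parent/child link back with plain Cypher. The class names and descriptions are illustrative.

```cypher
// A top-level class directly under the root ("all").
CALL neo4j.dq.createClass('Completeness', 'all', -1, 'Flags for missing data');

// A more specific class below it.
CALL neo4j.dq.createClass('MissingState', 'Completeness', -1, 'Node without a state property');

// Children point to their parent with HAS_DQ_CLASS.
MATCH (child:DQ_Class)-[:HAS_DQ_CLASS]->(parent:DQ_Class {class: 'Completeness'})
RETURN child.class AS childClass, parent.class AS parentClass;
```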
Deletes a class and all its flags.
CALL neo4j.dq.deleteClass(class)
- class (String) : Name of the DQ class.
- Deletes all the children flags of that class.
- Any child class is kept, and re-attached to the root class.
- Deletes the class node.
- Returns the number of deleted flags.
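Since deleting a class also deletes its flags, it can be worth checking first which child classes will be moved to the root. A sketch, using a hypothetical Completeness class:

```cypher
// Child classes that would be re-attached to the root class.
MATCH (child:DQ_Class)-[:HAS_DQ_CLASS]->(:DQ_Class {class: 'Completeness'})
RETURN child.class AS childClass;

// Delete the class together with its flags.
CALL neo4j.dq.deleteClass('Completeness');
```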
Computes statistics about DQ flags in the graph.
CALL neo4j.dq.statistics(filter)
- filter (String) : class name for which to compute statistics.
Returns the number of direct, indirect, and total child flags for the class.
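The sketch below calls the procedure for a hypothetical Completeness class, followed by a hand-written query that approximates the same counts. The second query is one interpretation of "direct" and "indirect", not the plugin's implementation: a direct flag is linked straight to the class, an indirect flag reaches it through one or more sub-classes.

```cypher
CALL neo4j.dq.statistics('Completeness');

// Approximation in plain Cypher (an interpretation, see above).
MATCH (c:DQ_Class {class: 'Completeness'})
// Direct: flag -> class in one hop.
OPTIONAL MATCH (d:DQ_Flag)-[:HAS_DQ_CLASS]->(c)
WITH c, count(DISTINCT d) AS direct
// Indirect: flag -> sub-class -> ... -> class, two hops or more.
OPTIONAL MATCH (i:DQ_Flag)-[:HAS_DQ_CLASS*2..]->(c)
WITH direct, count(DISTINCT i) AS indirect
RETURN direct, indirect, direct + indirect AS total;
```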