Data Verification
The RACK tooling provides multiple methods for verifying the data that is loaded into RACK:
- The `check` tool in the ASSIST toolset.

  Example run of `check` against the RACK-in-a-box image (run this locally or within the container/VM):

      check -m http://localhost:3030/

  Note that the above requires SWI Prolog. It is already installed in the RACK box image, so this command can be run via a `docker exec` command:

      $ docker container ls
      CONTAINER ID IMAGE ... NAME
      1ab3beb878d7 gehighassurance/rack-box:... festive_einstein ...
      $ docker exec -it {CONTAINER_ID_or_NAME} /home/ubuntu/RACK/assist/bin/check -m http://localhost:3030/

  It is also possible to run it locally if SWI Prolog is installed. The `check` script is written as a Unix script with a shebang line to run the `swipl` SWI Prolog command, but it is possible to run `check` in non-Unix environments by invoking it via `swipl` directly:

      swipl -s assist/bin/check -- -m http://localhost:3030/

  If you receive a "Connection refused" message when running it locally, make sure you are running the RACK box with port 3030 enabled (you can also visit http://localhost:3030/ in your browser to verify this is available; that URL should serve an "Apache Jena Fuseki" page).

  See https://github.com/ge-high-assurance/RACK/tree/master/assist#assist-dv----data-verification and https://github.com/ge-high-assurance/RACK/tree/master/assist/bin#command-line-usage for more information.
- SemTK ingestion performs verifications against both the model and nodegroup ingestion rules. It checks data types and qualified cardinality.
- SemTK cardinality checker: checks cardinality counts (see the reports/section/cardinality wiki).
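The two ways of invoking `check` described above (inside the RACK box container, or locally through `swipl`) can be wrapped in one small helper. This is only a sketch: the function name `run_check` and the `RUNNER` dry-run hook are hypothetical and not part of RACK, while the paths and flags are the ones quoted in the examples above.

```shell
# Run `check` against a model URL.
#   $1: model URL, e.g. http://localhost:3030/
#   $2: optional container ID or name; when given, run inside that container
# RUNNER is a hypothetical hook: set RUNNER=echo to print the command
# instead of executing it.
run_check() {
    url="$1"
    container="${2:-}"
    if [ -n "$container" ]; then
        # Same command as the docker exec example above.
        ${RUNNER:-} docker exec -it "$container" /home/ubuntu/RACK/assist/bin/check -m "$url"
    else
        # Invoke through swipl directly, which also works where the
        # shebang line of the script is not honored.
        ${RUNNER:-} swipl -s assist/bin/check -- -m "$url"
    fi
}
```

For example, `RUNNER=echo run_check http://localhost:3030/ festive_einstein` prints the `docker exec` command without running it. The local form assumes it is started from the root of a RACK checkout so that `assist/bin/check` resolves.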
The SemTK verification is performed when ingesting via the nodegroup. The `check` tool verifications can be run at any time on either local OWL files or a live RACK database to perform verification on the existing data.
There are significant overlaps between the two methods: many issues will be detected by either method (see the table below). The SemTK verification is explicitly defined and can be extended by creating another nodegroup query (via the SemTK UI). The `check` verification provides both automatic and explicit verification (the latter by editing the `assist/bin/checks` files). A common methodology might be for a user exploration process to update or create new query nodegroups, whereas an automated process such as CI testing would use the `check` tool, which does not require any user interaction. Nodegroups created by the SemTK process can be migrated into the `check` process once they are determined to be viable and desirable verifications.
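For the CI use mentioned above, a wrapper along the following lines could turn a verification run into a pass/fail step. This is a sketch under one important assumption: that the verification command exits with a non-zero status when it finds problems. This page does not state how `check` reports failures, so confirm that before relying on it. The function name `run_verification` is hypothetical.

```shell
# Run a verification command and report the outcome.
# Assumption: the command exits non-zero when verification fails; the
# documentation above does not confirm this for `check`.
run_verification() {
    if "$@"; then
        echo "verification passed"
    else
        status=$?
        echo "verification failed (exit status $status)" >&2
        return "$status"
    fi
}

# Possible use in a CI job (not executed here):
#   run_verification swipl -s assist/bin/check -- -m http://localhost:3030/
```

Passing the command as arguments keeps the wrapper independent of how `check` is invoked, so the same function works for the local and the container forms.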
There are two types of verification:
- validity - does the data map to valid ontology elements and have valid values
- consistency - does the data integrate with existing data properly
The following is a summary of the various checks currently available:
Check | Tool | Validity? | Consistency? | Description |
---|---|---|---|---|
missing notes | check | | ✅ | Each defined item should have a description of that item. This can be provided via a SADL note field or the corresponding CSV column entry. |
ontology basis | check | | ✅ | All defined items should have an inheritance from one of the items defined in the PROV-S set of base classes. Items which do not inherit from one of these are likely to be data islands and not integrated properly with the rest of the data. |
instance types | check | ✅ | | Verifies that an object isn't declared to be an instance of multiple separate ontology classes. At present, the RACK ontology does not utilize multiple inheritance (although RDF itself allows this). |
cardinality | check, semtk cardinality | ✅ | ✅ | Verifies that object property relations conform to the cardinality restrictions defined in the ontology (e.g. must be one, must be more than 0, etc.). In RDF/OWL terms, a "Restriction". |
multiple optional | check | ✅ | ✅ | Verifies that object properties marked as optional are either not specified or specified only once (conceptually a subset of cardinality, but in RDF/OWL terms, a "FunctionalProperty"). |
invalid enum value | check, ingest | ✅ | | Specification of a property value that is not one of the valid enumerated set of values. |
wrong type | check, ingest | ✅ | | Providing a value of the wrong type, based on the ontology (e.g. specifying a string where an object is needed). |
value range exceeded | check, ingest | ✅ | | Specification of a value outside the defined range for values of that property (when defined). |
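As a rough illustration of what the "multiple optional" row is looking for, the following sketch flags subjects that carry more than one value for a given property. It is not how `check` or SemTK implement the test. The whitespace-separated `subject property value` input format, the function name `multiple_values`, and the sample names below are all hypothetical simplifications chosen for the example.

```shell
# Read "subject property value" lines on stdin and print, in sorted order,
# every subject that has more than one value for the property given as $1.
# The input format is a hypothetical simplification, not RDF syntax.
multiple_values() {
    prop="$1"
    awk -v prop="$prop" '
        $2 == prop { count[$1]++ }
        END { for (s in count) if (count[s] > 1) print s }
    ' | sort
}
```

For instance, feeding it the three lines `req1 satisfies sys1`, `req1 satisfies sys2`, and `req2 satisfies sys1` with `multiple_values satisfies` reports `req1`, since only that subject has the property specified more than once.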
SemTK ingest templates:
- optionally contain validation steps that are independent of the model, such as non-empty columns. See data validation
- translate input strings into typed values using rules explained at ingestion type handling
Additional checks may be provided in future versions of RACK.
Copyright (c) 2021-2024, General Electric Company, Galois, Inc.
All Rights Reserved
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-20-C-0203.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA).
Distribution Statement "A" (Approved for Public Release, Distribution Unlimited)