UCO semi-open vocabularies need to become enumerations of xsd:strings #629

Open
6 of 15 tasks
ajnelson-nist opened this issue Aug 13, 2024 · 2 comments · May be fixed by #630 or #650

ajnelson-nist commented Aug 13, 2024

Background

UCO gives sets of strings that suggest values to use for certain properties. UCO has called this the "Semi-open vocabulary" design pattern: While certain strings should be used, strings not included in UCO's ontology files can still be used instead when the user knows the value they are using is not yet in UCO, and/or might never be in UCO for whatever reason (e.g., confidentiality of non-public vocabularies).

UCO's syntax for its semi-open vocabularies follows what has appeared, to date, to be an OWL design pattern for defining a datatype as a set of typed string values. However, it turns out the pattern used by UCO is invalid OWL syntax. Issue 593 covers half of the issues raised by an externally developed OWL syntax reviewing tool, ROBOT [1]. Issue 593's correction has no known impact on user data.

The other half of the syntax correction, covered by this Issue, unfortunately conflicts with the user data guidance UCO has given since its earliest examples. Given the known non-zero impact on user data, this is being treated as a non-fast-track proposal.

Here is (again, selected for its small size) the Bitness vocabulary, as implemented in UCO 1.3.0 and long before:

vocabulary:BitnessVocab
	a rdfs:Datatype ;
	rdfs:label "Bitness Vocabulary"@en-US ;
	rdfs:comment "Defines an open-vocabulary of word sizes that define classes of operating systems."@en ;
	owl:equivalentClass [
		a rdfs:Datatype ;
		owl:onDatatype xsd:string ;
		owl:oneOf (
			"32"^^vocabulary:BitnessVocab
			"64"^^vocabulary:BitnessVocab
		) ;
	] ;
	.

Issue 593 cut the triple [] owl:onDatatype xsd:string . from the vocabulary definition. This Issue finishes the OWL syntax correction by changing the members to values typed only as xsd:string (which is the OWL-wide, and IIRC RDF-wide, default when no other type is stated):

vocabulary:BitnessVocab
        a rdfs:Datatype ;
        rdfs:label "Bitness Vocabulary"@en-US ;
        rdfs:comment "Defines an open-vocabulary of word sizes that define classes of operating systems."@en ;
        owl:equivalentClass [
                a rdfs:Datatype ;
                owl:oneOf (
                        "32"
                        "64"
                ) ;
        ] ;
        .

The syntax error this addresses is that the enumeration attempted to assign a datatype to its member literals, but that datatype is not yet defined when the parser consumes the enumeration, because the literals sit inside the definition of the very datatype they reference. ROBOT describes this as a cyclic reference in a definition, and calls it a syntax error. The "Bug description" section below goes into detail on why the cycle is also a symptom of another syntax error.

Requirements

Requirement 1

UCO vocabularies must adhere to OWL syntactic requirements for custom datatypes.

Requirement 2

Any UCO vocabulary X must not use itself as a datatype in its enumerated literals, due to OWL syntax requirements (cited in "Bug description" section).

Risk / Benefit analysis

Benefits

  • Presently, OWL tools using the OWL API library [1] report a syntax error for each UCO vocabulary that uses the cyclic-reference pattern. (The error's call stack ends in this method.) In offline testing of the patch with ROBOT, this change proposal resolves that error type, removing a barrier to compatibility for other OWL-based ontologies that might adopt UCO.
  • Using plain xsd:string in vocabularies resolves a blocking design question on JSON-LD context dictionaries, on how to assign datatypes to semi-open vocabulary properties without inducing SHACL errors when a value outside the suggested set is used.
  • This reduces complexity for programming-language bindings, which would otherwise need to assign a datatype to certain properties only when certain values are used. (This is a concern downstream from the UCO ontology, and applies to e.g. Python bindings.)

Risks

  • All CASE and UCO data using semi-open vocabularies would need to revert vocabulary assignments to simple strings. This would be a backwards-incompatible change because 1.3.0-and-prior data would be flagged as erroneous, but this change can be implemented in stages with the warning pattern used for other proposals with pre-2.0.0 and post-2.0.0 effects.

Competencies demonstrated

Competency 1

A hash is being migrated from UCO 1.3.0 to UCO 1.4.0. The form in UCO 1.3.0 is:

{
    "@context": {
        "kb": "http://example.org/kb/",
        "types": "https://ontology.unifiedcyberontology.org/uco/types/",
        "vocabulary": "https://ontology.unifiedcyberontology.org/uco/vocabulary/",
        "xsd": "http://www.w3.org/2001/XMLSchema#"
    },
    "@graph": {
        "@id": "kb:Hash-9ed55c42-3204-4e45-ace7-7ae6bd7d8f38",
        "@type": "types:Hash",
        "types:hashMethod": {
            "@type": "vocabulary:HashNameVocab",
            "@value": "SHA3-256"
        },
        "types:hashValue": {
            "@type": "xsd:hexBinary",
            "@value": "36f028580bb02cc8272a9a020f4200e346e276ae664e45ee80745574e2f5ab80"
        }
    }
}

Competency Question 1.1

How is the hash supposed to be spelled with this proposal?

Result 1.1

Note that two lines are dropped: The "vocabulary" prefix from the context dictionary will never be needed by users, and hashMethod no longer needs the typed-literal JSON object in its spelling.

{
    "@context": {
        "kb": "http://example.org/kb/",
        "types": "https://ontology.unifiedcyberontology.org/uco/types/",
        "xsd": "http://www.w3.org/2001/XMLSchema#"
    },
    "@graph": {
        "@id": "kb:Hash-9ed55c42-3204-4e45-ace7-7ae6bd7d8f38",
        "@type": "types:Hash",
        "types:hashMethod": "SHA3-256",
        "types:hashValue": {
            "@type": "xsd:hexBinary",
            "@value": "36f028580bb02cc8272a9a020f4200e346e276ae664e45ee80745574e2f5ab80"
        }
    }
}
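
The migration from the 1.3.0 spelling to the 1.4.0 spelling could be automated along the following lines. This is a minimal sketch, not part of UCO's tooling: the function names and the assumption that the data uses the compact prefix "vocabulary:" are illustrative only, and data that spells the datatype as a full IRI would need a different check.

```python
# Hypothetical helper names; not part of any UCO or CASE library.
VOCAB_PREFIX = "vocabulary:"  # assumption: compact-IRI prefix used in the data


def migrate_vocab_literals(node):
    """Recursively replace vocabulary-typed literal objects with plain strings."""
    if isinstance(node, list):
        return [migrate_vocab_literals(item) for item in node]
    if isinstance(node, dict):
        datatype = node.get("@type")
        is_typed_literal = set(node) == {"@type", "@value"}
        if (
            is_typed_literal
            and isinstance(datatype, str)
            and datatype.startswith(VOCAB_PREFIX)
        ):
            # {"@type": "vocabulary:X", "@value": "v"} becomes "v".
            return node["@value"]
        return {key: migrate_vocab_literals(value) for key, value in node.items()}
    return node


def migrate_document(doc):
    """Return a migrated copy of a JSON-LD document; the input is not modified."""
    migrated = migrate_vocab_literals(doc)
    context = migrated.get("@context")
    if isinstance(context, dict):
        # The "vocabulary" prefix is no longer needed by user data.
        context.pop("vocabulary", None)
    return migrated
```

Other typed literals, such as the xsd:hexBinary hash value, are left untouched because their "@type" does not start with the vocabulary prefix.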

Competency Question 1.2

How is a hash using an algorithm name not in UCO supposed to be spelled?

Result 1.2

# UCO 1.4.0 form
kb:Hash-9ed55c42-3204-4e45-ace7-7ae6bd7d8f38
	a types:Hash ;
	types:hashMethod "FooHash" ;
	types:hashValue "098f6bcd4621d373cade4e832627b4f6"^^xsd:hexBinary ;
	.

Competency Question 1.3

How should semi-open vocabulary properties' ranges be spelled?

Result 1.3

These properties' ranges should continue to be spelled as the union of xsd:string and the UCO vocabulary IRI. Section 9.4 of the OWL 2 Syntax document gives an example of a restricted string property. UCO's semi-open vocabulary pattern still permits general xsd:string values, so a union range should be used in owl:DatatypeProperty definitions, and vocabulary membership should be reviewed with SHACL.

Solution suggestion

No changes are necessary to the owl:DatatypeProperty OWL definitions currently using the semi-open vocabulary pattern, taking for example types:hashMethod:

types:hashMethod
	a owl:DatatypeProperty ;
	rdfs:label "hashMethod"@en ;
	rdfs:comment "A particular cryptographic hashing method (e.g., MD5)."@en ;
	rdfs:range [
		a rdfs:Datatype ;
		owl:unionOf (
			vocabulary:HashNameVocab
			xsd:string
		) ;
	] ;
	.

The SHACL shapes for the semi-open vocabulary pattern have been coming as trios of property shapes. They need to be adjusted as follows for the next UCO SEMVER-minor release, so current data raises SHACL warnings instead of errors. Note that the sh:Info-severity "gentle suggestion" shape is now the shape that bears the enumeration.

types:Hash
	sh:property
		[
			sh:datatype xsd:string ;
			sh:message "As of UCO 1.4.0, the datatype to use for types:hashMethod should be xsd:string.  Not using xsd:string will be an error in UCO 2.0.0." ;
			sh:path types:hashMethod ;
			sh:severity sh:Warning ;
		] ,
		[
			sh:maxCount "1"^^xsd:integer ;
			sh:minCount "1"^^xsd:integer ;
			sh:nodeKind sh:Literal ;
			sh:path types:hashMethod ;
		] ,
		[
			sh:message "Value is not member of the vocabulary HashNameVocab." ;
			sh:in (
				"MD5"
				"MD6"
				"SHA1"
				"SHA224"
				"SHA256"
				"SHA3-224"
				"SHA3-256"
				"SHA3-384"
				"SHA3-512"
				"SHA384"
				"SHA512"
				"SSDEEP"
			) ;
			sh:path types:hashMethod ;
			sh:severity sh:Info ;
		]
		;
	.

(An aside: the new sh:Warning shape causes some cases that previously failed, like using an integer for hashMethod's value, to now pass with warnings. Given the warnings, this feels like a low-risk relaxation.)

For UCO 2.0.0, the first two shapes would combine, leaving the implementation as follows.

types:Hash
	sh:property
		[
			sh:datatype xsd:string ;
			sh:maxCount "1"^^xsd:integer ;
			sh:minCount "1"^^xsd:integer ;
			sh:nodeKind sh:Literal ;
			sh:path types:hashMethod ;
		] ,
		[
			sh:message "Value is not member of the vocabulary HashNameVocab." ;
			sh:in (
				"MD5"
				"MD6"
				"SHA1"
				"SHA224"
				"SHA256"
				"SHA3-224"
				"SHA3-256"
				"SHA3-384"
				"SHA3-512"
				"SHA384"
				"SHA512"
				"SSDEEP"
			) ;
			sh:path types:hashMethod ;
			sh:severity sh:Info ;
		]
		;
	.
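
To make the staged severities concrete, here is a rough Python sketch of the results the two shape sets above would be expected to raise for a single types:hashMethod literal. It is only a model under stated assumptions, not a SHACL engine: the function name and the string spellings of datatypes and severities are made up for illustration, the sh:minCount, sh:maxCount, and sh:nodeKind constraints are ignored, and sh:in is modeled as RDF term equality (so a literal typed with the vocabulary datatype does not match the plain-string member).

```python
XSD_STRING = "xsd:string"

# Members copied from the sh:in list above.
HASH_NAME_VOCAB = frozenset({
    "MD5", "MD6", "SHA1", "SHA224", "SHA256",
    "SHA3-224", "SHA3-256", "SHA3-384", "SHA3-512",
    "SHA384", "SHA512", "SSDEEP",
})


def review_hash_method(lexical_form, datatype, uco_major_version=1):
    """Return the expected result severities for one hashMethod literal.

    Hypothetical model: uco_major_version=1 stands for the 1.4.0 shapes,
    and 2 stands for the 2.0.0 shapes.
    """
    severities = []
    if datatype != XSD_STRING:
        # 1.4.0: explicit sh:Warning. 2.0.0: no sh:severity is given on the
        # combined shape, so SHACL's default of sh:Violation applies.
        severities.append("sh:Warning" if uco_major_version < 2 else "sh:Violation")
    # sh:in compares whole RDF terms, so the datatype must match too.
    is_member = datatype == XSD_STRING and lexical_form in HASH_NAME_VOCAB
    if not is_member:
        severities.append("sh:Info")
    return severities
```

Under this model, review_hash_method("FooHash", "xsd:string") yields only an sh:Info result, while review_hash_method("1", "xsd:integer") yields a warning plus an info for the 1.4.0 shapes and a violation plus an info for the 2.0.0 shapes.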

Due to a processing efficiency issue, one final modification from current practice is suggested: The "Gentle suggestion" property shape should change from an inlined, blank-node-identified sh:PropertyShape to a class-independent, IRI-identified sh:NodeShape that uses the targeting statement ... sh:targetObjectsOf types:hashMethod . The reason for this is that certain SHACL reporting practices, like those used in UCO's unit testing, inline all blank node property shapes in the SHACL result-sets. For some vocabularies, especially longer ones that suggest intensive extension (in particular, the 200+-member vocabulary:ObservableObjectRelationshipVocab), this could be hopelessly obscuring in the result sets whenever users engage in local vocabulary extension.

This pattern is suggested for the implementation instead, to offer users (1) a reference to an IRI they can look up on the documentation site instead of getting an inlined and long anonymous shape, and (2) the opportunity to opt out of any specific "Gentle suggestion" shapes by using sh:deactivated:

types:hashMethod-objects-in-shape
	a sh:NodeShape ;
	sh:in (
		"MD5"
		"MD6"
		"SHA1"
		"SHA224"
		"SHA256"
		"SHA3-224"
		"SHA3-256"
		"SHA3-384"
		"SHA3-512"
		"SHA384"
		"SHA512"
		"SSDEEP"
	) ;
	sh:message "Value is not member of the vocabulary HashNameVocab." ;
	sh:severity sh:Info ;
	sh:targetObjectsOf types:hashMethod ;
	.

The replacement for the class's sh:property links would be as follows, using an rdfs:seeAlso to preserve some reference on the documentation website without any other mechanical impact:

 types:Hash
+	rdfs:seeAlso types:hashMethod-objects-in-shape ;
 	sh:property
 		[
 			sh:datatype xsd:string ;
 			sh:maxCount "1"^^xsd:integer ;
 			sh:minCount "1"^^xsd:integer ;
 			sh:nodeKind sh:Literal ;
 			sh:path types:hashMethod ;
- 		] ,
- 		[
- 			sh:message "Value is not member of the vocabulary HashNameVocab." ;
- 			sh:in (
- 				"MD5"
- 				"MD6"
- 				"SHA1"
- 				"SHA224"
- 				"SHA256"
- 				"SHA3-224"
- 				"SHA3-256"
- 				"SHA3-384"
- 				"SHA3-512"
- 				"SHA384"
- 				"SHA512"
- 				"SSDEEP"
- 			) ;
- 			sh:path types:hashMethod ;
- 			sh:severity sh:Info ;
 		]
 		;
 	.

If a user wishes to opt out of the gentle suggestions, they can add this triple in their extension SHACL shapes:

types:hashMethod-objects-in-shape
	sh:deactivated true ;
	.

Bug description

While at least one tool reports that the current UCO vocabulary pattern is a cyclic definition error, the OWL syntax specification and RDF mapping documents together describe the problem as an inappropriate use of an OWL structure, DatatypeDefinition.

Looking through the OWL 2 Mapping to RDF document, Table 16 includes this mapping:

IF this pattern is in the RDF graph G:

*:x owl:equivalentClass y .
{ DR(*:x) ≠ ε and DR(y) ≠ ε }

THEN the following axiom is added to the ontology OG:

DatatypeDefinition( DR(*:x) DR(y) )

For quick reference, the symbols above mean:

  • *:x is an IRI
  • y is either a blank node or an IRI
  • DR(z) is a function mapping a node z to a DataRange structure in the ontology OG
    • Nodes for DR can be IRIs or blank nodes
  • DR(z) ≠ ε states that DR(z) is defined (or equivalently, the function at the value z is not empty)

Taken together, owl:equivalentClass as a predicate between two nodes that each map to a DataRange implies a DatatypeDefinition.

*:x being in a DatatypeDefinition imposes a certain constraint on *:x, with which UCO has been incongruous to date. Section 9.4 of the OWL 2 Syntax document defines that in the structure DatatypeDefinition( DT DR ), the Datatype that is the first parameter (DT, or *:x, or vocabulary:BitnessVocab) is a "synonym for DR..." And: "The datatypes defined by datatype definition axioms ... have empty lexical spaces and therefore they must not occur in literals."

The example in the OWL 2 syntax document that immediately follows that quote illustrates a datatype a:SSN, which is part of a datatype definition that would be spelled like this in Turtle:

# First axiom: "Declaration( Datatype( a:SSN ) )"
a:SSN
	a rdfs:Datatype ;
	.

# Second axiom: "DatatypeDefinition( ..."
a:SSN
	owl:equivalentClass [
		a rdfs:Datatype ;
		owl:onDatatype xsd:string ;
		owl:withRestrictions (
			[
				xsd:pattern "[0-9]{3}-[0-9]{2}-[0-9]{4}" ;
			]
		) ;
	] ;
	.

# Third axiom: "DataPropertyRange( a:hasSSN a:SSN )"
a:hasSSN
	rdfs:range a:SSN ;
	.

Note the end of the example block states "there can be no literals of datatype a:SSN".

In UCO's case, there can be no literals of datatype vocabulary:BitnessVocab, vocabulary:HashNameVocab, etc.
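
A quick way to spot this kind of violation in Turtle text is to look for typed literals whose datatype is one of the datatypes given a DatatypeDefinition. The following Python sketch does this with a regular expression. It is a rough lint under stated assumptions, not an OWL or Turtle parser: the function name is hypothetical, datatypes are assumed to be written as prefixed names, and triple-quoted strings and full-IRI datatypes are not handled.

```python
import re

# Matches a single-line quoted literal followed by ^^prefix:LocalName.
TYPED_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"\^\^([A-Za-z][\w-]*:[\w-]+)')


def find_forbidden_literals(turtle_text, defined_datatypes):
    """List (lexical form, datatype) pairs that use a datatype which,
    per OWL 2 Syntax Section 9.4, must not occur in literals."""
    return [
        (match.group(1), match.group(2))
        for match in TYPED_LITERAL.finditer(turtle_text)
        if match.group(2) in defined_datatypes
    ]
```

Run against the UCO 1.3.0 spelling of vocabulary:BitnessVocab with defined_datatypes={"vocabulary:BitnessVocab"}, this would be expected to report the "32" and "64" members, and to report nothing for the corrected spelling.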

Coordination

  • Tracking in Jira ticket OCUCO-323
  • Administrative review completed, proposal announced to Ontology Committees (OCs) on 2024-08-12
  • Requirements to be discussed in OC meeting, 2024-08-20
  • Requirements Review vote occurred, passing, on 2024-08-20
  • Requirements development phase completed.
  • Solution announced to OCs on 2025-02-26
  • Solutions Approval to be discussed in OC meeting, 2025-03-TBD
  • Solutions Approval vote has not occurred
  • Solutions development phase completed.
  • Backwards-compatible implementation merged into develop for the next release
  • develop state with backwards-compatible implementation merged into develop-2.0.0
  • Backwards-incompatible implementation merged into develop-2.0.0 (or N/A)
  • Milestone linked
  • Documentation logged in pending release page
  • Prerelease publication: CASE develop branch updated to track UCO's updated develop branch
  • Prerelease publication: CASE develop-2.0.0 branch updated to track UCO's updated develop-2.0.0 branch

Footnotes

  1. Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

@ajnelson-nist
Contributor Author

This proposal also has an impact on the implementation of Issue 549, as that issue adds a new vocabulary.

ajnelson-nist added a commit that referenced this issue Aug 13, 2024
A follow-on patch will regenerate Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist linked a pull request Aug 13, 2024 that will close this issue
13 tasks
ajnelson-nist added a commit that referenced this issue Aug 13, 2024
References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Oct 17, 2024
A follow-on patch will regenerate Make-managed files.

References:
* #549
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Oct 17, 2024
References:
* #549
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
This partially undoes commit `aea0c04`, because of the introduction of
`sh:targetObjectsOf` usage that would have a broad impact.  This will
return for review later.

A follow-on patch will regenerate Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
A follow-on patch will regenerate Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
A follow-on patch will regenerate Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist linked a pull request Feb 18, 2025 that will close this issue
13 tasks
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
A follow-on patch will regenerate Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 18, 2025
References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Feb 19, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Feb 21, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Feb 21, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Feb 21, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Feb 21, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Feb 21, 2025
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Feb 21, 2025
References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Feb 21, 2025
While UCO Issue 629 is in progress, the warnings cannot be satisfied in
the data at the same time that CASE 1.3.0's rules are satisfied.

A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Feb 21, 2025
References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Feb 21, 2025
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Feb 21, 2025
References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 26, 2025
This patch introduces a name scheme that does not align with a previous
occurrence of this export,
`observable:regionalInternetRegistry-shape-value-not-vocabulary-member`.

This patch also does not apply the exporting pattern to
`core:objectStatus`, because in that case the vocabulary is closed.

A follow-on patch will regenerate Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 26, 2025
References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Feb 26, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Feb 26, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Feb 26, 2025
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Feb 26, 2025
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Feb 26, 2025
References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Feb 26, 2025
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Feb 26, 2025
References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Feb 26, 2025
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Feb 26, 2025
References:
* ucoProject/UCO#629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist linked a pull request Feb 26, 2025 that will close this issue
13 tasks
ajnelson-nist added a commit that referenced this issue Feb 26, 2025
No effects were observed on Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Feb 26, 2025
No effects were observed on Make-managed files.

References:
* #629

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Contributor Author

The implementation has been completed and tested throughout the CASE example-sets, and an announcement is going out momentarily to the CDO ontologies list for a Solutions Approval vote.

The summary of changes follows, provided for help because the PRs contain many changes, mostly but not all cookie-cutter:

  • (Alternative shape design) There was one minor deviation from the implementation style I'd suggested, on how to shift the sh:in constraints for the sake of reducing validation results output.
    • (Property shape IRI) The new property shape IRI pattern happens to clash with a sole previous occurrence that supported a slightly different use case.
  • The vocabulary alignment unit test has been restructured to account for semi-open and closed vocabularies used in UCO. (CASE's usage employs only the most common pattern in UCO.) Going through and deleting a member from core:ObjectStatusVocab, vocabulary:ContactEmailScopeVocab, vocabulary:TaskPriorityVocab, and vocabulary:TrendVocab, whether on the OWL side or the SHACL side, satisfactorily raised unit test errors.
  • Test result-sets in the CASE examples will be longer until UCO 1.4.0 is released and incorporated into CASE, because of semi-open vocabulary warnings that can't be addressed while 1.3.0 is the current version. Most often, these are pertaining to hash names.
  • (Datatype review) Some datatype-based errors UCO was able to detect for semi-open vocabulary properties will only be able to raise warnings instead of errors, until UCO 2.0.0.
  • (Member assertions) Some vocabulary member errors UCO was able to detect for semi-open vocabulary properties will no longer be possible to flag as errors. They will instead be considered vocabulary extensions.

And last, there is one group double-checking on backwards compatibility. With this bugfix merged:

  • UCO 1.3.0 data will still pass UCO 1.4.0 validation, but raise warnings.
  • UCO 1.4.0 data will pass UCO 1.3.0 validation, but raise more infos (sh:Info-level results).

Notes below describe some of the summarized points a bit further. There are two requests for feedback on code-style points.

Object status

(Feedback request)

core:objectStatus was not given a warning shape for its DatatypeConstraint, due to not having been in a released version of UCO yet. Also, because this vocabulary is short and a closed vocabulary, I did not implement the same pattern motion. I personally do not see a strong justification one way or the other, so if there is a request to align this stylistically with the semi-open vocabularies' sh:PropertyShapes, it seems fine to me to do.

Property shape IRI

(Feedback request)

The pattern of IRIs used for reviewing sh:in is ${Class IRI}-${Property local name}-in-shape. This was chosen to reflect SHACL structure.
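
As a small illustration of that naming pattern, the following Python sketch builds the shape IRI from a class IRI and a property IRI. The function name is hypothetical, and taking the text after the last "/", "#", or ":" as the property's local name is a simplifying assumption.

```python
import re


def in_shape_iri(class_iri, property_iri):
    """Build an IRI of the form <Class IRI>-<Property local name>-in-shape."""
    # Assumption: the local name is whatever follows the last '/', '#', or ':'.
    property_local_name = re.split(r"[/#:]", property_iri)[-1]
    return f"{class_iri}-{property_local_name}-in-shape"
```

For example, in_shape_iri("types:Hash", "types:hashMethod") gives "types:Hash-hashMethod-in-shape", matching the sh:sourceShape seen in the validation results for types:hashMethod.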

One occurrence of moving sh:in review to its own shape happened for a property shared by two classes, and uses a more English-descriptive IRI, observable:regionalInternetRegistry-shape-value-not-vocabulary-member:

https://github.com/ucoProject/UCO/blob/1.3.0/ontology/uco/observable/observable.ttl#L12601-L12620

If there are requests to go with one IRI pattern over the other, please note so.

Alternative shape design

I'd suggested a pattern for moving the sh:in constraint review into its own node shape, using sh:targetObjectsOf. I tried that, and decided instead to use a property shape, just giving an IRI to the blank node that was a part of the class definition. The reason is that node shapes do the same review, but only on the "node", without the context of the triple the node is a part of. So, a validation result raised about "SHA-1" being used instead of "SHA1" would look like this ...

# ...
[
	a sh:ValidationResult ;
	sh:focusNode "SHA-1" ;
	sh:resultMessage "Value is not member of the vocabulary HashNameVocab." ;
	sh:resultSeverity sh:Info ;
	sh:sourceConstraintComponent sh:InConstraintComponent ;
	sh:sourceShape types:Hash-hashMethod-in-shape ;
	sh:value "SHA-1" ;
]
# ...

... instead of this:

# ...
[
	a sh:ValidationResult ;
	sh:focusNode <http://example.org/kb/hash-af4b0c85-b042-4e2d-a213-210b3d7f115c> ;  # <-- Needed to find triple
	sh:resultMessage "Value is not member of the vocabulary HashNameVocab." ;
	sh:resultPath types:hashMethod ;  # <-- Needed to find triple
	sh:resultSeverity sh:Info ;
	sh:sourceConstraintComponent sh:InConstraintComponent ;
	sh:sourceShape types:Hash-hashMethod-in-shape ;
	sh:value "SHA-1" ;
]
# Triple can be constructed from sh:focusNode, sh:resultPath, and sh:value
# ...
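
The difference between the two result forms can be sketched in Python as follows. This is an illustration only: the results are modeled as plain dictionaries keyed by the SHACL property names, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def triple_from_result(result: dict) -> Optional[Tuple[str, str, str]]:
    """Rebuild the offending (subject, predicate, object) triple from a
    validation result, or return None when the result lacks the context."""
    focus_node = result.get("sh:focusNode")
    result_path = result.get("sh:resultPath")
    value = result.get("sh:value")
    if focus_node is None or result_path is None or value is None:
        # Results from a node shape targeting objects carry no sh:resultPath,
        # and their focus node is the literal itself.
        return None
    return (focus_node, result_path, value)
```

A property-shape result yields the full triple, while the node-shape result, which only names the literal "SHA-1", yields None and leaves the user to search the data for where that value was used.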

Moving the sh:in constraints into IRI-identified property shapes significantly compacted validation results. See these highlighted lines that show the effect of moving the types:hashMethod membership review, done here.
To see the compaction results of moving the sh:in shapes into IRI-identified property shapes, see this patch.

Datatype review

Prior DatatypeConstraints that were able to distinguish between xsd:string, a vocabulary datatype, and other datatypes now are limited to xsd:string vs. all other datatypes. This may lead to some XPASS test results for users (i.e., a test expected to fail now passes). For example, there was a test in UCO's suite that flagged a triple $this types:hashMethod 1 as an error, because hashMethod would accept xsd:string or vocabulary:HashNameVocab and could separate out xsd:integer. In order to relax current uses of vocabulary:HashNameVocab to warning-level results, all non-string uses are downgraded from errors to warnings until UCO 2.0.0.

Member assertions

Prior to this proposal, a vocabulary non-member mistakenly declared to be a member could be recognized as a typo and have a validation error raised. With this proposal, such values only raise sh:Info-severity validation results. E.g., "SHA-1" data-typed as a vocabulary member was previously able to be flagged as an error for not being "SHA1". With the change of all vocabulary members to untyped strings, this is no longer possible; "SHA-1" will just look like an extension to the vocabulary.
