
Question: How to correctly create/preserve cardinality constraints on ObjectProperties using template? #1133

Closed
andreas-w-m opened this issue Aug 1, 2023 · 14 comments · Fixed by #1104


@andreas-w-m

Dear ROBOT team,

We need to create OWL files from UML class diagrams, which contain all taxonomy, partonomy and topological relationships with inverses and cardinalities. The UML tool generates Excel files, which can easily be processed into input for ROBOT template. Hence, we use a staged process, which

  1. creates the classes via a simple "ID | SC %" template, followed by
  2. creating the ObjectProperties with their domains/ranges via a template "ID | TYPE | DOMAIN | RANGE".

Since the UML-to-Excel export also contains distinct columns for the specified min/max cardinalities, these should also be set properly in a third step, so that restrictions like

Class: <Process>
SubClassOf: 
   (<contains> min 1 <Task>) and (<contains> max 5 <Task>)

are inserted correctly. In this example, both "Process" and "contains" were successfully created as a Class and an ObjectProperty, respectively, in steps 1 and 2.

However, there are several issues with the third step:

  • In the UML-to-Excel export there are cases where the same ObjectProperty appears in multiple rows with an identical domain but different ranges. In such cases the range should become a union; however, ROBOT template always creates an intersection. Is there a way to change this behavior?
  • Even when using manually created subclass axioms in the template, e.g., "(contains min 1 Task)", combining them with the previously created valid outputs from steps 1 and 2, via template or merge in any combination, always results in "Task" being redeclared as a Datatype instead of the Class it was created as (the owl:Restriction then has an owl:onDataRange instead of an owl:onClass). Protégé also shows a punning warning here. How can this be avoided, so that the previous class definitions and object properties are used correctly?
  • Is there a way to set up a template that permits directly using the numerical contents for the cardinalities in creating the correct Manchester syntax for the restrictions and involved entities, e.g. from Excel "From-Class | Restricted-Property | To-Class | Min-Card. | Max-Card."?

Thank you in advance!

@matentzn
Contributor

matentzn commented Aug 2, 2023

Hello @andreas-w-m, thank you for your issue - just FYI, the ROBOT team is mostly on a summer break atm, and it may be a while before you get a response, possibly early September. Would you be able to knock again around then?

@andreas-w-m
Author

Hello @matentzn - just knocking, since the issue is still relevant.

@matentzn
Contributor

@andreas-w-m Unfortunately, we are all snowed under massively at the moment.

@dlutz2 has recently submitted #1148. Maybe he has some ideas? I will share your issue again with the community.

@dlutz2
Contributor

dlutz2 commented Sep 20, 2023

Just had time for a quick glance:
"... the range should become a union, however, ROBOT template always creates intersections." - I believe that the OWL spec says that multiple range statements are interpreted as an intersection of those ranges. Template will create a separate range axiom for each row in your step 2. If those ranges vary, you get the intersection of all the distinct ranges seen for that property. I think you will have to aggregate those values into a union, or pick a common parent, in the process that creates the step 2 template.

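A minimal sketch of that aggregation step, assuming the UML export has already been read into (property, domain, range) rows and that the RANGE column of the step-2 template accepts a parenthesized Manchester class expression. The class and record names, and `ex:Subprocess`, are hypothetical; records need Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical preprocessing step (not part of ROBOT): collapse rows that share
// a property and domain into one row whose RANGE is a Manchester union.
class RangeAggregator {

    // One row of the UML-to-Excel export; field names are illustrative.
    record Row(String property, String domain, String range) {}

    // Returns "property|domain" -> range expression for the step-2 template.
    static Map<String, String> aggregate(List<Row> rows) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (Row r : rows) {
            // LinkedHashSet keeps first-seen order and drops duplicate ranges
            grouped.computeIfAbsent(r.property() + "|" + r.domain(), k -> new LinkedHashSet<>())
                   .add(r.range());
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : grouped.entrySet()) {
            Set<String> ranges = e.getValue();
            // A single range stays a plain CURIE; several become "(A or B ...)"
            String expr = ranges.size() == 1
                    ? ranges.iterator().next()
                    : "(" + String.join(" or ", ranges) + ")";
            result.put(e.getKey(), expr);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("ex:contains", "ex:Process", "ex:Task"),
                new Row("ex:contains", "ex:Process", "ex:Subprocess"));
        // prints {ex:contains|ex:Process=(ex:Task or ex:Subprocess)}
        System.out.println(aggregate(rows));
    }
}
```

Each map entry then becomes one template row, so ROBOT emits a single range axiom per property and domain instead of several that would be intersected.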
On the creation of the Datatype: the class expression <property> min 1 <entity> is ambiguous unless the type of the property and/or entity is already known (it could be either a data or an object minimum cardinality). It would appear that template (or actually the underlying Manchester parser) does not know the types of property and entity, so it is somewhat arbitrarily picking the data interpretation. I assume that in the step 3 template command you are including "--input step1.owl --input step2.owl", which will provide the contains and Task definitions? Does the contains property get punned as a data property as well as Task as a Datatype? Trying to merge them downstream won't work, since the data interpretation has already been selected and merging will result in puns.
Although somewhat messy, a single template with all entity definitions (property and class with all subclass definitions) would ensure that all entity types are available to the parser in a single pass and might resolve the ambiguity.

On the last part, I don't think template has a way to specify the individual elements of a cardinality restriction in separate columns. We've been favoring writing the Manchester in a preprocessing step and putting it into a SC/EC column rather than trying to add more complex column combinations to template. We found we had more control over the class expressions without having to modify template itself.
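A minimal sketch of such a preprocessing step for the "From-Class | Restricted-Property | To-Class | Min-Card. | Max-Card." columns, assuming the cells are already parsed into strings and nullable integers; the class and method names are hypothetical. Its output would go into an SC % column of the row for the From-Class. This only produces the Manchester text: ROBOT still has to know that the property is an object property and the filler a class for the restriction to come out with owl:onClass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical preprocessing (not a ROBOT feature): turn one export row
// "From-Class | Restricted-Property | To-Class | Min-Card. | Max-Card."
// into the text of a ROBOT template "SC %" cell.
class CardinalityToManchester {

    // min/max are null when the export leaves the cell empty (e.g. an unbounded upper limit)
    static List<String> restrictions(String property, String toClass, Integer min, Integer max) {
        List<String> out = new ArrayList<>();
        if (min != null && min.equals(max)) {
            out.add("(" + property + " exactly " + min + " " + toClass + ")");
            return out;
        }
        if (min != null && min > 0) {  // "min 0" adds nothing, so it is skipped
            out.add("(" + property + " min " + min + " " + toClass + ")");
        }
        if (max != null) {
            out.add("(" + property + " max " + max + " " + toClass + ")");
        }
        return out;
    }

    // Joins the parts into a single SC % cell, e.g. "(p min 1 C) and (p max 7 C)"
    static String scCell(String property, String toClass, Integer min, Integer max) {
        return String.join(" and ", restrictions(property, toClass, min, max));
    }

    public static void main(String[] args) {
        // Row from the Workflow example in this thread
        System.out.println(scCell("ex:consists_of", "ex:Task", 1, 7));
        // prints (ex:consists_of min 1 ex:Task) and (ex:consists_of max 7 ex:Task)
    }
}
```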

@andreas-w-m
Author

andreas-w-m commented Sep 20, 2023 via email

@andreas-w-m
Author

andreas-w-m commented Oct 6, 2023

@dlutz2, @matentzn : Unfortunately I would like to come back to the Datatype creation problem again: Even using a single template with all definitions contained does not work. This is reproducible:
When using the following simple templatefile.csv

Identifier,Type,Class Restriction Lower,Class Restriction Upper,Domain,Range
ID,TYPE,SC %,SC %,DOMAIN,RANGE
ex:consists_of,owl:ObjectProperty,,,ex:Workflow,ex:Task
ex:Workflow,class,(ex:consists_of min 1 ex:Task),(ex:consists_of max 7 ex:Task),,
ex:Task,class,,,,

and running
robot template --template templatefile.csv --output test-CR.ttl --prefix "ex: http://example.com/model#"
the resulting test-CR.ttl always contains owl:onDataRange instead of the required owl:onClass, resulting in half-correct restrictions:


<http://example.com/model#consists_of> rdf:type owl:ObjectProperty ;
                                       rdfs:domain <http://example.com/model#Workflow> ;
                                       rdfs:range <http://example.com/model#Task> .

<http://example.com/model#Task> rdf:type owl:Class .

<http://example.com/model#Workflow> rdf:type owl:Class ;
                                    rdfs:subClassOf [ rdf:type owl:Restriction ;
                                                      owl:onProperty <http://example.com/model#consists_of> ;
                                                      owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
                                                      owl:onDataRange <http://example.com/model#Task>
                                                    ] ,
                                                    [ rdf:type owl:Restriction ;
                                                      owl:onProperty <http://example.com/model#consists_of> ;
                                                      owl:maxQualifiedCardinality "7"^^xsd:nonNegativeInteger ;
                                                      owl:onDataRange <http://example.com/model#Task>
                                                    ] .

which Protégé then displays as errors:

[Screenshot of the Protégé error]

Obviously, there seems to be something going wrong with correctly resolving the Manchester - or am I missing something here?

@dlutz2
Contributor

dlutz2 commented Oct 6, 2023

Didn't have time to look in depth but it appears that:

  • Template correctly interprets the first row, creating the consists_of as an object property
  • When processing the second row, the Quoted Entity Checker doesn't recognize consists_of as an object property and creates a data property with the same IRI,
  • this data property is now used to create the 2 data cardinality restrictions which makes Task a data range (it hasn't been defined yet)
  • Template lastly creates Task as a Class
  • The ontology now contains an illegal object-data property pun but the data property type is implicit (buried in the cardinality restrictions). Protege eventually tries to parse these restrictions (after seeing the declarations that Task is a Class and consists_of is an object property) and gives up trying to interpret the restrictions, creating "ErrorNNN" classes instead.
The issue stems from the second point, where the Entity Checker does not recognize the consists_of property and creates a data property instead. When I looked at the contents of the Entity Checker after creating the ontology:
 out = template.generateOutputOntology();
 QuotedEntityChecker after = template.getChecker();

Task, Workflow and consists_of were all in the data properties map, and the other maps were empty. That seems odd, but someone more familiar with the interaction between Template and the Entity Checker might be able to diagnose this further.

@andreas-w-m
Author

I can add that this odd behavior occurs regardless of the line ordering in the templatefile.csv, and even adding a further line to make sure all involved classes are explicitly defined before using them, i.e.,

Identifier,Type,Class Restriction Lower,Class Restriction Upper,Domain,Range
ID,TYPE,SC %,SC %,DOMAIN,RANGE
ex:Task,class,,,,
ex:Workflow,class,,,,
ex:consists_of,owl:ObjectProperty,,,ex:Workflow,ex:Task
ex:Workflow,class,(ex:consists_of min 1 ex:Task),(ex:consists_of max 7 ex:Task),,

(and again varying the line ordering) doesn't change the output. ex:Task is always interpreted as a datatype, even though it is explicitly defined as a class.

I am wondering if the cause might be related to something in the direction of #1105?

@dlutz2
Contributor

dlutz2 commented Oct 7, 2023

I believe it is the same issue as #1105 and the section @matentzn marked. The behavior of the Default Entity Checker (which is what the Manchester Parser likely assumes) is to return a known entity, or null if the entity is unknown:

        @Override
        public OWLDataProperty getOWLDataProperty(String name) {
            if (dataPropertyNames.contains(name)) {
                return dataFactory.getOWLDataProperty(getIRI(name));
            }
            return null;
        }

The Quoted Entity Checker adds logic to prevent punning by returning a known entity or, if the name is unknown, creating one. This assumes that the first property type it checks for is the correct entity type (which happens to be data property). Perhaps it is not the Checker's role to prevent punning (or even Template's)? If so, it would be easier to check the finished and merged ontology for puns rather than trying to spot and prevent them during its creation. Perhaps just remove the pun-prevention logic in both getOWLObjectProperty and getOWLDataProperty?
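The difference between the two strategies can be illustrated with a toy model. This is not ROBOT's or OWLAPI's code: the class names are hypothetical, only object and data properties are modeled, and it assumes, as described above, that the parser asks about the data-property interpretation first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Toy model only (not ROBOT or OWLAPI code) of the two lookup strategies above.
class CheckerModel {

    enum Kind { OBJECT_PROPERTY, DATA_PROPERTY }

    // Like the default checker: answer only for names already known, else null.
    static class ReturnNullChecker {
        final Map<String, Kind> known = new HashMap<>();

        Kind lookup(String name, Kind asked) {
            return known.get(name) == asked ? asked : null;
        }
    }

    // Create-on-miss: an unknown name is registered as whatever kind is asked
    // for first, so the first question decides the entity type.
    static class CreateOnMissChecker {
        final Map<String, Kind> known = new HashMap<>();

        Kind lookup(String name, Kind asked) {
            Kind previous = known.putIfAbsent(name, asked);
            return previous == null || previous == asked ? asked : null;
        }
    }

    // Mimics a parser that tries the data interpretation before the object one.
    static Kind resolve(BiFunction<String, Kind, Kind> lookup, String name) {
        Kind k = lookup.apply(name, Kind.DATA_PROPERTY);
        return k != null ? k : lookup.apply(name, Kind.OBJECT_PROPERTY);
    }

    public static void main(String[] args) {
        CreateOnMissChecker empty = new CreateOnMissChecker();
        System.out.println(resolve(empty::lookup, "ex:consists_of"));   // DATA_PROPERTY

        CreateOnMissChecker primed = new CreateOnMissChecker();
        primed.known.put("ex:consists_of", Kind.OBJECT_PROPERTY);
        System.out.println(resolve(primed::lookup, "ex:consists_of"));  // OBJECT_PROPERTY
    }
}
```

With create-on-miss, an unregistered `ex:consists_of` turns into a data property; once the name is registered with its real type before parsing, both strategies resolve it as an object property.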

@andreas-w-m
Author

andreas-w-m commented Oct 8, 2023

I am wondering (without being familiar with the ROBOT code, so maybe this is already being done) whether it might make sense to extend the context scope of the pun-prevention logic during the Template run, and to exit the run with a warning, regardless of any -v switches, if punning could not be prevented:

  • If an entity encountered during parsing is undefined then check if there is more context available from an <S, P, O> perspective to derive the respective entity type, e.g., for TBox templates:
    • If the S is known to be a class AND the P is known to be an ObjectProperty THEN O is assumed to be a class as well.
    • If the S is known to be a class AND the P is known to be a DataProperty THEN O is assumed to be a datatype.
    • If the P is known to be an ObjectProperty THEN both S and O are assumed to be classes.
    • If the P is known to be a DataProperty THEN S is assumed to be a class and O to be a datatype.
    • If the O is known to be a datatype THEN S and P are assumed to be a class and DataProperty, respectively.
    • If the O is known to be a class THEN S and P are assumed to be a class and ObjectProperty, respectively.
    • etc.
  • If there are multiple inputs/templates to be considered for resolving then
    • First try to resolve from the --template file alone. If that fails then
    • Try to resolve from any of the --input files.
    • If there are multiple, possibly conflicting, definitions in either the --template and/or the --input then assume the first definition encountered during reading to be the correct one and continue with that, but exit with a warning/info about this assumption. This would, of course, impose an importance on the order of --input statements.
    • If redefinitions of previously valid assertions occur during the run and would subsequently introduce punning then raise at least a warning.
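A minimal sketch of the property- and filler-keyed rules from this list, restricted to the TBox case. The enum, record and method names are hypothetical, this is not how ROBOT or OWLAPI resolve types today, and a `conflict` flag stands in for the proposed warning. Records need Java 16 or later.

```java
// Hypothetical sketch of the <S, P, O> guessing rules proposed above (TBox only).
class SpoTypeGuesser {

    enum T { UNKNOWN, CLASS, OBJECT_PROPERTY, DATA_PROPERTY, DATATYPE }

    // conflict == true means the known types contradict each other: warn instead of guessing
    record Guess(T s, T p, T o, boolean conflict) {}

    // Keep a known type; only fill in UNKNOWN slots
    static T fill(T current, T inferred) {
        return current == T.UNKNOWN ? inferred : current;
    }

    static Guess guess(T s, T p, T o) {
        T gs = s, gp = p, go = o;
        if (gp == T.UNKNOWN) {
            // O known: a datatype filler implies a data property, a class filler an object property
            if (go == T.DATATYPE) gp = T.DATA_PROPERTY;
            else if (go == T.CLASS) gp = T.OBJECT_PROPERTY;
        }
        if (gp == T.OBJECT_PROPERTY) {        // S and O are classes
            gs = fill(gs, T.CLASS);
            go = fill(go, T.CLASS);
        } else if (gp == T.DATA_PROPERTY) {   // S is a class, O a datatype
            gs = fill(gs, T.CLASS);
            go = fill(go, T.DATATYPE);
        }
        boolean conflict = gp == T.CLASS || gp == T.DATATYPE
                || (gp == T.OBJECT_PROPERTY && go != T.CLASS)
                || (gp == T.DATA_PROPERTY && go != T.DATATYPE)
                || (gp != T.UNKNOWN && gs != T.CLASS);
        return new Guess(gs, gp, go, conflict);
    }

    public static void main(String[] args) {
        // Row 2 of the example once ex:consists_of is known to be an object property
        System.out.println(guess(T.CLASS, T.OBJECT_PROPERTY, T.UNKNOWN));
        // prints Guess[s=CLASS, p=OBJECT_PROPERTY, o=CLASS, conflict=false]
    }
}
```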

I have also noticed that the behavior differs between pipelines: the same input files can be used either in one template call with --merge-before to generate a merged ontology, or in one template call followed by a separate merge call. In one case the ERRORnnn classes (incorrect restrictions only) appear in Protégé, as only the owl:onDataRange is generated but the restricted property remains the intended ObjectProperty. In the other case the output .ttl contains full double definitions, i.e., a property appears as both ObjectProperty and DataProperty in the merged ontology, so Protégé does not show ERRORnnn classes (since a corresponding DataProperty is available); in either case, however, Protégé logs punning warnings.

@dlutz2
Contributor

dlutz2 commented Oct 8, 2023

If you look in the parser libraries inside OWLAPI (e.g. https://github.com/owlcs/owlapi/blob/version5/parsers/src/main/java/org/semanticweb/owlapi/rdf/rdfxml/parser/OWLRDFConsumer.java#L843, look for updateGuesses) you'll see the logic they use to both detect potential puns and to reconcile ambiguous references during parsing. Quite a bit of code and it still can't guarantee to catch all puns (illegal or not) or to correctly disambiguate incoming IRIs.
If we assume that illegal puns are errors, rare and hard to spot, it might be best to check for them after the ontology is assembled, likely after any merging has been done. There you could just call OWLAPI methods on the finished ontology and react accordingly. Or you can just ignore it and assume something downstream will do something about it. 😄
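A minimal sketch of such a post-assembly check, assuming the declared and usage-implied types of each IRI have already been collected into a map. With OWLAPI these would come from the assembled ontology; that wiring is omitted, so no OWLAPI calls are shown. Names are hypothetical; the disallowed combinations follow the OWL 2 DL typing constraints.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of a post-assembly pun check over a plain map (no OWLAPI dependency).
class PunCheck {

    enum Kind { CLASS, DATATYPE, OBJECT_PROPERTY, DATA_PROPERTY, ANNOTATION_PROPERTY, INDIVIDUAL }

    // OWL 2 DL forbids an IRI being more than one kind of property, or both a
    // class and a datatype; other puns (e.g. class + individual) are allowed.
    static boolean illegal(Set<Kind> kinds) {
        int propertyKinds = 0;
        for (Kind k : List.of(Kind.OBJECT_PROPERTY, Kind.DATA_PROPERTY, Kind.ANNOTATION_PROPERTY)) {
            if (kinds.contains(k)) propertyKinds++;
        }
        return propertyKinds > 1 || (kinds.contains(Kind.CLASS) && kinds.contains(Kind.DATATYPE));
    }

    // typesByIri must include kinds implied by use, not just declarations:
    // an IRI used with owl:onDataRange counts as DATATYPE.
    static List<String> illegalPuns(Map<String, Set<Kind>> typesByIri) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Set<Kind>> e : typesByIri.entrySet()) {
            if (illegal(e.getValue())) bad.add(e.getKey());
        }
        Collections.sort(bad);  // deterministic report order
        return bad;
    }

    public static void main(String[] args) {
        // Types implied by the test-CR.ttl output in this thread
        Map<String, Set<Kind>> seen = Map.of(
                "ex:Task", Set.of(Kind.CLASS, Kind.DATATYPE),
                "ex:consists_of", Set.of(Kind.OBJECT_PROPERTY, Kind.DATA_PROPERTY),
                "ex:Workflow", Set.of(Kind.CLASS));
        System.out.println(illegalPuns(seen));  // prints [ex:Task, ex:consists_of]
    }
}
```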

@jamesaoverton
Member

Sorry for the late reply. I think that @dlutz2 is right, and this is related to #1104, which points out a problem with the QuotedEntityChecker. But I'm not sure how to fix it.

As a workaround, @andreas-w-m can you define a small ontology that declares ex:consists_of as an owl:ObjectProperty and load it with --input as part of the call to robot template? Then ROBOT would know the correct type and not have to guess.
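If the declarations are generated from the same UML export, a small helper could write that ontology. A minimal sketch, assuming the entity list is already available as CURIE-to-type pairs; the class and method names are hypothetical, and the output is plain Turtle without an ontology header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: write a tiny Turtle file that only declares entity types,
// so ROBOT can load it via --input and does not have to guess them.
class DeclarationsWriter {

    // typeByCurie: e.g. "ex:consists_of" -> "owl:ObjectProperty", "ex:Task" -> "owl:Class"
    static String turtle(String prefix, String namespace, Map<String, String> typeByCurie) {
        StringBuilder sb = new StringBuilder();
        sb.append("@prefix owl: <http://www.w3.org/2002/07/owl#> .\n");
        sb.append("@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n");
        sb.append("@prefix ").append(prefix).append(": <").append(namespace).append("> .\n\n");
        for (Map.Entry<String, String> e : typeByCurie.entrySet()) {
            sb.append(e.getKey()).append(" rdf:type ").append(e.getValue()).append(" .\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> decls = new LinkedHashMap<>();
        decls.put("ex:consists_of", "owl:ObjectProperty");
        decls.put("ex:Task", "owl:Class");
        decls.put("ex:Workflow", "owl:Class");
        // Redirect stdout to declarations.ttl
        System.out.print(turtle("ex", "http://example.com/model#", decls));
    }
}
```

The file would then be passed alongside the template, e.g. `robot template --input declarations.ttl --template templatefile.csv --prefix "ex: http://example.com/model#" --output test-CR.ttl`.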

@jamesaoverton
Member

I think I figured it out. It's not quite what I thought.

ROBOT handles templates in two passes: first, it collects IDs, labels, and types and adds them to the QuotedEntityChecker (QEC); second, it handles each row and creates OWL entities, which often means parsing Manchester syntax with the QEC. The problem here is in the first pass, specifically in Template.addLabels(). In this case the template does not include a LABEL column, so the current code just skips that entity. But this is not correct: we know the ID and the type, so we should add them to the QEC.
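A toy model of that first pass, before and after the fix. It is not ROBOT's code: the record, method and flag names are hypothetical, and types are kept as the raw TYPE strings from the template.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of pass 1 (not ROBOT's Template.addLabels()): collect name -> type
// so that pass 2 can parse Manchester expressions without guessing.
class TwoPassModel {

    // label is null when the template has no LABEL column
    record TemplateRow(String id, String type, String label) {}

    // skipUnlabeled = true models the old behaviour described above;
    // false models the fix: the ID and TYPE are registered even without a label.
    static Map<String, String> firstPass(List<TemplateRow> rows, boolean skipUnlabeled) {
        Map<String, String> known = new HashMap<>();
        for (TemplateRow r : rows) {
            if (r.label() == null && skipUnlabeled) {
                continue;
            }
            known.put(r.id(), r.type());
            if (r.label() != null) {
                known.put("'" + r.label() + "'", r.type());  // quoted label, as used in Manchester
            }
        }
        return known;
    }

    public static void main(String[] args) {
        List<TemplateRow> rows = List.of(
                new TemplateRow("ex:consists_of", "owl:ObjectProperty", null),
                new TemplateRow("ex:Workflow", "class", null),
                new TemplateRow("ex:Task", "class", null));
        System.out.println(firstPass(rows, true).size());                  // 0
        System.out.println(firstPass(rows, false).get("ex:consists_of"));  // owl:ObjectProperty
    }
}
```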

This should also address PR #1104, so I pushed my fix there. I checked @andreas-w-m's example and got the expected owl:onClass restrictions. It would be great if someone else could check my fix.

@jamesaoverton
Member

I added @andreas-w-m's example to the test suite.


4 participants