robot outputs invalid turtle syntax #1129
The input RDF/XML has the same weird problems with blank node IDs:
<!-- http://purl.obolibrary.org/obo/MONDO_0000290 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0000290">
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/MONDO_0005550"/>
<owl:Restriction rdf:nodeID="genid24282">
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0014001"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_5763"/>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
</owl:equivalentClass>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0002428"/>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0020067"/>
<rdfs:subClassOf rdf:nodeID="genid24282"/>
<obo:IAO_0000115>A infectious disease involving the Naegleria fowleri.</obo:IAO_0000115>
<terms:conformsTo rdf:resource="http://purl.obolibrary.org/obo/mondo/patterns/infectious_disease_by_agent.yaml"/>
<oboInOwl:hasDbXref>DOID:0050242</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>GARD:0009554</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>MESH:C535275</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>SCTID:721816008</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>UMLS:C0300934</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>UMLS:C4303098</oboInOwl:hasDbXref>
<oboInOwl:hasExactSynonym>Naegleria fowleri infection</oboInOwl:hasExactSynonym>
<oboInOwl:hasRelatedSynonym>infections, Naegleria fowleri</oboInOwl:hasRelatedSynonym>
<oboInOwl:id>MONDO:0000290</oboInOwl:id>
<oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/mondo#mondo_rare"/>
<oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/mondo#rare"/>
<rdfs:label>primary amebic meningoencephalitis</rdfs:label>
<skos:exactMatch rdf:resource="http://identifiers.org/mesh/C535275"/>
<skos:exactMatch rdf:resource="http://identifiers.org/snomedct/721816008"/>
<skos:exactMatch rdf:resource="http://linkedlifedata.com/resource/umls/id/C0300934"/>
<skos:exactMatch rdf:resource="http://linkedlifedata.com/resource/umls/id/C4303098"/>
<skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/DOID_0050242"/>
</owl:Class>
<owl:Restriction rdf:nodeID="genid24282">
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0014001"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_5763"/>
</owl:Restriction>
For comparison, here is a section from GO:
<!-- http://purl.obolibrary.org/obo/GO_0000019 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0000019">
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/GO_0065007"/>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002211"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0006312"/>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
</owl:equivalentClass>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0000018"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002211"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0006312"/>
</owl:Restriction>
</rdfs:subClassOf>
<obo1:IAO_0000115>Any process that modulates the frequency, rate or extent of DNA recombination during mitosis.</obo1:IAO_0000115>
<oboInOwl:hasNarrowSynonym>regulation of recombination within rDNA repeats</oboInOwl:hasNarrowSynonym>
<oboInOwl:hasOBONamespace>biological_process</oboInOwl:hasOBONamespace>
<oboInOwl:id>GO:0000019</oboInOwl:id>
<rdfs:label>regulation of mitotic recombination</rdfs:label>
</owl:Class>
I don't understand this ticket well, but here are some observations:
and should be:
(This part needed to go)
It's not about the genids; these are fine. The Turtle serialises, but with this nonsense blank node expansion in the expression. Sounds like an OWL API bug.
This is still fine imo (@balhoff I don't see why this is wrong):
But adding this:
Makes the serialiser break.
I thought I could at least fix it in Mondo by removing the axiom annotations, but there are too many of them.
@matentzn your first example is not valid, since the equivalent class axiom and the subclass axiom should not share a blank node representing the existential restriction. The spec says:
When there is an annotated axiom, RDF serializations represent the core axiom as triples, along with a reified version to which the annotations are attached. At some point the OWL API changed how it handles anonymous expressions in the core axiom and in the reified version. In this input, both the equivalent class axiom and the subclass axiom contain the same existential restriction:
Ontology(<http://example.org/>
Declaration(Class(<http://example.org/A>))
Declaration(Class(<http://example.org/B>))
Declaration(Class(<http://example.org/C>))
Declaration(ObjectProperty(<http://example.org/r>))
EquivalentClasses(<http://example.org/A> ObjectIntersectionOf(<http://example.org/C> ObjectSomeValuesFrom(<http://example.org/r> <http://example.org/B>)))
SubClassOf(Annotation(rdfs:comment "This axiom is annotated.") <http://example.org/A> ObjectSomeValuesFrom(<http://example.org/r> <http://example.org/B>))
)
An older version of ROBOT (I tested 1.8.1) translates this to the following Turtle:
@prefix : <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/> rdf:type owl:Ontology .
:r rdf:type owl:ObjectProperty .
:A rdf:type owl:Class ;
owl:equivalentClass [ owl:intersectionOf ( :C
[ rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B
]
) ;
rdf:type owl:Class
] ;
rdfs:subClassOf [ rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B
] .
[ rdf:type owl:Axiom ;
owl:annotatedSource :A ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedTarget [ rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B
] ;
rdfs:comment "This axiom is annotated."
] .
:B rdf:type owl:Class .
:C rdf:type owl:Class .
A fresh blank node is used for every occurrence of the restriction:
[ rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B
]
But in ROBOT 1.9.4, the same conversion produces this Turtle:
@prefix : <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/> rdf:type owl:Ontology .
:r rdf:type owl:ObjectProperty .
:A rdf:type owl:Class ;
owl:equivalentClass [ owl:intersectionOf ( :C
[ rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B
]
) ;
rdf:type owl:Class
] ;
rdfs:subClassOf _:genid5 .
_:genid5 rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B .
[ rdf:type owl:Axiom ;
owl:annotatedSource :A ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedTarget _:genid5 ;
rdfs:comment "This axiom is annotated."
] .
:B rdf:type owl:Class .
:C rdf:type owl:Class .
The blank node used for the expression in the subclass axiom is given an identifier so that it can be referenced in the reified annotated axiom. There was a long discussion of this in owlcs/owlapi#874. I agree with @ignazio1977 that the spec is really unclear about how to handle this, but under my interpretation I would prefer the previous approach. The problem in this ticket is happening somehow within
<?xml version="1.0"?>
<rdf:RDF xmlns="http://example.org/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<owl:Ontology rdf:about="http://example.org/"/>
<owl:ObjectProperty rdf:about="http://example.org/r"/>
<owl:Class rdf:about="http://example.org/A">
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://example.org/C"/>
<owl:Restriction rdf:nodeID="genid3">
<owl:onProperty rdf:resource="http://example.org/r"/>
<owl:someValuesFrom rdf:resource="http://example.org/B"/>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
</owl:equivalentClass>
<rdfs:subClassOf rdf:resource="http://example.org/C"/>
<rdfs:subClassOf rdf:nodeID="genid3"/>
</owl:Class>
<owl:Restriction rdf:nodeID="genid3">
<owl:onProperty rdf:resource="http://example.org/r"/>
<owl:someValuesFrom rdf:resource="http://example.org/B"/>
</owl:Restriction>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://example.org/A"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget rdf:nodeID="genid3"/>
<rdfs:comment>This axiom is annotated.</rdfs:comment>
</owl:Axiom>
<owl:Class rdf:about="http://example.org/B"/>
<owl:Class rdf:about="http://example.org/C"/>
</rdf:RDF>
The RDF/XML serializer outputs valid XML, but the Turtle serializer gets confused and outputs invalid Turtle:
@prefix : <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/> rdf:type owl:Ontology .
:r rdf:type owl:ObjectProperty .
:A rdf:type owl:Class ;
owl:equivalentClass [ owl:intersectionOf ( :C
_:genid3 rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B
) ;
rdf:type owl:Class
] ;
rdfs:subClassOf :C ,
_:genid3 .
_:genid3 rdf:type owl:Restriction ;
owl:onProperty :r ;
owl:someValuesFrom :B .
[ rdf:type owl:Axiom ;
owl:annotatedSource :A ;
owl:annotatedProperty rdfs:subClassOf ;
owl:annotatedTarget _:genid3 ;
rdfs:comment "This axiom is annotated."
] .
:B rdf:type owl:Class .
:C rdf:type owl:Class .
So I think there are at least two bugs:
I also personally think the OWL API should revert to the previous approach of using fresh blank nodes everywhere, but I'm not 100% sure about this.
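To make the two encodings above easier to compare, here is a minimal sketch in plain Scala. It is a hypothetical toy model: Term, Triple, BNodeFactory and the helper functions are illustrative only and are not OWL API or ROBOT classes, and prefixed names are just strings. It emits the triples for the annotated axiom SubClassOf(A, r some B), namely the core triple plus the reified owl:Axiom. With freshTarget = true the reified axiom gets its own copy of the restriction, as in the ROBOT 1.8.1 output; with freshTarget = false it points at the same node, as in the 1.9.4 output.

```scala
// Hypothetical toy RDF model (not the OWL API): prefixed names are plain strings.
sealed trait Term
final case class Iri(value: String) extends Term
final case class BNode(label: String) extends Term
final case class Lit(value: String) extends Term
final case class Triple(s: Term, p: Term, o: Term)

object Vocab {
  val rdfType = Iri("rdf:type")
  val subClassOf = Iri("rdfs:subClassOf")
  val comment = Iri("rdfs:comment")
  val restriction = Iri("owl:Restriction")
  val onProperty = Iri("owl:onProperty")
  val someValuesFrom = Iri("owl:someValuesFrom")
  val axiom = Iri("owl:Axiom")
  val annotatedSource = Iri("owl:annotatedSource")
  val annotatedProperty = Iri("owl:annotatedProperty")
  val annotatedTarget = Iri("owl:annotatedTarget")
}

// Hands out fresh blank nodes: genid1, genid2, ...
final class BNodeFactory {
  private var counter = 0
  def fresh(): BNode = { counter += 1; BNode(s"genid$counter") }
}

// The three triples describing (prop some filler), rooted at the given node.
def someValuesFrom(node: BNode, prop: Iri, filler: Iri): List[Triple] = List(
  Triple(node, Vocab.rdfType, Vocab.restriction),
  Triple(node, Vocab.onProperty, prop),
  Triple(node, Vocab.someValuesFrom, filler))

// Core triple for SubClassOf(sub, prop some filler) plus the reified owl:Axiom.
// freshTarget = true: the reified axiom gets its own copy of the restriction.
// freshTarget = false: the reified axiom reuses the core axiom's blank node.
def annotatedSubClassOf(sub: Iri, prop: Iri, filler: Iri, comment: String,
                        ids: BNodeFactory, freshTarget: Boolean): List[Triple] = {
  val core = ids.fresh()
  val target = if (freshTarget) ids.fresh() else core
  val ax = ids.fresh()
  val coreTriples = Triple(sub, Vocab.subClassOf, core) :: someValuesFrom(core, prop, filler)
  val targetTriples = if (freshTarget) someValuesFrom(target, prop, filler) else Nil
  val reified = List(
    Triple(ax, Vocab.rdfType, Vocab.axiom),
    Triple(ax, Vocab.annotatedSource, sub),
    Triple(ax, Vocab.annotatedProperty, Vocab.subClassOf),
    Triple(ax, Vocab.annotatedTarget, target),
    Triple(ax, Vocab.comment, Lit(comment)))
  coreTriples ++ targetTriples ++ reified
}

// Blank nodes that are the object of more than one triple: these cannot be written as [ ... ].
def sharedBNodes(triples: List[Triple]): Set[BNode] =
  triples.collect { case Triple(_, _, b: BNode) => b }
    .groupBy(identity).collect { case (b, uses) if uses.size > 1 => b }.toSet

val (a, r, b) = (Iri(":A"), Iri(":r"), Iri(":B"))
val fresh = annotatedSubClassOf(a, r, b, "This axiom is annotated.", new BNodeFactory, freshTarget = true)
val shared = annotatedSubClassOf(a, r, b, "This axiom is annotated.", new BNodeFactory, freshTarget = false)
println(sharedBNodes(fresh))  // Set()
println(sharedBNodes(shared)) // Set(BNode(genid1))
```

In the shared encoding the restriction node is the object of two triples (rdfs:subClassOf and owl:annotatedTarget), so a Turtle writer has to give it a label instead of writing it as [ ... ]; that is where the _:genid5 in the 1.9.4 output comes from.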
This is not great. The main reason for moving to the latest OWL API was serialisation stability, and if I remember correctly, how blank node ids are named was a significant part of that.
I'll need to wrap my head around what's happening here. Using a fresh blank node id or reusing the previous id is something that can be controlled now (an option was introduced a while ago to reuse existing ids), but it's been a long time since I looked at that. "Blank nodes, or, why is OWL not used everywhere yet?"
Thank you @ignazio1977, really appreciated! This issue is extremely important for us, and I feel a lot less scared now that you've joined the debugging party!
I can confirm this doesn't have anything particular to do with
//> using scala 2.13
//> using dep "net.sourceforge.owlapi:owlapi-distribution:4.5.25"
import org.semanticweb.owlapi.model._
import org.semanticweb.owlapi.apibinding.OWLManager
import org.semanticweb.owlapi.formats.TurtleDocumentFormat
import org.semanticweb.owlapi.formats.RioTurtleDocumentFormat
import scala.jdk.CollectionConverters._
import java.io.File
val factory = OWLManager.getOWLDataFactory()
val manager = OWLManager.createOWLOntologyManager()
// Entities from the minimal example: classes A, B, C and object property r
val r = factory.getOWLObjectProperty(IRI.create("http://example.org/r"))
val A = factory.getOWLClass(IRI.create("http://example.org/A"))
val B = factory.getOWLClass(IRI.create("http://example.org/B"))
val C = factory.getOWLClass(IRI.create("http://example.org/C"))
// A single (r some B) expression object, shared in memory by both axioms
val restriction = factory.getOWLObjectSomeValuesFrom(r, B)
// EquivalentClasses(A, C and (r some B))
val equiv = factory.getOWLEquivalentClassesAxiom(A, factory.getOWLObjectIntersectionOf(C, restriction))
// SubClassOf(A, r some B) with an rdfs:comment annotation, which makes the RDF output include a reified owl:Axiom
val subClassOf = factory.getOWLSubClassOfAxiom(
  A, restriction,
  Set(factory.getOWLAnnotation(factory.getRDFSComment(), factory.getOWLLiteral("comment"))).asJava)
val ontology = manager.createOntology(Set[OWLAxiom](equiv, subClassOf).asJava)
// OWL API's own Turtle writer: produces invalid Turtle with 4.5.25
manager.saveOntology(ontology, new TurtleDocumentFormat(), IRI.create(new File("test.ttl")))
// Rio-based Turtle writer: produces valid Turtle
manager.saveOntology(ontology, new RioTurtleDocumentFormat(), IRI.create(new File("test-rio.ttl")))
The Rio document format produces valid Turtle; the other does not. So I need to open issues at the OWL API.
This rings a bell. There was a bug to do with exactly that: sharing of nodes is fine in memory but not in text syntaxes, where anonymous classes can't be reused (I say can't; 'shouldn't' is probably a better word) and the objects need to be duplicated. One symptom of this issue is ending up with spare RDF triples. Annotations on axioms present the same problem when objects are reused. I'm sure there was a fix for this, possibly two fixes at different times. But I'm not sure if the two were sorted and tested together. Thanks for the code, saves me a job.
After putting the code in an OWL API test, it turns out this fails only with Turtle syntax; the other syntaxes are fine.
@ignazio1977 thanks for looking at it! Yes, the Turtle writer is the only one that fails to output a valid file. But I think it is also incorrect that the equivalence axiom and the subclass axiom share blank nodes for the class expression (in all the RDF syntaxes), even if it is syntactically valid RDF.
@ignazio1977 you said:
How can this be controlled? Is there any documentation for this?
Normally they wouldn't, but annotations make everything worse.
So, the shared node is referenced in three places in the output - two axioms, one of them with annotations. The annotation triggers reification, which is where the third reference appears. The ontology looks like this:
The node triples are output twice, in Turtle as well as in RDF/XML. This shouldn't make a difference to the parser, and in fact the RDF/XML parser copes with it. However, the Turtle parser doesn't like it: it doesn't expect an inline description with an id. That's hard to change, as it's a JavaCC-generated parser. So the solution needs to be that the extra triples aren't output at all. The fact that they're output twice, and not three times, suggests that the mechanism for deciding whether the id gets generated and the one for deciding whether the triples are output are getting tangled.
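The parser complaint comes down to Turtle's grammar: inside a collection ( ... ) every item must be a single object (an IRI, a blank node label, a literal, a nested collection, or a [ ... ] property list), so a labelled node such as _:genid3 cannot be followed by its own predicate-object list there. As a rough, hypothetical lint for spotting this symptom in generated files (not part of ROBOT or the OWL API), the sketch below flags any ';' or ',' that appears directly inside a collection rather than inside a nested [ ... ]. It is a character scanner, not a parser: it skips comments, IRIs and simple quoted strings, but it does not handle every corner of the grammar (for example, a double quote inside a triple-quoted string).

```scala
// Heuristic: returns the 1-based line numbers where ';' or ',' sits directly inside a Turtle collection.
def semicolonsInCollections(ttl: String): List[Int] = {
  val problems = scala.collection.mutable.ListBuffer.empty[Int]
  var stack = List.empty[Char] // currently open '(' and '[' brackets, innermost first
  var line = 1
  var i = 0
  while (i < ttl.length) {
    val c = ttl.charAt(i)
    c match {
      case '\n' => line += 1
      case '#' => // comment: skip to the end of the line (the newline is counted next iteration)
        while (i + 1 < ttl.length && ttl.charAt(i + 1) != '\n') i += 1
      case '<' => // IRI: skip to the closing '>'
        while (i + 1 < ttl.length && ttl.charAt(i + 1) != '>') i += 1
        i += 1
      case '"' | '\'' => // string literal: skip to the matching unescaped quote
        i += 1
        while (i < ttl.length && ttl.charAt(i) != c) {
          if (ttl.charAt(i) == '\\') i += 1 // skip the escaped character
          else if (ttl.charAt(i) == '\n') line += 1
          i += 1
        }
      case '(' | '[' => stack = c :: stack
      case ')' | ']' => if (stack.nonEmpty) stack = stack.tail
      case ';' | ',' if stack.headOption.contains('(') => problems += line
      case _ => ()
    }
    i += 1
  }
  problems.toList
}

// Shape of the invalid output in this issue: a labelled node with inline properties inside a list.
val broken =
  """:A owl:equivalentClass [ owl:intersectionOf ( :C
    |  _:genid3 rdf:type owl:Restriction ;
    |  owl:onProperty :r ;
    |  owl:someValuesFrom :B
    |  ) ;
    |  rdf:type owl:Class
    |] .""".stripMargin

// Valid shape: the restriction is an anonymous [ ... ] inside the list.
val fixed =
  """:A owl:equivalentClass [ owl:intersectionOf ( :C
    |  [ rdf:type owl:Restriction ;
    |    owl:onProperty :r ;
    |    owl:someValuesFrom :B
    |  ]
    |  ) ;
    |  rdf:type owl:Class
    |] .""".stripMargin

println(semicolonsInCollections(broken)) // List(2, 3)
println(semicolonsInCollections(fixed))  // List()
```

Running a real Turtle parser over the output remains the authoritative check; this only narrows down where to look.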
There are two options in ConfigurationOptions that control settings related to blank node ids (changing the values is described in the javadoc for this class).
I think REMAP_IDS might be of use for ROBOT in some circumstances. If it's set to true, blank node ids when parsing will be the same value as they were when the ontology was saved (if they were written out, of course). So, if an ontology is saved with all ids written out, and parsed without remapping the ids, you get all blank nodes with the same ids they had in their previous life; if only some nodes had their ids written out, those nodes will have the same id they originally had. This can have side effects if there happens to be a clash between blank nodes in the ontology and blank nodes in imported ontologies. That's one of the reasons for remapping on parse.
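Blank node labels are only meaningful within a single document, which is why remapping on parse exists. A toy model in plain Scala shows the clash described above (hypothetical names; this is not how the OWL API implements REMAP_IDS): if labels are kept exactly as written, two unrelated nodes that both happen to be called _:genid3 in different files become one node after merging; remapping gives each document's labels fresh names while keeping them consistent within that document.

```scala
// Hypothetical toy model: terms are plain strings, blank nodes start with "_:".
final case class Triple(s: String, p: String, o: String)

def isBNode(term: String): Boolean = term.startsWith("_:")

// Merge several parsed documents, optionally renaming each document's blank node labels.
def merge(docs: List[List[Triple]], remapIds: Boolean): List[Triple] = {
  var counter = 0 // shared across documents, so remapped labels never collide
  docs.flatMap { doc =>
    // One mapping per document: the same label within a document maps to the same fresh label
    val mapping = scala.collection.mutable.Map.empty[String, String]
    def term(t: String): String =
      if (remapIds && isBNode(t)) mapping.getOrElseUpdate(t, { counter += 1; s"_:b$counter" })
      else t
    doc.map(tr => Triple(term(tr.s), tr.p, term(tr.o)))
  }
}

def distinctBNodes(triples: List[Triple]): Set[String] =
  triples.flatMap(t => List(t.s, t.o)).filter(isBNode).toSet

val mainDoc = List(
  Triple("_:genid3", "rdf:type", "owl:Restriction"),
  Triple(":A", "rdfs:subClassOf", "_:genid3"))
val importedDoc = List(Triple("_:genid3", "rdf:type", "owl:Class"))

val kept = merge(List(mainDoc, importedDoc), remapIds = false)
val remapped = merge(List(mainDoc, importedDoc), remapIds = true)
println(distinctBNodes(kept))     // Set(_:genid3): two unrelated nodes collapsed into one
println(distinctBNodes(remapped)) // Set(_:b1, _:b2)
```

Keeping labels is what makes output ids stable across round trips; remapping is what keeps separately parsed documents apart. The trade-off is the one described above.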
The problem was not the de-sharing of nodes; rather, when a node is referred to in multiple places but should be output only once, the renderer 'defers' it. The node in question,
So, we had a test covering this already, but it didn't cover shared nodes as part of lists. The same issue exists in RDF/XML; however, the same fix doesn't seem to work there. The two renderers are almost structurally identical. Almost.
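The 'defer' idea can be illustrated with a toy Turtle writer in plain Scala (hypothetical names; it handles no RDF lists and assumes the blank nodes form no cycles, which holds for OWL class expressions). A blank node referenced exactly once is inlined as [ ... ]; a node referenced more than once is written only once, as a labelled top-level subject, and every reference uses its label. The invalid output in this issue is what you get when a deferred node inside a list is also written inline with its id, so its triples appear twice.

```scala
// Hypothetical toy model: prefixed names and blank node labels are plain strings.
final case class Triple(s: String, p: String, o: String)

def isBNode(term: String): Boolean = term.startsWith("_:")

// Toy Turtle writer: inline blank nodes referenced exactly once; defer every other
// subject to a single top-level block, so each triple is emitted exactly once.
def render(triples: List[Triple]): String = {
  val bySubject = triples.groupBy(_.s)
  val refCount: Map[String, Int] =
    triples.filter(t => isBNode(t.o)).groupBy(_.o).map { case (node, refs) => node -> refs.size }

  def inlinable(term: String): Boolean = isBNode(term) && refCount.getOrElse(term, 0) == 1

  // Render an object: inline [ ... ] for single-use blank nodes, otherwise the term itself
  def obj(o: String): String =
    if (inlinable(o)) "[ " + props(o) + " ]" else o

  // Render the predicate-object list of a subject
  def props(s: String): String =
    bySubject.getOrElse(s, Nil).map(t => s"${t.p} ${obj(t.o)}").mkString(" ; ")

  // Top-level subjects in first-seen order: everything that is not inlined elsewhere
  val roots = triples.map(_.s).distinct.filterNot(inlinable)
  roots.map(s => s"$s ${props(s)} .").mkString("\n")
}

// The shared restriction from the annotated-axiom example, plus a cut-down reified axiom
val annotated = List(
  Triple(":A", "rdfs:subClassOf", "_:genid5"),
  Triple("_:genid5", "rdf:type", "owl:Restriction"),
  Triple("_:genid5", "owl:onProperty", ":r"),
  Triple("_:genid5", "owl:someValuesFrom", ":B"),
  Triple("_:ax", "rdf:type", "owl:Axiom"),
  Triple("_:ax", "owl:annotatedTarget", "_:genid5"))

println(render(annotated))                                         // _:genid5 is written once, as a labelled subject
println(render(annotated.filterNot(_.p == "owl:annotatedTarget"))) // referenced once, so it is inlined as [ ... ]
```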
Fix pushed to the version4 branch. Let me know if you can test it on your side as is, or if I should make a release candidate for 4.5.26.
4.5.26 released
Thanks @ignazio1977! I was a bit too slow; I'll give that version a try.
OWL API 4.5.26 is outputting valid Turtle for me; thanks @ignazio1977. So the fix for this ROBOT issue would be to upgrade our dependency to 4.5.26. |
This may be a problem in the OWL API, but I'm just tracking it here for now. Run these commands on the latest MONDO release:
The problem section of the file looks like this:
In the intersection list you would expect the existential restriction to be enclosed in [ ], but instead there is a blank node ID _:genid24281 followed by invalid inline properties. The blank node ID is reused twice below, which it should not be (OWL is supposed to use fresh blank nodes for each reference to a class expression). That's fine in Turtle, but it might point to the source of the error. Based on the type of expression, it's possible this is somehow related to robot relax.