-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
robot extract/merge should optionally inject provenance #977
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. @jamesaoverton will want to look over this when he gets a chance, and before merging I will try this out in ODK.
I'm struggling to find the time, but I will look at this. |
@hkir-dev Thanks for the PR. I finally got a chance to take a first look. Overall it looks good. I have one request at this point: can you please make the example files smaller? It's great that you included the examples and tests, but they are just larger than they need to be to demonstrate the purpose and effect of the command. When you make this change, I'll take another look, and then ask @beckyjackson to review. Sorry again for the delay -- I'll be faster next time. |
@jamesaoverton I simplified the example files. I can happily implement further improvements if required. |
@hkir-dev Much appreciated! @beckyjackson can you please review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Should the link in the CHANGELOG be to the issue number or the PR number? We've linked to the PR for everything in the past.
@beckyjackson Updated CHANGELOG to point PR instead of the issue to be compatible with old ones. |
<!-- http://www.w3.org/2000/01/rdf-schema#comment --> | ||
|
||
<rdf:Description rdf:about="http://www.w3.org/2000/01/rdf-schema#comment"> | ||
<rdfs:isDefinedBy rdf:resource="https://github.com/ontodev/robot/examples/edit.owl"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well spotted.. We can probably leave this PR for the next ROBOT soon (much less urgent). So we have basically three options:
- Exclude standard vocabs like skos, rdf, owl etc from this process. A cool alternative in the spirit of this PR would be to actually hard code the "isDefinedBy" relation in a map for these, so they are added in, but with the correct source.
- We restrict this method to classes by default (bit meh, least favourite option)
- We make the command respect
--select
option likeremove
, and only add provenance to the selected terms (in particular you can use the IRI or CURIE style selectors, but also do (2) if you want to by selecting classes.
I think 1 and 3 are both good. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking 3, but 1 could also work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response. Based on Jim's comment
I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says
rdfs:comment rdf:type owl:AnnotationProperty
, I want that axiom annotated withprov:wasDerivedFrom <myont-base.owl>
in the output, so I know which of the input files made that declaration.
Originally posted by @balhoff in #977 (comment)
should we implement the option 3 or keep the current behavior as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just broken my head over what should be the right way.. I think the current behaviour is probably correct after all. If we add "rdfs:isDefinedBy" annotations on terms to OMO, these wont propagate further (i.e ontologies importing OMO will just see isDefinedBy OMO). It also gives us a simple way to determine which ontologies do not use a proper OMO import. I think if an ontology introduces a term that does not have any provenance, it is probably correct to say that it is "definedBy" that ontology, as their is no further information on whether the axioms and annotations around it were actually obtained from the place we believe it was defined. Also, a problem with 1 is that there is no canonical way to specify where some of these are defined, in particular if you think about skos vs OMO. It would be unfortunate to hard code these assumptions, and non OBO users being tripped up. I think I am inclined to think the current behaviour is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matentzn I think that sounds good, especially since existing definedBy annotations are carried through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intended behaviour for entities and axioms that are not in the ontology namespace, such as RDF, RDFS, OIO, PROV, etc.? The current code marks all terms as coming from the source file, but that is false in these cases.
Please either drop overload
parameter, or skip the check for existingAnnotation
if overload
is true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally got around to trying this—generally looks great!. My main interest is in annotating axioms with wasDerivedFrom
in merge
. I ran this test:
./bin/robot merge -I 'http://purl.obolibrary.org/obo/uberon/uberon-base.owl' -I 'http://purl.obolibrary.org/obo/cl/cl-base.owl' -I 'http://purl.obolibrary.org/obo/pato/pato-base.owl' -I 'http://purl.obolibrary.org/obo/ro/ro-base.owl' -I 'http://purl.obolibrary.org/obo/go/go-base.owl' --annotate-derived-from true extract --method BOT --term 'http://purl.obolibrary.org/obo/BFO_0000051' --term 'http://purl.obolibrary.org/obo/PATO_0000912' --term 'http://purl.obolibrary.org/obo/RO_0002314' --term 'http://purl.obolibrary.org/obo/GO_0016049' --term 'http://purl.obolibrary.org/obo/BFO_0000066' --term 'http://purl.obolibrary.org/obo/CL_0000127' --term 'http://purl.obolibrary.org/obo/RO_0002573' --term 'http://purl.obolibrary.org/obo/PATO_0000460' -o module.ofn
It's awesome to see the origin of all the axioms in the extracted module. But there is one thing I think should be fixed before merging. The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.
It would be nice if the annotation property used for axioms was configurable, but then again that option that could be added in a backwards-compatible way later. |
@jamesaoverton I don't think your concern applies to axioms. I want to have axioms annotated based on which file they are coming from, regardless of any namespaces involved. |
@balhoff You're right. I updated my comment. The main purpose is to run this on a base file, which should not include external/upstream axioms. But in even in a base file, OWLAPI will include "stubs" for upstream entities from RDF, RDFS, PROV, etc. and their |
I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says |
I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split. So: Example:1
becomes
and Example:2
becomes:
|
That would be fine, but it doesn't affect my opinion about how this command should work. For this case, if an axiom came from multiple ontologies, it would be clearer to see it multiple times, especially if the versions from the different ontologies had their own annotations coming from those ontologies. |
Ah didnt think of that case where the same axioms comes from two sources.. Maybe. Ok. Yeah, in any case I agree that this is needed. |
@balhoff thank you for noticing this. I fixed the issue and added three unit tests to MergeOperationTest.java. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Hey everybody. I lost track of this over the summer, between conferences and vacations. I apologize. I just want to check one more time that this is really, really the behaviour you want, especially for the core vocabularies such as RDF, RDFS, PROV, etc. For example: robot/robot-core/src/test/resources/simple_defined_by.owl Lines 26 to 28 in e7a1097
It says |
Personally I think provenance for this feature is about the axioms, not the entities. I would rather remove the entity annotation feature and keep the axiom annotation feature if that meant this PR could be merged and released sooner. |
It depends on the perspective - it is true that for the provenance feature, axioms are the critical part, but our other goal is to try and establish some level of "ownership" for an existing term, a place it belongs.. I think we can go ahead with this feature here as it is. We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again, |
That's a good idea! |
Just adding |
We could introduce a flag here and leave it to the user. EDIT: |
When it comes down to it, owl and rdf vocabulary are really the questionable ones to be annotated right? Or is there anything else? |
The easiest problems to point to for this I expect to be answering issues from confused users for years to come, but I'll merge anyway. |
Resolves [#893 , resolves #893]
docs/
have been added/updatedmvn verify
says all tests passmvn site
says all JavaDocs correctCHANGELOG.md
has been updated