Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

robot extract/merge should optionally inject provenance #977

Merged
merged 20 commits into from
Oct 28, 2022
Merged

robot extract/merge should optionally inject provenance #977

merged 20 commits into from
Oct 28, 2022

Conversation

hkir-dev
Copy link
Contributor

@hkir-dev hkir-dev commented Mar 14, 2022

Resolves [#893 , resolves #893]

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

docs/annotate.md Outdated Show resolved Hide resolved
docs/examples/example2_derived_by.owl Outdated Show resolved Hide resolved
@hkir-dev hkir-dev requested a review from matentzn March 15, 2022 14:13
Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. @jamesaoverton will want to look over this when he gets a chance, and before merging I will try this out in ODK.

@jamesaoverton jamesaoverton self-requested a review March 21, 2022 16:31
@jamesaoverton
Copy link
Member

I'm struggling to find the time, but I will look at this.

@jamesaoverton
Copy link
Member

@hkir-dev Thanks for the PR. I finally got a chance to take a first look. Overall it looks good. I have one request at this point: can you please make the example files smaller? It's great that you included the examples and tests, but they are just larger than they need to be to demonstrate the purpose and effect of the command.

When you make this change, I'll take another look, and then ask @beckyjackson to review.

Sorry again for the delay -- I'll be faster next time.

@hkir-dev
Copy link
Contributor Author

hkir-dev commented Apr 6, 2022

@jamesaoverton I simplified the example files. I can happily implement further improvements if required.

@jamesaoverton
Copy link
Member

@hkir-dev Much appreciated! @beckyjackson can you please review this?

Copy link
Contributor

@beckyjackson beckyjackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Should the link in the CHANGELOG be to the issue number or the PR number? We've linked to the PR for everything in the past.

@hkir-dev
Copy link
Contributor Author

hkir-dev commented Apr 7, 2022

@beckyjackson Updated CHANGELOG to point PR instead of the issue to be compatible with old ones.

@jamesaoverton jamesaoverton added this to the 1.8.4 milestone Jun 6, 2022
<!-- http://www.w3.org/2000/01/rdf-schema#comment -->

<rdf:Description rdf:about="http://www.w3.org/2000/01/rdf-schema#comment">
<rdfs:isDefinedBy rdf:resource="https://github.com/ontodev/robot/examples/edit.owl"/>
Copy link
Member

@jamesaoverton jamesaoverton Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize sincerely that I haven't reviewed this carefully, although it's been open for many months.

I don't understand this behaviour: It's not true these terms from RDFS, OIO, PROV, etc. isDefinedBy the edit.owl file. Is this intended? @hkir-dev @matentzn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well spotted.. We can probably leave this PR for the next ROBOT soon (much less urgent). So we have basically three options:

  1. Exclude standard vocabs like skos, rdf, owl etc from this process. A cool alternative in the spirit of this PR would be to actually hard code the "isDefinedBy" relation in a map for these, so they are added in, but with the correct source.
  2. We restrict this method to classes by default (bit meh, least favourite option)
  3. We make the command respect --select option like remove, and only add provenance to the selected terms (in particular you can use the IRI or CURIE style selectors, but also do (2) if you want to by selecting classes.

I think 1 and 3 are both good. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking 3, but 1 could also work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response. Based on Jim's comment

I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says rdfs:comment rdf:type owl:AnnotationProperty, I want that axiom annotated with prov:wasDerivedFrom <myont-base.owl> in the output, so I know which of the input files made that declaration.

Originally posted by @balhoff in #977 (comment)

should we implement the option 3 or keep the current behavior as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just broken my head over what should be the right way.. I think the current behaviour is probably correct after all. If we add "rdfs:isDefinedBy" annotations on terms to OMO, these wont propagate further (i.e ontologies importing OMO will just see isDefinedBy OMO). It also gives us a simple way to determine which ontologies do not use a proper OMO import. I think if an ontology introduces a term that does not have any provenance, it is probably correct to say that it is "definedBy" that ontology, as their is no further information on whether the axioms and annotations around it were actually obtained from the place we believe it was defined. Also, a problem with 1 is that there is no canonical way to specify where some of these are defined, in particular if you think about skos vs OMO. It would be unfortunate to hard code these assumptions, and non OBO users being tripped up. I think I am inclined to think the current behaviour is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matentzn I think that sounds good, especially since existing definedBy annotations are carried through.

Copy link
Member

@jamesaoverton jamesaoverton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the intended behaviour for entities and axioms that are not in the ontology namespace, such as RDF, RDFS, OIO, PROV, etc.? The current code marks all terms as coming from the source file, but that is false in these cases.

Please either drop overload parameter, or skip the check for existingAnnotation if overload is true.

Copy link
Contributor

@balhoff balhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally got around to trying this—generally looks great!. My main interest is in annotating axioms with wasDerivedFrom in merge. I ran this test:

./bin/robot merge -I 'http://purl.obolibrary.org/obo/uberon/uberon-base.owl' -I 'http://purl.obolibrary.org/obo/cl/cl-base.owl' -I 'http://purl.obolibrary.org/obo/pato/pato-base.owl' -I 'http://purl.obolibrary.org/obo/ro/ro-base.owl' -I 'http://purl.obolibrary.org/obo/go/go-base.owl' --annotate-derived-from true extract --method BOT --term 'http://purl.obolibrary.org/obo/BFO_0000051' --term 'http://purl.obolibrary.org/obo/PATO_0000912' --term 'http://purl.obolibrary.org/obo/RO_0002314' --term 'http://purl.obolibrary.org/obo/GO_0016049' --term 'http://purl.obolibrary.org/obo/BFO_0000066' --term 'http://purl.obolibrary.org/obo/CL_0000127' --term 'http://purl.obolibrary.org/obo/RO_0002573' --term 'http://purl.obolibrary.org/obo/PATO_0000460' -o module.ofn

It's awesome to see the origin of all the axioms in the extracted module. But there is one thing I think should be fixed before merging. The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

@balhoff
Copy link
Contributor

balhoff commented Jun 10, 2022

It would be nice if the annotation property used for axioms was configurable, but then again that option that could be added in a backwards-compatible way later.

@balhoff
Copy link
Contributor

balhoff commented Jun 10, 2022

What is the intended behaviour for entities and axioms that are not in the ontology namespace, such as RDF, RDFS, OIO, PROV, etc.? The current code marks all terms as coming from the source file, but that is false in these cases.

@jamesaoverton I don't think your concern applies to axioms. I want to have axioms annotated based on which file they are coming from, regardless of any namespaces involved.

@jamesaoverton
Copy link
Member

@balhoff You're right. I updated my comment. The main purpose is to run this on a base file, which should not include external/upstream axioms.

But in even in a base file, OWLAPI will include "stubs" for upstream entities from RDF, RDFS, PROV, etc. and their rdf:types, because it OWL needs various type information. Just saying every entity in the source file is defined by that source file is misleading. Nico's proposal 3 would filter for namespaces, like we do when constructing the base file, so I think the result will be correct.

@balhoff
Copy link
Contributor

balhoff commented Jun 10, 2022

I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says rdfs:comment rdf:type owl:AnnotationProperty, I want that axiom annotated with prov:wasDerivedFrom <myont-base.owl> in the output, so I know which of the input files made that declaration.

@matentzn
Copy link
Contributor

The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split. So:

Example:1

A sub B [source: Nico]
A sub B

becomes

A sub B [source: Nico]

and

Example:2

A sub B [source: Nico]
A sub B [source: James]

becomes:

A sub B [source: Nico, source: James]

@balhoff
Copy link
Contributor

balhoff commented Jun 10, 2022

I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split.

That would be fine, but it doesn't affect my opinion about how this command should work. For this case, if an axiom came from multiple ontologies, it would be clearer to see it multiple times, especially if the versions from the different ontologies had their own annotations coming from those ontologies.

@matentzn
Copy link
Contributor

Ah didnt think of that case where the same axioms comes from two sources.. Maybe. Ok. Yeah, in any case I agree that this is needed.

@hkir-dev
Copy link
Contributor Author

I finally got around to trying this—generally looks great!. My main interest is in annotating axioms with wasDerivedFrom in merge. I ran this test:

./bin/robot merge -I 'http://purl.obolibrary.org/obo/uberon/uberon-base.owl' -I 'http://purl.obolibrary.org/obo/cl/cl-base.owl' -I 'http://purl.obolibrary.org/obo/pato/pato-base.owl' -I 'http://purl.obolibrary.org/obo/ro/ro-base.owl' -I 'http://purl.obolibrary.org/obo/go/go-base.owl' --annotate-derived-from true extract --method BOT --term 'http://purl.obolibrary.org/obo/BFO_0000051' --term 'http://purl.obolibrary.org/obo/PATO_0000912' --term 'http://purl.obolibrary.org/obo/RO_0002314' --term 'http://purl.obolibrary.org/obo/GO_0016049' --term 'http://purl.obolibrary.org/obo/BFO_0000066' --term 'http://purl.obolibrary.org/obo/CL_0000127' --term 'http://purl.obolibrary.org/obo/RO_0002573' --term 'http://purl.obolibrary.org/obo/PATO_0000460' -o module.ofn

It's awesome to see the origin of all the axioms in the extracted module. But there is one thing I think should be fixed before merging. The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

@balhoff thank you for noticing this. I fixed the issue and added three unit tests to MergeOperationTest.java.

@hkir-dev hkir-dev requested a review from balhoff June 29, 2022 13:54
Copy link
Contributor

@balhoff balhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@hkir-dev hkir-dev requested a review from jamesaoverton July 4, 2022 07:32
@jamesaoverton jamesaoverton added this to the After 1.9.0 milestone Jul 5, 2022
@jamesaoverton
Copy link
Member

Hey everybody. I lost track of this over the summer, between conferences and vacations. I apologize.

I just want to check one more time that this is really, really the behaviour you want, especially for the core vocabularies such as RDF, RDFS, PROV, etc. For example:

<rdf:Description rdf:about="http://www.w3.org/2000/01/rdf-schema#label">
<rdfs:isDefinedBy rdf:resource="https://github.com/ontodev/robot/robot-core/src/test/resources/simple.owl"/>
</rdf:Description>

It says rdfs:label rdfs:isDefinedBy simple.owl. I can see how that would be useful for debugging certain cases, but that's not at all what I would expect from an "isDefinedBy" predicate. On the other hand, the RDFS spec is no help. It says rdfs:isDefinedBy rdfs:comment "The defininition of the subject resource." What the heck does that mean? I guess it makes more sense as a subproperty of rdfs:seeAlso which has rdfs:comment "Further information about the subject resource.".

@balhoff
Copy link
Contributor

balhoff commented Oct 13, 2022

Personally I think provenance for this feature is about the axioms, not the entities. I would rather remove the entity annotation feature and keep the axiom annotation feature if that meant this PR could be merged and released sooner.

@matentzn
Copy link
Contributor

It depends on the perspective - it is true that for the provenance feature, axioms are the critical part, but our other goal is to try and establish some level of "ownership" for an existing term, a place it belongs.. I think we can go ahead with this feature here as it is. We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,

@balhoff
Copy link
Contributor

balhoff commented Oct 24, 2022

We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,

That's a good idea!

@jamesaoverton
Copy link
Member

Just adding isDefinedBy annotations to OMO is an OBO-centric and pretty narrow "solution" to this. Since this is just an option, maybe that's OK. This is one of those decisions that I'm worried I'll regret. Let me think about it a bit more.

@matentzn
Copy link
Contributor

matentzn commented Oct 25, 2022

We could introduce a flag here and leave it to the user. --annotate-built-in-namespaces which we set to true by default.

EDIT:
or if you don't like "built-in" you could provide a list: --ignore-base-iri http://skos.org/

@matentzn
Copy link
Contributor

When it comes down to it, owl and rdf vocabulary are really the questionable ones to be annotated right? Or is there anything else?

@jamesaoverton
Copy link
Member

The easiest problems to point to for this isDefinedBy behaviour are built-ins (RDF, RDFS, OWL), but it could easily be oboInOwl, FOAF, or anything. I have no interest in updating and maintaining an include-list of these. I'm clear on the problems this is creating, and much less clear on the problems that it's supposed to be solving.

I expect to be answering issues from confused users for years to come, but I'll merge anyway.

@jamesaoverton jamesaoverton merged commit aa9869b into ontodev:master Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

robot extract/merge should optionally inject provenance
5 participants