robot extract/merge should optionally inject provenance #977

hkir-dev · 2022-03-14T14:44:14Z

Resolves [#893 , resolves #893]

docs/ have been added/updated
tests have been added/updated
mvn verify says all tests pass
mvn site says all JavaDocs correct
CHANGELOG.md has been updated

in --inputs "edit*.owl" command #893

robot-core/src/main/java/org/obolibrary/robot/MergeOperation.java

docs/annotate.md

docs/examples/example2_derived_by.owl

matentzn

I think this looks good. @jamesaoverton will want to look over this when he gets a chance, and before merging I will try this out in ODK.

jamesaoverton · 2022-03-21T16:32:00Z

I'm struggling to find the time, but I will look at this.

jamesaoverton · 2022-04-05T14:43:51Z

@hkir-dev Thanks for the PR. I finally got a chance to take a first look. Overall it looks good. I have one request at this point: can you please make the example files smaller? It's great that you included the examples and tests, but they are just larger than they need to be to demonstrate the purpose and effect of the command.

When you make this change, I'll take another look, and then ask @beckyjackson to review.

Sorry again for the delay -- I'll be faster next time.

hkir-dev · 2022-04-06T10:30:12Z

@jamesaoverton I simplified the example files. I can happily implement further improvements if required.

jamesaoverton · 2022-04-06T13:47:02Z

@hkir-dev Much appreciated! @beckyjackson can you please review this?

beckyjackson

Looks good to me. Should the link in the CHANGELOG be to the issue number or the PR number? We've linked to the PR for everything in the past.

hkir-dev · 2022-04-07T07:25:26Z

@beckyjackson Updated CHANGELOG to point PR instead of the issue to be compatible with old ones.

jamesaoverton · 2022-06-09T17:10:29Z

docs/examples/example2_defined_by.owl

+    <!-- http://www.w3.org/2000/01/rdf-schema#comment -->
+
+    <rdf:Description rdf:about="http://www.w3.org/2000/01/rdf-schema#comment">
+        <rdfs:isDefinedBy rdf:resource="https://github.com/ontodev/robot/examples/edit.owl"/>


I apologize sincerely that I haven't reviewed this carefully, although it's been open for many months.

I don't understand this behaviour: It's not true these terms from RDFS, OIO, PROV, etc. isDefinedBy the edit.owl file. Is this intended? @hkir-dev @matentzn

Very well spotted.. We can probably leave this PR for the next ROBOT soon (much less urgent). So we have basically three options:

Exclude standard vocabs like skos, rdf, owl etc from this process. A cool alternative in the spirit of this PR would be to actually hard code the "isDefinedBy" relation in a map for these, so they are added in, but with the correct source.

We restrict this method to classes by default (bit meh, least favourite option)

We make the command respect --select option like remove, and only add provenance to the selected terms (in particular you can use the IRI or CURIE style selectors, but also do (2) if you want to by selecting classes.

I think 1 and 3 are both good. What do you think?

I was thinking 3, but 1 could also work.

Sorry for the late response. Based on Jim's comment

I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says rdfs:comment rdf:type owl:AnnotationProperty, I want that axiom annotated with prov:wasDerivedFrom <myont-base.owl> in the output, so I know which of the input files made that declaration.

Originally posted by @balhoff in #977 (comment)

should we implement the option 3 or keep the current behavior as is?

I have just broken my head over what should be the right way.. I think the current behaviour is probably correct after all. If we add "rdfs:isDefinedBy" annotations on terms to OMO, these wont propagate further (i.e ontologies importing OMO will just see isDefinedBy OMO). It also gives us a simple way to determine which ontologies do not use a proper OMO import. I think if an ontology introduces a term that does not have any provenance, it is probably correct to say that it is "definedBy" that ontology, as their is no further information on whether the axioms and annotations around it were actually obtained from the place we believe it was defined. Also, a problem with 1 is that there is no canonical way to specify where some of these are defined, in particular if you think about skos vs OMO. It would be unfortunate to hard code these assumptions, and non OBO users being tripped up. I think I am inclined to think the current behaviour is correct.

@matentzn I think that sounds good, especially since existing definedBy annotations are carried through.

robot-core/src/main/java/org/obolibrary/robot/OntologyHelper.java

jamesaoverton

What is the intended behaviour for entities ~~and axioms~~ that are not in the ontology namespace, such as RDF, RDFS, OIO, PROV, etc.? The current code marks all terms as coming from the source file, but that is false in these cases.

Please either drop overload parameter, or skip the check for existingAnnotation if overload is true.

balhoff

I finally got around to trying this—generally looks great!. My main interest is in annotating axioms with wasDerivedFrom in merge. I ran this test:

./bin/robot merge -I 'http://purl.obolibrary.org/obo/uberon/uberon-base.owl' -I 'http://purl.obolibrary.org/obo/cl/cl-base.owl' -I 'http://purl.obolibrary.org/obo/pato/pato-base.owl' -I 'http://purl.obolibrary.org/obo/ro/ro-base.owl' -I 'http://purl.obolibrary.org/obo/go/go-base.owl' --annotate-derived-from true extract --method BOT --term 'http://purl.obolibrary.org/obo/BFO_0000051' --term 'http://purl.obolibrary.org/obo/PATO_0000912' --term 'http://purl.obolibrary.org/obo/RO_0002314' --term 'http://purl.obolibrary.org/obo/GO_0016049' --term 'http://purl.obolibrary.org/obo/BFO_0000066' --term 'http://purl.obolibrary.org/obo/CL_0000127' --term 'http://purl.obolibrary.org/obo/RO_0002573' --term 'http://purl.obolibrary.org/obo/PATO_0000460' -o module.ofn

It's awesome to see the origin of all the axioms in the extracted module. But there is one thing I think should be fixed before merging. The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

balhoff · 2022-06-10T14:20:57Z

It would be nice if the annotation property used for axioms was configurable, but then again that option that could be added in a backwards-compatible way later.

balhoff · 2022-06-10T14:22:59Z

What is the intended behaviour for entities and axioms that are not in the ontology namespace, such as RDF, RDFS, OIO, PROV, etc.? The current code marks all terms as coming from the source file, but that is false in these cases.

@jamesaoverton I don't think your concern applies to axioms. I want to have axioms annotated based on which file they are coming from, regardless of any namespaces involved.

jamesaoverton · 2022-06-10T14:31:01Z

@balhoff You're right. I updated my comment. The main purpose is to run this on a base file, which should not include external/upstream axioms.

But in even in a base file, OWLAPI will include "stubs" for upstream entities from RDF, RDFS, PROV, etc. and their rdf:types, because it OWL needs various type information. Just saying every entity in the source file is defined by that source file is misleading. Nico's proposal 3 would filter for namespaces, like we do when constructing the base file, so I think the result will be correct.

balhoff · 2022-06-10T14:36:34Z

I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says rdfs:comment rdf:type owl:AnnotationProperty, I want that axiom annotated with prov:wasDerivedFrom <myont-base.owl> in the output, so I know which of the input files made that declaration.

matentzn · 2022-06-10T14:57:42Z

The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split. So:

Example:1

A sub B [source: Nico]
A sub B

becomes

A sub B [source: Nico]

and

Example:2

A sub B [source: Nico]
A sub B [source: James]

becomes:

A sub B [source: Nico, source: James]

balhoff · 2022-06-10T15:37:21Z

I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split.

That would be fine, but it doesn't affect my opinion about how this command should work. For this case, if an axiom came from multiple ontologies, it would be clearer to see it multiple times, especially if the versions from the different ontologies had their own annotations coming from those ontologies.

matentzn · 2022-06-10T15:40:52Z

Ah didnt think of that case where the same axioms comes from two sources.. Maybe. Ok. Yeah, in any case I agree that this is needed.

hkir-dev · 2022-06-29T13:41:25Z

I finally got around to trying this—generally looks great!. My main interest is in annotating axioms with wasDerivedFrom in merge. I ran this test:
./bin/robot merge -I 'http://purl.obolibrary.org/obo/uberon/uberon-base.owl' -I 'http://purl.obolibrary.org/obo/cl/cl-base.owl' -I 'http://purl.obolibrary.org/obo/pato/pato-base.owl' -I 'http://purl.obolibrary.org/obo/ro/ro-base.owl' -I 'http://purl.obolibrary.org/obo/go/go-base.owl' --annotate-derived-from true extract --method BOT --term 'http://purl.obolibrary.org/obo/BFO_0000051' --term 'http://purl.obolibrary.org/obo/PATO_0000912' --term 'http://purl.obolibrary.org/obo/RO_0002314' --term 'http://purl.obolibrary.org/obo/GO_0016049' --term 'http://purl.obolibrary.org/obo/BFO_0000066' --term 'http://purl.obolibrary.org/obo/CL_0000127' --term 'http://purl.obolibrary.org/obo/RO_0002573' --term 'http://purl.obolibrary.org/obo/PATO_0000460' -o module.ofn
It's awesome to see the origin of all the axioms in the extracted module. But there is one thing I think should be fixed before merging. The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

@balhoff thank you for noticing this. I fixed the issue and added three unit tests to MergeOperationTest.java.

balhoff

Looks good!

jamesaoverton · 2022-09-30T15:16:07Z

Hey everybody. I lost track of this over the summer, between conferences and vacations. I apologize.

I just want to check one more time that this is really, really the behaviour you want, especially for the core vocabularies such as RDF, RDFS, PROV, etc. For example:

robot/robot-core/src/test/resources/simple_defined_by.owl

Lines 26 to 28 in e7a1097

    
           <rdf:Description rdf:about="http://www.w3.org/2000/01/rdf-schema#label"> 
        
               <rdfs:isDefinedBy rdf:resource="https://github.com/ontodev/robot/robot-core/src/test/resources/simple.owl"/> 
        
           </rdf:Description>

It says rdfs:label rdfs:isDefinedBy simple.owl. I can see how that would be useful for debugging certain cases, but that's not at all what I would expect from an "isDefinedBy" predicate. On the other hand, the RDFS spec is no help. It says rdfs:isDefinedBy rdfs:comment "The defininition of the subject resource." What the heck does that mean? I guess it makes more sense as a subproperty of rdfs:seeAlso which has rdfs:comment "Further information about the subject resource.".

balhoff · 2022-10-13T17:14:21Z

Personally I think provenance for this feature is about the axioms, not the entities. I would rather remove the entity annotation feature and keep the axiom annotation feature if that meant this PR could be merged and released sooner.

matentzn · 2022-10-24T08:20:41Z

It depends on the perspective - it is true that for the provenance feature, axioms are the critical part, but our other goal is to try and establish some level of "ownership" for an existing term, a place it belongs.. I think we can go ahead with this feature here as it is. We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,

balhoff · 2022-10-24T14:54:58Z

We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,

That's a good idea!

jamesaoverton · 2022-10-24T15:07:17Z

Just adding isDefinedBy annotations to OMO is an OBO-centric and pretty narrow "solution" to this. Since this is just an option, maybe that's OK. This is one of those decisions that I'm worried I'll regret. Let me think about it a bit more.

matentzn · 2022-10-25T10:44:19Z

We could introduce a flag here and leave it to the user. --annotate-built-in-namespaces which we set to true by default.

EDIT:
or if you don't like "built-in" you could provide a list: --ignore-base-iri http://skos.org/

matentzn · 2022-10-28T17:58:06Z

When it comes down to it, owl and rdf vocabulary are really the questionable ones to be annotated right? Or is there anything else?

jamesaoverton · 2022-10-28T19:15:21Z

The easiest problems to point to for this isDefinedBy behaviour are built-ins (RDF, RDFS, OWL), but it could easily be oboInOwl, FOAF, or anything. I have no interest in updating and maintaining an include-list of these. I'm clear on the problems this is creating, and much less clear on the problems that it's supposed to be solving.

I expect to be answering issues from confused users for years to come, but I'll merge anyway.

hkir-dev added 6 commits March 14, 2022 14:12

robot extract/merge should optionally inject provenance #893

35b5228

fix comment #893

0183532

fixing imports #893

1e00dbf

change log updated #893

b4e9a66

fix change log #893

6320f25

renamed edit3.owl to merge.owl since it was causing random failures due

7744bb5

in --inputs "edit*.owl" command #893

matentzn reviewed Mar 14, 2022

View reviewed changes

robot-core/src/main/java/org/obolibrary/robot/MergeOperation.java Show resolved Hide resolved

matentzn reviewed Mar 14, 2022

View reviewed changes

robot-core/src/main/java/org/obolibrary/robot/MergeOperation.java Show resolved Hide resolved

docs/annotate.md Outdated Show resolved Hide resolved

docs/examples/example2_derived_by.owl Outdated Show resolved Hide resolved

PR #977 review fixes

a4fed9c

hkir-dev requested a review from matentzn March 15, 2022 14:13

matentzn approved these changes Mar 15, 2022

View reviewed changes

jamesaoverton self-requested a review March 21, 2022 16:31

hkir-dev and others added 4 commits April 6, 2022 10:42

#977 example files simplified

f77493a

#977 resolve conflict

6db54a2

Merge branch 'master' into 893-fix

5d23ba0

#977 example2.owl file simplified

4926306

jamesaoverton requested a review from beckyjackson April 6, 2022 13:47

beckyjackson approved these changes Apr 6, 2022

View reviewed changes

point changelog to the PR

4a45cea

jamesaoverton added this to the 1.8.4 milestone Jun 6, 2022

Merge branch 'master' into 893-fix

59833e0

jamesaoverton reviewed Jun 9, 2022

View reviewed changes

Fix missing linebreak in new examples/tests

3ca3294

jamesaoverton reviewed Jun 9, 2022

View reviewed changes

robot-core/src/main/java/org/obolibrary/robot/OntologyHelper.java Show resolved Hide resolved

jamesaoverton requested changes Jun 9, 2022

View reviewed changes

balhoff requested changes Jun 10, 2022

View reviewed changes

jamesaoverton mentioned this pull request Jun 10, 2022

Implementation of expand command. #964

Merged

5 tasks

matentzn assigned hkir-dev Jun 20, 2022

hkir-dev added 3 commits June 29, 2022 14:10

PR #977 removed duplicate axioms

34155be

PR #977 renamed unit test

dd71d54

PR #977 improve merge import unit test

9969376

Merge branch 'master' into 893-fix

e7a1097

hkir-dev requested a review from balhoff June 29, 2022 13:54

balhoff approved these changes Jul 1, 2022

View reviewed changes

hkir-dev requested a review from jamesaoverton July 4, 2022 07:32

jamesaoverton added this to the After 1.9.0 milestone Jul 5, 2022

jamesaoverton approved these changes Sep 30, 2022

View reviewed changes

jamesaoverton merged commit aa9869b into ontodev:master Oct 28, 2022

jamesaoverton mentioned this pull request Nov 29, 2023

MERGE annotates xsd datatypes #1163

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

robot extract/merge should optionally inject provenance #977

robot extract/merge should optionally inject provenance #977

hkir-dev commented Mar 14, 2022 •

edited

Loading

matentzn left a comment

jamesaoverton commented Mar 21, 2022

jamesaoverton commented Apr 5, 2022

hkir-dev commented Apr 6, 2022

jamesaoverton commented Apr 6, 2022

beckyjackson left a comment

hkir-dev commented Apr 7, 2022

jamesaoverton Jun 9, 2022 •

edited

Loading

matentzn Jun 10, 2022

jamesaoverton Jun 10, 2022

hkir-dev Jun 29, 2022

matentzn Jun 30, 2022

balhoff Jul 1, 2022

jamesaoverton left a comment •

edited

Loading

balhoff left a comment

balhoff commented Jun 10, 2022

balhoff commented Jun 10, 2022 •

edited by matentzn

Loading

jamesaoverton commented Jun 10, 2022

balhoff commented Jun 10, 2022 •

edited

Loading

matentzn commented Jun 10, 2022

balhoff commented Jun 10, 2022

matentzn commented Jun 10, 2022

hkir-dev commented Jun 29, 2022

balhoff left a comment

jamesaoverton commented Sep 30, 2022

balhoff commented Oct 13, 2022

matentzn commented Oct 24, 2022

balhoff commented Oct 24, 2022

jamesaoverton commented Oct 24, 2022

matentzn commented Oct 25, 2022 •

edited

Loading

matentzn commented Oct 28, 2022

jamesaoverton commented Oct 28, 2022

robot extract/merge should optionally inject provenance #977

robot extract/merge should optionally inject provenance #977

Conversation

hkir-dev commented Mar 14, 2022 • edited Loading

matentzn left a comment

Choose a reason for hiding this comment

jamesaoverton commented Mar 21, 2022

jamesaoverton commented Apr 5, 2022

hkir-dev commented Apr 6, 2022

jamesaoverton commented Apr 6, 2022

beckyjackson left a comment

Choose a reason for hiding this comment

hkir-dev commented Apr 7, 2022

jamesaoverton Jun 9, 2022 • edited Loading

Choose a reason for hiding this comment

matentzn Jun 10, 2022

Choose a reason for hiding this comment

jamesaoverton Jun 10, 2022

Choose a reason for hiding this comment

hkir-dev Jun 29, 2022

Choose a reason for hiding this comment

matentzn Jun 30, 2022

Choose a reason for hiding this comment

balhoff Jul 1, 2022

Choose a reason for hiding this comment

jamesaoverton left a comment • edited Loading

Choose a reason for hiding this comment

balhoff left a comment

Choose a reason for hiding this comment

balhoff commented Jun 10, 2022

balhoff commented Jun 10, 2022 • edited by matentzn Loading

jamesaoverton commented Jun 10, 2022

balhoff commented Jun 10, 2022 • edited Loading

matentzn commented Jun 10, 2022

balhoff commented Jun 10, 2022

matentzn commented Jun 10, 2022

hkir-dev commented Jun 29, 2022

balhoff left a comment

Choose a reason for hiding this comment

jamesaoverton commented Sep 30, 2022

balhoff commented Oct 13, 2022

matentzn commented Oct 24, 2022

balhoff commented Oct 24, 2022

jamesaoverton commented Oct 24, 2022

matentzn commented Oct 25, 2022 • edited Loading

matentzn commented Oct 28, 2022

jamesaoverton commented Oct 28, 2022

hkir-dev commented Mar 14, 2022 •

edited

Loading

jamesaoverton Jun 9, 2022 •

edited

Loading

jamesaoverton left a comment •

edited

Loading

balhoff commented Jun 10, 2022 •

edited by matentzn

Loading

balhoff commented Jun 10, 2022 •

edited

Loading

matentzn commented Oct 25, 2022 •

edited

Loading