Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

robot extract/merge should optionally inject provenance #893

Closed
dosumis opened this issue Jul 27, 2021 · 19 comments · Fixed by #977
Closed

robot extract/merge should optionally inject provenance #893

dosumis opened this issue Jul 27, 2021 · 19 comments · Fixed by #977
Assignees

Comments

@dosumis
Copy link

dosumis commented Jul 27, 2021

robot extract --annotate-with-source currently annotates extracted terms with the IRI of the source ontology. It would be more useful if the version-iri was used for this (or this could be supported by a separate method).

@matentzn - if we incorporated this into ODK, it would solve the problem of not being able to trace versions for imports.

CC @hkir-dev could potentially implement if there is wider interest

@jamesaoverton
Copy link
Member

This would be a great addition, and probably easy to add. Do you have someone who can implement it?

The current implementation was designed to mimic Ontofox, which uses the ontology IRI.

@matentzn
Copy link
Contributor

I can implement this under P1 money, essential infrastructure and won't take long.

@matentzn
Copy link
Contributor

I know @jamesaoverton you are a quite conservative to maintain backwards compatibility, but if we do:

  • if version iri exists, use it, else use ontology iri (if exists)
  • document the change carefully

Would that be ok? To avoid having to implement yet another CLI switch?

@jamesaoverton
Copy link
Member

No, it has to be 100% backwards compatible.

@matentzn
Copy link
Contributor

Ok, so we could introduce a new flag:

--annotate-with-source-version-iri

to add another rdfs:isDefinedBy relation with the source IRI

(this could add two rdfs:isDefinedBy relations if both flags are used)

Would this address your issue @dosumis - should we use the same property for both (iri and purl based) annotation? Is term level the right granularity, or do we need this on ontology (module) level as well?

@hkir-dev
Copy link
Contributor

Yes, this will solve our issue and term level is OK for us.

Also, if you don't have the resources for this development, I would gladly do the implementation.

@matentzn
Copy link
Contributor

Alright @hkir-dev , lets schedule a co-coding session so I introduce you to ROBOT development and you can give it a shot - I will help you with the details along the way.

@dosumis
Copy link
Author

dosumis commented Jul 28, 2021

Ok, so we could introduce a new flag:

--annotate-with-source-version-iri
to add another rdfs:isDefinedBy relation with the source IRI

(this could add two rdfs:isDefinedBy relations if both flags are used)

I think that's OK. If both were used, it would be a bit hard to distinguish, but that can be managed for individual pipelines as needed.

Would this address your issue @dosumis - should we use the same property for both (iri and purl based) annotation? Is term level the right granularity, or do we need this on ontology (module) level as well?

Both could be useful, but my main interest is in getting this at the term level.

@hkir-dev
Copy link
Contributor

hkir-dev commented Sep 1, 2021

Using the rdfs:isDefinedBy annotation for both relationships may cause some complications. In the 'annotate-with-source' documentation, it is said that:

If the term already has an annotation using this property (rdfs:isDefinedBy), the existing annotation will be copied and no new annotation will be added.

But now we won't be able to distinguish whether it is a source or a version annotation. Changing this limitation may cause some behavior changes compared to previous versions.

Can we use the owl:versionInfo annotation for annotate-with-source-version-iri? The documentation says it can be used with any owl construct. The downside is the range is string. Or should we continue with rdfs:isDefinedBy?

@matentzn
Copy link
Contributor

matentzn commented Sep 1, 2021

Figuring this out @hkir-dev is very important. We need to tag axioms (annotations, logical etc) with version iris soon for ODK. owl:versionInfo would be an option, but I am not sure. I am also not sure of having two separate isDefinedBy annotations as this will cause some serious bloat..

But lets keep your proposal in mind, like using isDefinedBy for the normal PURL, and owl:VersionInfo for the date string ("2021-08-01") - this would be consistent with the way we do it for the ontology artefact as well.

@hkir-dev
Copy link
Contributor

hkir-dev commented Sep 1, 2021

We can use date string as versionInfo. But some common terms are redefined in multiple ontologies. For example, http://purl.obolibrary.org/obo/BFO_0000002 defined in both cl, pato and uberon. If I have an ontology that uses all of them, BFO_0000002 will have 3 versionInfo: "2021-08-10", "2021-08-06" and "2021-07-27". But we won't know which date from which source.

Can we set versionInfo with the string value of versionIRI. Then we will have:

<owl:versionInfo>http://purl.obolibrary.org/obo/pato/releases/2021-08-06/pato.owl</owl:versionInfo>
<owl:versionInfo>http://purl.obolibrary.org/obo/cl/releases/2021-08-10/cl.owl</owl:versionInfo>
<owl:versionInfo>http://purl.obolibrary.org/obo/uberon/releases/2021-07-27/uberon.owl</owl:versionInfo>

Ideally BFO_0000002 definitions in cl, pato and uberon should have a source and versionInfo referencing to the bfo version and we should use it. But in the current reality using only date as versionInfo may cause confusion.

@matentzn
Copy link
Contributor

matentzn commented Sep 1, 2021

I just wrote a huge paragraph about this and discarded it again because it will spawn a lot of cries of lament. Please don't work on this without having a meeting first - a lot hinges on getting this right, and we need to really debate the use of term level versioning vs statement level versioning if we want to move forward.

@dosumis
Copy link
Author

dosumis commented Sep 1, 2021

We can use date string as versionInfo. But some common terms are redefined in multiple ontologies. For example, http://purl.obolibrary.org/obo/BFO_0000002 defined in both cl, pato and ubero

It's actually just defined in RO but the release + import mechanism obscures this. In the longer term, this will be solved by using base files for imports. I'm not sure what the best short-term solution is.

At the risk of muddying the waters still further: One possible solution with the current infrastructure (not involving ROBOT extract) would be to annotate, during release, only terms declared in the editor's file with an rdfs:isDefinedBy tag + value latest version IRI. rdfs:isDefinedBy tags for third party ontologies added in the same way could then be inherited through import.

@balhoff
Copy link
Contributor

balhoff commented Sep 1, 2021

I think some PROV properties would work better than rdfs:isDefinedBy for extract, e.g. prov:wasDerivedFrom. I would like to be included in a discussion, if you meet. I implemented a version of this in Owlery.

@matentzn
Copy link
Contributor

matentzn commented Sep 1, 2021

Awesome @balhoff agreed, I will send out an invite for anyone who may be interested in this discussion; I think this property would be a good idea!

@matentzn
Copy link
Contributor

matentzn commented Feb 17, 2022

Alright, after a lot of back and forth, here the final spec for this ticket:

  • extend annotate with two parameters: --annotate-derived-from true & --annotate-defined-by true
  • --annotate-derived-from true: all OWLAxioms (regardless of whether they are logical or annotation assertions) should be annotated with the source's version IRI if it exists, else with its IRI, using prov:isDerivedFrom.
  • --annotate-defined-by true: All OWLEntities (classes, data, annotation, object properties and named individuals should be annotated using the ontology IRI, using rdfs:isDefinedBy.
  • extend robot merge to optionally annotate all merged ontologies the same way during a merge. (remember to consider the case of owl:imports vs --input parameter.

EDIT 1: if the axiom already has a "derived-from" annotation, do not add another one! We want to trace the source of an axioms potentially through multiple import chains. Similarly if there is already a "defined-by" annotation on the term, do not overwrite it, just leave it as is!

@dosumis @balhoff @cmungall @jamesaoverton please confirm that this final spec is acceptable to everyone and covers all use cases.

The default for both should always be "false" to maintain backward compatibility.

@jamesaoverton
Copy link
Member

The proposal makes sense. The ROBOT side should be easy enough. I have two questions:

  1. What happens when the version IRI is updated? How do you avoid tons of stale annotations to the old version? People who are not careful could make a big mess.

  2. Does someone have a proof-of-concept consumer program that uses the annotations successfully to do something real?

@matentzn
Copy link
Contributor

  1. What happens when the version IRI is updated? How do you avoid tons of stale annotations to the old version? People who are not careful could make a big mess.

Oh yeah, I totally forgot that. see EDIT 1 in my comment above. :) Thanks!

  1. Does someone have a proof-of-concept consumer program that uses the annotations successfully to do something real?

There are no tools right now, but we absolutely we do a lot of debugging using explanations, and we recently moved to the "base module import" approaches, which merges all dependencies together, so keeping track where which axioms has come from is imperative!

@shawntanzk
Copy link

@hkir-dev to handle this and @matentzn to review/implement in ODK

@matentzn matentzn changed the title robot extract --annotate-with-source should use version IRIs robot extract/merge should add optionally inject provenance Feb 28, 2022
@matentzn matentzn changed the title robot extract/merge should add optionally inject provenance robot extract/merge should optionally inject provenance Feb 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants