Track information about edit files #2422

Closed
cthoyt opened this issue Aug 23, 2023 · 5 comments
Labels
ontology metadata Issues related to ontology metadata

Comments

@cthoyt
Collaborator

cthoyt commented Aug 23, 2023

Problem

Contributing to OBO Foundry ontologies would be easier with two pieces of structured information (which basically never change):

Which file should I edit?

What is the URL pointing to the "edit" file for the ontology? In ODK-formatted ontologies, this is usually pretty obvious since it lives in src/ontology/{NAME}-edit.{EXTENSION} such as https://github.com/obophenotype/uberon/blob/master/src/ontology/uberon-edit.obo for UBERON. However, some ontologies version control both, e.g., an OBO flat file and an OWL file, and don't make it clear which one should be contributed to.

What format is the file?

What is the format of the edit file (OBO flat file format, Functional OWL, RDF/XML)? This can't be inferred from the extension in all cases, since ontologies encoded in Functional OWL often have an .owl extension instead of .ofn (e.g., CL: https://raw.githubusercontent.com/obophenotype/cell-ontology/master/src/ontology/cl-edit.owl).

This would be particularly helpful for making automated contributions.
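
To illustrate why the extension alone is not enough, here is a minimal sketch of guessing the serialization from the file contents instead. It is only a heuristic based on how each syntax typically begins; the function name `sniff_format` and the labels `"obo"` and `"rdfxml"` are assumptions made for this example (only `ofn` appears in the proposal).

```python
from typing import Optional


def sniff_format(text: str) -> Optional[str]:
    """Guess an ontology file's serialization from its first non-blank line.

    Heuristic only: returns "rdfxml", "ofn", "obo", or None if unrecognized.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(("<?xml", "<rdf:RDF")):
            return "rdfxml"
        if stripped.startswith(("Prefix(", "Ontology(")):
            return "ofn"  # OWL functional-style syntax
        if stripped.startswith("format-version:"):
            return "obo"  # OBO flat file header tag
        return None  # first meaningful line matched nothing we know
    return None
```

A file such as cl-edit.owl would be classified by what it contains rather than by its `.owl` suffix. An explicit `format` annotation would make this kind of guessing unnecessary.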

Solution

I propose adding a new optional field to the ontology metadata that would look like this:

edit:
    path: src/ontology/apollo_sv-edit.owl
    format: ofn

Where format has to come from an enumeration (suggestions welcome on what the entries in this enumeration would be).
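
A rough sketch of how such an enumeration could be checked, assuming the metadata has already been loaded into a Python dictionary. The enumeration members other than `ofn` and the function name `validate_edit` are placeholders for this example, not part of the actual OBO Foundry schema.

```python
from enum import Enum


class EditFormat(str, Enum):
    """Hypothetical enumeration of edit file formats (only "ofn" is from the proposal)."""

    OBO = "obo"
    OFN = "ofn"
    RDFXML = "rdfxml"


def validate_edit(record: dict) -> dict:
    """Check the optional ``edit`` block of one ontology's metadata record."""
    edit = record.get("edit")
    if edit is None:
        return record  # the field is optional, so absence is fine
    missing = {"path", "format"} - set(edit)
    if missing:
        raise ValueError(f"edit block is missing keys: {sorted(missing)}")
    EditFormat(edit["format"])  # raises ValueError for values outside the enumeration
    return record
```

With this sketch, the example above (`path: src/ontology/apollo_sv-edit.owl`, `format: ofn`) passes, while an unlisted format value is rejected.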

It's also up for debate whether the path should be relative within the repository or absolute with a URL. I prefer relative since the information about the repository is already available elsewhere in the schema and this is more robust towards changes in the name or organization of the repository.
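
For illustration, a relative path can be combined with the repository URL when a full link is needed. This sketch assumes a GitHub-hosted repository and a known default branch name (`master` here, which is an assumption); the function name `edit_file_url` is made up for the example.

```python
from urllib.parse import urlparse


def edit_file_url(repository: str, path: str, branch: str = "master", raw: bool = False) -> str:
    """Build a GitHub URL for an edit file from the repository URL and a relative path."""
    parsed = urlparse(repository)
    if parsed.netloc != "github.com":
        raise ValueError("this sketch only handles repositories hosted on github.com")
    owner_repo = parsed.path.strip("/")  # e.g. "obophenotype/uberon"
    relative = path.lstrip("/")
    if raw:
        return f"https://raw.githubusercontent.com/{owner_repo}/{branch}/{relative}"
    return f"https://github.com/{owner_repo}/blob/{branch}/{relative}"
```

If the repository is renamed or moves to another organization, only the repository field has to change; the relative path stays valid.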

Ideally, all OBO Foundry ontologies version control their ontology in a way such that it can be contributed to, so this annotation can cover 99% of OBO Foundry ontologies. One caveat pointed out by Nico is that some ontologies partially use curation templates, which means there could potentially be multiple places to contribute. In these cases, hopefully there are carefully written contribution guidelines from the maintainers!

How to accomplish this

I can volunteer to do the following:

  1. Update the schema
  2. Add unit tests
  3. Write a script that infers this information where possible and automatically adds it to the correct metadata markdown files
  4. Update website format to show this information (if desired)
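
The inference in step 3 could start from something like the following sketch, which looks for the ODK naming convention in a list of repository file paths. How the file list is obtained is left out, and the function name `find_edit_candidates` is hypothetical; repositories that do not follow the ODK layout would simply yield no candidates.

```python
import re
from typing import List, Tuple

# ODK convention: src/ontology/{NAME}-edit.{EXTENSION}
EDIT_FILE = re.compile(r"^src/ontology/(?P<name>[^/]+)-edit\.(?P<ext>[A-Za-z0-9]+)$")


def find_edit_candidates(paths: List[str]) -> List[Tuple[str, str, str]]:
    """Return (path, name, extension) for every path matching the ODK convention."""
    candidates = []
    for path in paths:
        match = EDIT_FILE.match(path)
        if match:
            candidates.append((path, match.group("name"), match.group("ext")))
    return candidates
```

The extension returned here is only a hint. As noted above, the actual format would still need to be confirmed separately.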

Afterthoughts

My personal goal is to make it easier to automate contributing high-quality semantic mappings curated in Biomappings to upstream ontologies. I've already started writing a workflow in biopragmatics/biomappings#149 and made a contribution in obophenotype/uberon#2950, but this requires a lot of manual configuration (which I would like to structure and upstream into OBO Foundry with this proposal).

More generally, this kind of annotation makes it more transparent how to contribute to OBO Foundry ontologies, which is already implicit in OBO Foundry Principle 10, "Commitment to Collaboration".

@cthoyt cthoyt added the ontology metadata Issues related to ontology metadata label Aug 23, 2023
@matentzn
Contributor

I think this is a good idea. Two caveats:

  1. Some ontologies (e.g. VBO, OBA, UPHENO, OBI, and others) are partially or mainly curated using template TSVs, which means it may not always be clear where a PR would be made. In this sense, there are multiple editors sources.
  2. Curating this information will require a significant amount of manual labour.

@cmungall
Contributor

cmungall commented Sep 5, 2023

I worry about packing too much information into the OBO metadata here.

But the motivation to make it easier to contribute is good. We should focus on a standard top-level metadata file that is linked from OBO. This could be used by both humans and agents (e.g. kgcl/ontobot - cc @hrshdhgd).

As an intermediate measure I think it would be fine to have a field in OBO to link to a CONTRIBUTING.md, and to encourage a standard layout in these files.

@matentzn
Contributor

matentzn commented Sep 5, 2023

During the OBO operations call today we (20 members) agreed that this metadata is out of scope for OBO metadata for various reasons, including:

  1. Many ontologies have multiple editors files, maybe up to 20. For every kind of change, you would have to know where to look.
  2. This kind of information should live in CONTRIBUTING.md. @cmungall suggests making an ODK issue that we can use to define a kind of "MANIFEST" file that lists all the editors (source) files, which we could deposit in the repo's top level. This would help automated agents like KGCL etc. to make changes as well.

@cthoyt I will close this issue now, but if you dig the idea of a MANIFEST file, you can open an issue here to discuss it: https://github.com/INCATools/ontology-development-kit/issues

With ODK, it should be quite easy to generate such a file, which pushes the responsibility for managing this kind of metadata directly to the ontology curators.
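
Purely as an illustration of the MANIFEST idea, here is a sketch of how an automated agent might look up which source files are relevant to a given kind of change. No such file format has been defined by ODK or OBO Foundry; the JSON layout, the keys `sources`, `path`, `format`, and `covers`, and the example paths are all invented for this sketch.

```python
import json
from typing import List

# Hypothetical manifest content; the structure and every path are made up for illustration.
MANIFEST_TEXT = """
{
  "sources": [
    {"path": "src/ontology/example-edit.owl", "format": "ofn", "covers": ["classes", "mappings"]},
    {"path": "src/templates/example-terms.tsv", "format": "tsv", "covers": ["classes"]}
  ]
}
"""


def sources_for(manifest: dict, change_kind: str) -> List[str]:
    """List the source files that declare they cover the given kind of change."""
    return [
        source["path"]
        for source in manifest["sources"]
        if change_kind in source.get("covers", [])
    ]


manifest = json.loads(MANIFEST_TEXT)
mapping_sources = sources_for(manifest, "mappings")
```

In this sketch a mapping contribution resolves to a single file, while a change to classes resolves to two, which reflects the multiple-source situation described above.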

@matentzn matentzn closed this as not planned Sep 5, 2023
@cthoyt
Collaborator Author

cthoyt commented Sep 5, 2023

I understand why the OBO operations group might be hesitant. Unfortunately, most repositories in OBO Foundry don't have a CONTRIBUTING.md. Further, most ontologies are unresponsive to any proactive updates to include more information from the ground up (remember how slow #2149 was, even when we made PRs?). Therefore I am still leaning towards central ways of curating this kind of content. Maybe it ends up in the Bioregistry, but then it's sad since it is not directly available to potential OBO Foundry website readers.

@matentzn
Contributor

matentzn commented Sep 9, 2023

Just to say my piece here: Edit files are constantly changing. Ontologies like Uberon, CL, OBI, and Mondo have dozens of edit sources. They change on a quarterly basis (new components are added, others removed). I doubt we can realise this beautiful heaven you have in mind where there is a single file we can have a bot make edits to reliably. At best we can have that for core ontology metadata like license/description, but if we are restricted to that sort of metadata, the value of curating all the edit files becomes too low IMO.
