Refining the Uberon Euarchontoglires subset #2050

Closed
6 tasks done
paolaroncaglia opened this issue Sep 15, 2021 · 20 comments · Fixed by #2195
Labels
HCA Human Cell Atlas


@paolaroncaglia
Contributor

paolaroncaglia commented Sep 15, 2021

Stemming from HumanCellAtlas/ontology#84. I scanned the Uberon Euarchontoglires subset to search for non-primate, non-rodent, non-rabbit terms. Note, this was a quick search and not meant to be exhaustive. Here are a few possible strategies to identify and filter out unwanted classes in a time-effective way. These would improve the subset quality and should help make the subset more manageable (it currently has >20k classes):

  • Search for annotation property ‘never in taxon’ ‘Homo sapiens’ (e.g. ‘ampullary gland’, ‘bone of reproductive organ’). Some terms may need double-checking in case they are found in non-human Euarchontoglires.

  • I spotted many fish-specific terms. Searching for created_by teleost_anatomy_curators should retrieve them all (e.g. 'fin fold pectoral fin bud'). Their IDs all start with UBERON:2.

  • Same for amphibians: search by created_by amphibian_anatomy_curators (e.g. 'pars amphibiorum'). Their IDs all start with UBERON:3.

  • Insect terms may be retrieved by searching for database_cross_reference contains FBbt (e.g. 'egg chorion'). This will also bring up a dozen terms whose label contains “insect” (e.g. 'insect ring gland').

  • Other random finds:
    exoskeleton
    shell
    'open circulatory system'
    feather
    rhinarium?
    honey

  • Investigate adding taxon constraints en masse (I recall there was a similar comment from Chris some time ago somewhere; found it: Human subset uberon 0 byte file? #1824 (comment))

Thanks,
Paola

@dosumis
Contributor

dosumis commented Sep 20, 2021

Hi @paolaroncaglia - are you clear on how to add taxon constraints in these cases, or would you like to discuss?

@paolaroncaglia
Contributor Author

@dosumis

Hi @paolaroncaglia - are you clear on how to add taxon constraints in these cases, or would you like to discuss?

I might ask @matentzn for help with adding constraints in bulk, but I'll take a closer look first and then contact him offline if he's available. I'll update the ticket as we go; the aim is to have the slim ready to integrate into the HCAO release pipeline for the Oct 19th release.

@paolaroncaglia
Contributor Author

paolaroncaglia commented Sep 23, 2021

@cmungall wrote in #1824 (comment):

"UBERON:6x --> these are all specific to arthropods, and we can add TCs to 6x roots en masse
UBERON:2x --> likely teleost but not universally. I am comfortable adding never-in mammal en masse to existential roots
UBERON:3x --> likely amphibia but not universally. I am comfortable adding never-in mammal en masse to existential roots
UBERON:4x --> mixed, I think this is a small enough subset to look over quickly and add TCs. There will be a handful applicable to human"

So I suggest a first pass including the following steps:
(NOTE - classes were searched in the slim, but TCs must be added to uberon-edit.obo!)

  • Add subclass 'only in taxon' some Arthropoda to all UBERON:6x classes
    In fact, there are only 14 non-obsolete UBERON:6x classes in the slim. Here's a spreadsheet.
    Update: it was resolved to mass-add TCs to all UBERON:6x found in the slim.

  • Scan UBERON:2x classes quickly. Flag any that may be relevant to Euarchontoglires. Add annotation 'never in taxon' Mammalia to all others.
    There are 1120 non-obsolete UBERON:2x classes in the slim, copied here.
    If I search the HCA OLS production instance for UBERON:2******, I retrieve 33 terms, but it's not clear why they're there. I checked 2 of them: they don't have FMA xrefs themselves, but perhaps an ancestor does? Either way, keep those 33 terms?
    Update: it was resolved to mass-add TCs to all UBERON:2x found in the slim.

  • Scan UBERON:3x classes quickly. Flag any that may be relevant to Euarchontoglires. Add annotation 'never in taxon' Mammalia to all others.
    There are 639 non-obsolete UBERON:3x classes in the slim, copied here. If I search the HCA OLS production instance for UBERON:3******, I retrieve 8 terms, but it's not clear why they're there. I checked 3 of them, and they don't have FMA xrefs. Either way, keep those 8 terms?
    Update: it was resolved to mass-add TCs to all UBERON:3x found in the slim.

  • Scan UBERON:4x quickly. Add TCs as appropriate.
    There are 369 non-obsolete UBERON:4x classes in the slim, copied here. If I search the HCA OLS production instance for UBERON:4******, I retrieve 13 terms; they might be there as subclasses of terms that have FMA xrefs. Either way, keep those 13 terms?
    Update: it was resolved to mass-add TCs to all UBERON:4x found in the slim.
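
The first-pass triage above can be sketched in Python. This is only a minimal illustration, assuming the class IDs have been exported from the slim as plain CURIEs; the function name `plan_first_pass` and the wording of the action strings are hypothetical and not part of any Uberon tooling.

```python
from collections import defaultdict

# Proposed first-pass action per ID range, keyed by the first digit of the
# numeric part of the CURIE. The action wording is illustrative only.
FIRST_PASS_ACTIONS = {
    "6": "add subclass 'only in taxon' some Arthropoda",
    "2": "add 'never in taxon' Mammalia unless flagged as relevant",
    "3": "add 'never in taxon' Mammalia unless flagged as relevant",
    "4": "review individually and add TCs as appropriate",
}


def plan_first_pass(curies):
    """Group UBERON CURIEs by proposed action.

    IDs outside the 2x/3x/4x/6x ranges, and non-UBERON IDs, are skipped.
    """
    plan = defaultdict(list)
    for curie in curies:
        prefix, _, local_id = curie.partition(":")
        if prefix != "UBERON" or not local_id:
            continue
        action = FIRST_PASS_ACTIONS.get(local_id[0])
        if action is not None:
            plan[action].append(curie)
    return dict(plan)
```

Because the 2x and 3x ranges share the same proposed action, they end up in the same group, which matches the idea of handling them with one template.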

Then, a second pass later on, including the following steps:

  • Terms that received TCs can be scanned for FMA xrefs and brought back into the slim if they have any.
  • Are there any logically defined non-human terms left? Search for annotation property ‘never in taxon’ ‘Homo sapiens’ (e.g. ‘ampullary gland’, ‘bone of reproductive organ’). Some terms may need double-checking in case they are found in non-human Euarchontoglires.
  • Are there any insect terms left? Insect terms may be retrieved by searching for database_cross_reference contains FBbt (e.g. 'egg chorion'). This will also bring up a dozen terms whose label contains “insect” (e.g. 'insect ring gland').
  • Are the following random finds left?
    exoskeleton
    shell
    'open circulatory system'
    feather
    rhinarium?
    honey

Notes to self:
Guidelines on adding TCs here.
Jim Balhoff wrote "You all may find my OBO taxon constraints plugin for Protégé useful in seeing what effective taxon constraint the reasoner knows about for the selected term (and you can get explanations). Just keep in mind that it can be SLOW. :-) "

@dosumis
Contributor

dosumis commented Sep 26, 2021

@paolaroncaglia - nice work! It should be straightforward to add these via ROBOT templates. This can be done working directly from your google sheets.

@rays22 - can you have a go at working with Paola to do this? Happy to help if you run into problems. We basically just need an ID column and one or two columns for adding taxon constraints. The results can then be merged into the editor's file with robot merge, chained to robot convert to ensure the result is in OBO format.
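
A minimal sketch of producing such a template from a list of IDs, for illustration only. `ID` and `SC %` are ROBOT template strings; the human-readable header text, the function name `build_subclass_template`, and the assumption that ROBOT can resolve the labels used in the class expression (from the input ontology or from extra declaration rows) are mine and would need checking against the actual setup.

```python
import csv
import io


def build_subclass_template(term_ids, expression):
    """Return ROBOT template text with one subclass axiom row per term.

    Row 1 is a free-text header for humans, row 2 holds the ROBOT
    template strings, and each later row is one term plus the class
    expression substituted into 'SC %'.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Class ID", "Action item: add taxon constraint"])
    writer.writerow(["ID", "SC %"])
    for term_id in term_ids:
        writer.writerow([term_id, expression])
    return buffer.getvalue()
```

The returned text can be written to a `.tsv` file and passed to `robot template`; keeping the expression as a parameter means the same helper covers any single taxon constraint.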

@paolaroncaglia
Contributor Author

Hi @dosumis and @rays22 ,
Before adding TCs en masse, it'd be good to get feedback about those UBERON:2x, UBERON:3x and UBERON:4x classes above, please. I've added to today's meeting agenda.
I don't have ROBOT installed on my old laptop (I'm going to buy a new one soon, but this ticket shouldn't wait), so once the TCs are sorted, I'd rather leave it to someone else to add them via ROBOT. Thanks!

@paolaroncaglia paolaroncaglia added the HCA Human Cell Atlas label Sep 28, 2021
@paolaroncaglia
Contributor Author

paolaroncaglia commented Oct 1, 2021

Update: at the Uberon editors meeting earlier this week, it was resolved to mass-add TCs to all UBERON:2x, 3x, 4x and 6x terms found in the slim as above. TCs should be added to uberon-edit.obo. Then, at a later date, terms that received TCs can be scanned for FMA xrefs and brought back into the slim if they have any. We can refer to the comment above for a to-do list.

@paolaroncaglia
Contributor Author

Back to the set of Uberon terms with ID starting with 4: Chris wrote:
"UBERON:4x --> mixed, I think this is a small enough subset to look over quickly and add TCs. There will be a handful applicable to human"
As mentioned, there are 369 non-obsolete UBERON:4x classes in the slim, copied here. I inspected 10 random terms among the 369, and at least 3 are applicable to human, so I'm not very comfortable with adding TCs en masse to this set. @dosumis what would you prefer: a) we retain all 369 terms in the slim; or b) is there someone who could take a closer look and advise on which terms are safe to restrict to never in mammals? Perhaps Wasila, who authored some terms in this set? Thanks.

@paolaroncaglia
Contributor Author

@dosumis @rays22
The spreadsheet is ready for ROBOT, pending

  • A decision on the UBERON:4x set (last sheet in the spreadsheet)
  • Extraction of class ID from the first column, if that's the format you need
  • Checking that the action item column is in the right format for ROBOT.

As I said I can't really help with ROBOT at the moment, but let me know if I can do anything else for this ticket.
Once the TCs are added, who will re-build the slim?

Thanks and have a good weekend.

rays22 added a commit that referenced this issue Oct 18, 2021
This commit intends to
add tax constraint 'in taxon' some Arthropoda
using a ROBOT template.
NOTE that there are lots of diffs in the `uberon-edit.obo` file that are not explicitly related to the intended tax constraints.
This commit addresses #2050 in part.
rays22 added a commit that referenced this issue Oct 18, 2021
This commit intends to
add annotation `never in taxon` Mammalia (NCBITaxon:40674).
If applied, this commit will fix #2050.
@rays22
Collaborator

rays22 commented Oct 18, 2021

I used this ROBOT template to add taxon constraints:

Class ID	Label	rdf:type	Action item: add taxon constraint 
ID	LABEL	TYPE	SC %
RO:0002162	in taxon	owl:ObjectProperty	
NCBITaxon:6656	Arthropoda	owl:Class	
UBERON:6003006	adult segment	owl:Class	('in taxon' some Arthropoda)
UBERON:6000000	embryonic germ layer derivative	owl:Class	('in taxon' some Arthropoda)
UBERON:6000154	embryonic segment	owl:Class	('in taxon' some Arthropoda)
UBERON:6005541	insect cardiogenic mesoderm	owl:Class	('in taxon' some Arthropoda)
UBERON:6005168	insect external sensory organ	owl:Class	('in taxon' some Arthropoda)
UBERON:6000104	insect mesoderm anlage	owl:Class	('in taxon' some Arthropoda)
UBERON:6000132	insect mesodermal crest of segment T3	owl:Class	('in taxon' some Arthropoda)
UBERON:6000131	insect mesodermal crest	owl:Class	('in taxon' some Arthropoda)
UBERON:6001722	insect ring gland	owl:Class	('in taxon' some Arthropoda)
UBERON:6005436	insect trunk mesoderm anlage	owl:Class	('in taxon' some Arthropoda)
UBERON:6026000	insect trunk mesoderm derivative	owl:Class	('in taxon' some Arthropoda)
UBERON:6000128	insect trunk mesoderm	owl:Class	('in taxon' some Arthropoda)
UBERON:6026002	insect visceral mesoderm derivative	owl:Class	('in taxon' some Arthropoda)
UBERON:6000130	insect visceral mesoderm	owl:Class	('in taxon' some Arthropoda)

2021-10-18: I used two ROBOT commands in tandem:

  1. Merging the ROBOT template with uberon-edit.obo:
     robot template --merge-before --input uberon-edit.obo --template ~/Downloads/uberon6x_robot.tsv --output uberon-edit-2.owl
  2. Format conversion:
     robot convert --check false --input uberon-edit-2.owl --format obo --output uberon-edit.obo

@rays22
Collaborator

rays22 commented Oct 18, 2021

To add the 'never in taxon' Mammalia annotations, I used a ROBOT template similar to this one:

subject	object
ID	AI 'never in taxon'
UBERON:2000000	NCBITaxon:40674
...
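
Before merging a template like this one, it may be worth checking that it only touches the intended ID ranges. The sketch below counts data rows per leading digit of the UBERON ID. It assumes a tab-separated template with the two header rows shown above; the function name `count_rows_by_range` is hypothetical.

```python
import csv
import io
from collections import Counter


def count_rows_by_range(template_text, header_rows=2):
    """Count template data rows per UBERON ID range ('2x', '3x', ...).

    Header rows and rows whose first cell is not a UBERON CURIE
    (for example an elided '...' line) are ignored.
    """
    reader = csv.reader(io.StringIO(template_text), delimiter="\t")
    counts = Counter()
    for index, row in enumerate(reader):
        if index < header_rows or not row:
            continue
        prefix, _, local_id = row[0].partition(":")
        if prefix == "UBERON" and local_id:
            counts[local_id[0] + "x"] += 1
    return counts
```

Comparing the counts with the numbers of classes listed earlier in the thread gives a quick signal that no range was included or dropped by accident.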

rays22 added a commit that referenced this issue Oct 18, 2021
Remove the taxon annotation of obsolete term
UBERON:6000000  obsolete embryonic germ layer derivative
If applied, this commit will fix #2050.
@dosumis
Contributor

dosumis commented Oct 19, 2021

I think the robot commands look good, but please check you are not losing header/ontology-level content. Note: you can chain ROBOT commands, so the format conversion could be folded into the first command if you want.
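
A rough sketch of what the chained invocation might look like, built as an argument list in Python. It assumes ROBOT's command chaining, where a later command in the same invocation receives the previous command's result as its input, and it reuses only the options already shown in this thread. The function name `chained_template_convert` and the file names in the usage note are hypothetical.

```python
def chained_template_convert(edit_file, template, output):
    """Build one ROBOT invocation that merges a template, then converts to OBO.

    The 'convert' step has no --input of its own: in a chain it is
    expected to pick up the ontology produced by 'template'.
    """
    return [
        "robot",
        "template", "--merge-before",
        "--input", edit_file,
        "--template", template,
        "convert", "--check", "false",
        "--format", "obo",
        "--output", output,
    ]
```

On a machine with ROBOT installed, the list can be passed to `subprocess.run(argv, check=True)`, for example with `chained_template_convert("uberon-edit.obo", "uberon6x_robot.tsv", "uberon-edit-new.obo")`. Writing to a separate output file first makes it easier to diff and confirm that no header content was lost.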

rays22 added a commit that referenced this issue Oct 19, 2021
This commit intends to
1.  add taxon constraint `in taxon` *some* **Arthropoda** to some terms
using a ROBOT template,
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template.

If applied, this commit will address #2050.

### Notes
* I had to delete the following 2 terms from the ROBOT template, because they cause OBO format `multiple name tags not allowed` errors:

```
UBERON:6003006  adult segment   owl:Class   ('in taxon' some Arthropoda)
UBERON:6000154  embryonic segment   owl:Class   ('in taxon' some Arthropoda)
```

* I also had to delete the following term from the ROBOT template, because it appears to be obsolete and raises validation errors:

```
UBERON:6000000  embryonic germ layer derivative owl:Class   ('in taxon' some Arthropoda)
```
rays22 added a commit that referenced this issue Oct 19, 2021
1.  add taxon constraint `in taxon` *some* **Arthropoda** to some terms
using a ROBOT template,
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template.

If applied, this commit will address #2050.
@rays22
Collaborator

rays22 commented Oct 19, 2021

In the latest pull request I am using two ROBOT templates:

  1. uberon2x-3x_robot.tsv.csv
  2. uberon6x_robot.tsv.csv

together with the commands
robot template --merge-before --input uberon-edit.obo --template uberon6x_robot.tsv.csv --output uberon-edit.obo
robot template --merge-before --input uberon-edit.obo --template uberon2x-3x_robot.tsv.csv --output uberon-edit.obo


rays22 added a commit that referenced this issue Oct 19, 2021
This commit intends to avoid the
11 unsatisfiable classes from the previous commit by excluding these terms from the ROBOT template `uberon2x-3x_robot.tsv`:
UBERON:3000961 'external integument structure' -- 1
UBERON:2001626 'premaxillary tooth' -- 2
UBERON:2001457 'postcranial axial cartilage' -- 6
UBERON:2005260 'fenestrated capillary' -- 1
UBERON:2001995 'papilla' -- 1
If applied, this commit will address #2050.
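
The exclusion described in this commit can be sketched as a filter over template rows. This is an illustration only, assuming rows are held as lists with the term ID in the first column and two header rows at the top; `drop_excluded_rows` is a hypothetical helper, and the ID set is copied from the commit message above.

```python
# Terms reported above as causing unsatisfiable classes.
EXCLUDED_IDS = frozenset({
    "UBERON:3000961",
    "UBERON:2001626",
    "UBERON:2001457",
    "UBERON:2005260",
    "UBERON:2001995",
})


def drop_excluded_rows(rows, excluded=EXCLUDED_IDS, header_rows=2):
    """Return template rows without the excluded terms; header rows are kept."""
    kept = []
    for index, row in enumerate(rows):
        if index >= header_rows and row and row[0] in excluded:
            continue
        kept.append(row)
    return kept
```

Keeping the exclusions in one named set also leaves a record of which terms were deliberately left out of the bulk constraint.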
rays22 added a commit that referenced this issue Oct 20, 2021
This commit intends to
1.  add taxon constraint `in taxon` *some* **Arthropoda** to some terms
using a ROBOT template,
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template,
3. add provenance for the changes.

If applied, this commit will address #2050.
rays22 added a commit that referenced this issue Oct 20, 2021
* This commit intends to
1.  add taxon constraint `in taxon` *some* **Arthropoda** to some terms
using a ROBOT template,
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template.

If applied, this commit will address #2050.

* Fix unsatisfiable classes

This commit intends to avoid the
11 unsatisfiable classes from the previous commit by excluding these terms from the ROBOT template `uberon2x-3x_robot.tsv`:
UBERON:3000961 'external integument structure' -- 1
UBERON:2001626 'premaxillary tooth' -- 2
UBERON:2001457 'postcranial axial cartilage' -- 6
UBERON:2005260 'fenestrated capillary' -- 1
UBERON:2001995 'papilla' -- 1
If applied, this commit will address #2050.

* Add taxon constraints with provenance

This commit intends to
1.  add taxon constraint `in taxon` *some* **Arthropoda** to some terms
using a ROBOT template,
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template,
3. add provenance for the changes.

If applied, this commit will address #2050.

* Add taxon constraints with provenance

This commit intends to
1.  add taxon constraint `in taxon` *some* **Arthropoda** to some terms
using a ROBOT template,
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template,
3. add provenance for the changes.

If applied, this commit will address #2050.

* Update uberon-edit.obo

Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>
@rays22
Collaborator

rays22 commented Oct 20, 2021

I used these two templates:

  1. 'in taxon' some Arthropoda ROBOT_TEMPLATE-1.tsv
  2. 'never in taxon' Mammalia ROBOT_TEMPLATE-2.tsv
robot template --merge-before --input uberon-edit.obo --template ROBOT_TEMPLATE.tsv --output uberon-edit.obo

@rays22
Collaborator

rays22 commented Oct 20, 2021

Remaining tasks

"UBERON:4x --> mixed, I think this is a small enough subset to look over quickly and add TCs. There will be a handful applicable to human"

@paolaroncaglia ,
Shall I make a new ticket with the remaining tasks and close this one?

@paolaroncaglia
Contributor Author

@rays22 sure, thanks!

@matentzn
Contributor

Please start making a library of reusable templates in src/templates

@rays22
Collaborator

rays22 commented Oct 25, 2021

I have made a new issue with the remaining tasks.

@rays22 rays22 closed this as completed Oct 25, 2021
@paolaroncaglia
Contributor Author

Thanks @rays22 .

@paolaroncaglia
Contributor Author

Great job @rays22 with help from @matentzn, @dosumis and more!
To whoever runs the next Uberon release, please: we should probably write in the release notes that the new release contains ~1759 additional taxon constraints "never in taxon Mammalia" (plus ~13 in taxon Arthropoda), and point to this ticket for details if desired. I'll add to the agenda for today's meeting too. Thanks.

@rays22
Collaborator

rays22 commented Nov 26, 2021

As part of a roll back, I am going to remove the UBERON:3x bulk never in taxon Mammalia taxon constraints that were part of this ticket. Some of the 2x and 3x classes appear to overlap with Mammalia.

@rays22 rays22 reopened this Nov 26, 2021
rays22 added a commit that referenced this issue Nov 26, 2021
This commit intends to
roll back the 3x taxon constraints that were part of this ticket.
The bulk addition of taxon constraints appears to include classes that clash with classes that genuinely overlap with *Mammalia*.
See #2127 (comment) .
If applied, this commit will fix #2050.