Refining the Uberon Euarchontoglires subset #2050
Hi @paolaroncaglia, are you clear on how to add taxon constraints in these cases, or would you like to discuss?
I might ask @matentzn for help with adding constraints in bulk, but I'll take a closer look first and contact him offline if he's available. I'll update the ticket as we go. I'm aiming to have the slim ready to integrate into the HCAO release pipeline for the Oct 19th release.
@cmungall wrote in #1824 (comment): "UBERON:6x --> these are all specific to arthropods, and we can add TCs to 6x roots en masse".
So I suggest a first pass including the following steps:
Then, a second pass later on, including the following steps:
Notes to self:
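Adding a logical `'in taxon' some Arthropoda` axiom to the roots of the 6x set should be enough for a reasoner to infer the same restriction for every subclass below them (this holds for the logical axiom; plain annotations are not inherited under OWL semantics). A minimal sketch of how those roots could be found, assuming a hypothetical `parents` map from each term ID to its direct `is_a` parent IDs, extracted from the ontology by some other means:

```python
from typing import Dict, Iterable, List


def subset_roots(parents: Dict[str, Iterable[str]], prefix: str = "UBERON:6") -> List[str]:
    """Return IDs starting with `prefix` that have no direct is_a parent with that prefix.

    `parents` is a hypothetical input: term ID -> IDs of its direct is_a parents.
    """
    members = {term for term in parents if term.startswith(prefix)}
    roots = [
        term
        for term in members
        # A root of the subset has no direct parent inside the subset.
        if not any(parent in members for parent in parents[term])
    ]
    return sorted(roots)


# Invented hierarchy with arbitrary IDs, for illustration only.
example = {
    "UBERON:0000001": [],
    "UBERON:6000001": ["UBERON:0000001"],
    "UBERON:6000002": ["UBERON:6000001"],
    "UBERON:6000003": ["UBERON:0000001"],
}
print(subset_roots(example))
```

Only the roots would then need a row in the ROBOT template; everything under them inherits the constraint.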
@paolaroncaglia, nice work! It should be straightforward to add these via ROBOT templates, working directly from your Google Sheets. @rays22, can you have a go at working with Paola to do this? Happy to help if you run into problems. We basically just need an ID column and one or two columns for adding taxon constraints. The results can then be merged into the editor's file with `robot merge`, chained to `robot convert` to ensure the result is in OBO format.
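A ROBOT template is a TSV whose first row holds column names and whose second row holds ROBOT template strings. The sketch below turns a list of (ID, label) pairs, such as an export of the Google Sheet, into such a file. The column names, the `AI RO:0002161` string for `never in taxon`, and the assumption that ROBOT skips a column with a blank template string are illustrative choices to verify against the ROBOT template docs, not what was actually used in this ticket. The label is kept only for readability: a LABEL column whose text differed from the existing label would add a second `rdfs:label`, which OBO output rejects.

```python
import csv
import io
from typing import Sequence, Tuple

# (column name, ROBOT template string, cell value applied to every row).
Column = Tuple[str, str, str]

# Logical axiom: the cell is a Manchester-syntax class expression.
IN_TAXON_ARTHROPODA: Column = ("in taxon", "SC %", "'in taxon' some Arthropoda")
# Assumption: `never in taxon` written as an IRI-valued annotation (RO:0002161).
NEVER_IN_MAMMALIA: Column = ("never in taxon", "AI RO:0002161", "NCBITaxon:40674")


def build_template(terms: Sequence[Tuple[str, str]], columns: Sequence[Column]) -> str:
    """Return ROBOT template TSV text for (term ID, label) pairs."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Row 1: human-readable column names.
    writer.writerow(["ID", "label", "type"] + [name for name, _, _ in columns])
    # Row 2: ROBOT template strings; the label column's string is left blank.
    writer.writerow(["ID", "", "TYPE"] + [robot for _, robot, _ in columns])
    for term_id, label in terms:
        writer.writerow([term_id, label, "owl:Class"] + [value for _, _, value in columns])
    return buf.getvalue()


rows = [("UBERON:6003006", "adult segment")]
print(build_template(rows, [IN_TAXON_ARTHROPODA]), end="")
```

Passing both column constants produces the "two columns" variant in a single file; alternatively, one template per constraint type keeps each commit easier to review.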
Hi @dosumis and @rays22,
Update: at the Uberon editors meeting earlier this week, it was resolved to mass-add TCs to all UBERON:2x, 3x, 4x and 6x terms found in the slim, as above. TCs should be added to `uberon-edit.obo`. Later, the terms that received TCs may be scanned for FMA xrefs and brought back into the slim if they have any. We can refer to the comment above for a to-do list.
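The selection step agreed at the meeting can be sketched as a prefix filter over the slim's term IDs, plus a later pass that picks out constrained terms carrying FMA xrefs. This is a minimal sketch: the input shapes (a list of IDs, and a hypothetical `xrefs` map from ID to xref strings) are assumptions about how the slim would be loaded.

```python
from typing import Dict, List, Sequence

# ID ranges agreed at the editors meeting; removing "UBERON:3" reproduces
# the later rollback of the 3x constraints discussed in this thread.
BULK_PREFIXES = ("UBERON:2", "UBERON:3", "UBERON:4", "UBERON:6")


def select_bulk(slim_ids: Sequence[str], prefixes: Sequence[str] = BULK_PREFIXES) -> List[str]:
    """Return slim term IDs that fall in the bulk-constraint ID ranges."""
    return [term for term in slim_ids if term.startswith(tuple(prefixes))]


def fma_review(xrefs: Dict[str, Sequence[str]], constrained: Sequence[str]) -> List[str]:
    """Return constrained terms with at least one FMA xref, to reconsider for the slim."""
    return [
        term
        for term in constrained
        if any(xref.startswith("FMA:") for xref in xrefs.get(term, ()))
    ]
```

Because Uberon IDs are zero-padded to seven digits, a prefix such as `UBERON:2` does not match core terms like `UBERON:0002...`.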
Back to the set of Uberon terms with ID starting with 4: Chris wrote: |
@dosumis @rays22
As I said, I can't really help with ROBOT at the moment, but let me know if I can do anything else for this ticket. Thanks, and have a good weekend.
This commit intends to add the taxon constraint 'in taxon' some Arthropoda using a ROBOT template. NOTE that there are many diffs in the `uberon-edit.obo` file that are not directly related to the intended taxon constraints. This commit addresses #2050 in part.
This commit intends to add annotation `never in taxon` Mammalia (NCBITaxon:40674). If applied, this commit will fix #2050.
I used this ROBOT template to add taxon constraints:
2021-10-18: I used two ROBOT commands in tandem:
To add annotations:
Remove the taxon annotation of obsolete term UBERON:6000000 'obsolete embryonic germ layer derivative'. If applied, this commit will fix #2050.
I think the ROBOT commands look good, but please check that you are not losing header/ontology-level content. Note: you can chain ROBOT commands, so the format conversion could be folded into the first command if you want.
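Folding the conversion into one chained call could look like the sketch below, which only builds the argument list (it can then be run with `subprocess.run(cmd, check=True)`). It assumes ROBOT's `template --merge-before` option and `convert --format obo`; check both against the ROBOT docs for the installed version. The output file name is hypothetical: writing to a separate file rather than overwriting `uberon-edit.obo` makes it easy to diff the two and confirm that no header or ontology-level content was lost.

```python
import shlex
from typing import List


def robot_template_command(edit_file: str, template: str, output: str) -> List[str]:
    """Build one chained ROBOT call: apply a template, merge it, write OBO."""
    return [
        "robot", "template",
        "--input", edit_file,   # source of labels such as 'in taxon' and Arthropoda
        "--merge-before",       # merge the template axioms into the input ontology
        "--template", template,
        "convert",              # chained: receives the merged ontology
        "--format", "obo",
        "--output", output,
    ]


cmd = robot_template_command("uberon-edit.obo", "uberon2x-3x_robot.tsv", "uberon-edit.new.obo")
print(shlex.join(cmd))
```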
This commit intends to:
1. add taxon constraint `in taxon` *some* **Arthropoda** to some terms using a ROBOT template;
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template.

If applied, this commit will address #2050.

### Notes

* I had to delete the following 2 terms from the ROBOT template, because they cause OBO format `multiple name tags not allowed` errors:
  ```
  UBERON:6003006 adult segment owl:Class ('in taxon' some Arthropoda)
  UBERON:6000154 embryonic segment owl:Class ('in taxon' some Arthropoda)
  ```
* I also had to delete the following term from the ROBOT template, because it appears to be obsolete and raises validation errors:
  ```
  UBERON:6000000 embryonic germ layer derivative owl:Class ('in taxon' some Arthropoda)
  ```
In the latest pull request I am using two ROBOT templates:
This commit intends to avoid the 11 unsatisfiable classes from the previous commit by excluding these terms from the ROBOT template `uberon2x-3x_robot.tsv`:
* UBERON:3000961 'external integument structure' -- 1
* UBERON:2001626 'premaxillary tooth' -- 2
* UBERON:2001457 'postcranial axial cartilage' -- 6
* UBERON:2005260 'fenestrated capillary' -- 1
* UBERON:2001995 'papilla' -- 1

If applied, this commit will address #2050.
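Excluding the problem terms listed in this commit can be done on the template file itself, keeping ROBOT's two header rows intact. A minimal sketch, assuming the term ID is the first tab-separated column of the template:

```python
from typing import Iterable

# IDs excluded in this commit because they led to unsatisfiable classes.
UNSATISFIABLE_SOURCES = {
    "UBERON:3000961",  # external integument structure
    "UBERON:2001626",  # premaxillary tooth
    "UBERON:2001457",  # postcranial axial cartilage
    "UBERON:2005260",  # fenestrated capillary
    "UBERON:2001995",  # papilla
}


def exclude_rows(template_text: str, excluded: Iterable[str]) -> str:
    """Drop data rows whose first (ID) column is in `excluded`; keep both header rows."""
    excluded = set(excluded)
    lines = template_text.splitlines(keepends=True)
    header, body = lines[:2], lines[2:]
    kept = [line for line in body if line.split("\t", 1)[0] not in excluded]
    return "".join(header + kept)
```

Filtering the file rather than deleting rows by hand keeps the exclusion list explicit and reviewable in the commit.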
This commit intends to:
1. add taxon constraint `in taxon` *some* **Arthropoda** to some terms using a ROBOT template;
2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template;
3. add provenance for the changes.

If applied, this commit will address #2050.
* This commit intends to 1. add taxon constraint `in taxon` *some* **Arthropoda** to some terms using a ROBOT template, 2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template. If applied, this commit will address #2050.
* Fix unsatisfiable classes
  This commit intends to avoid the 11 unsatisfiable classes from the previous commit by excluding these terms from the ROBOT template `uberon2x-3x_robot.tsv`: UBERON:3000961 'external integument structure' -- 1, UBERON:2001626 'premaxillary tooth' -- 2, UBERON:2001457 'postcranial axial cartilage' -- 6, UBERON:2005260 'fenestrated capillary' -- 1, UBERON:2001995 'papilla' -- 1. If applied, this commit will address #2050.
* Add taxon constraints with provenance
  This commit intends to 1. add taxon constraint `in taxon` *some* **Arthropoda** to some terms using a ROBOT template, 2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template, 3. add provenance for the changes. If applied, this commit will address #2050.
* Add taxon constraints with provenance
  This commit intends to 1. add taxon constraint `in taxon` *some* **Arthropoda** to some terms using a ROBOT template, 2. add taxon restriction annotation `never in taxon` **Mammalia** (NCBITaxon:40674) to some terms using another ROBOT template, 3. add provenance for the changes. If applied, this commit will address #2050.
* Update uberon-edit.obo

Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>
I used these two templates:
Remaining tasks
@paolaroncaglia,
@rays22 sure, thanks!
Please start making a library of reusable templates in `src/templates`.
I have made a new issue with the remaining tasks.
Thanks @rays22.
Great job @rays22, with help from @matentzn, @dosumis and more!
As part of a rollback, I am going to remove the UBERON:3x bulk
This commit intends to roll back the 3x taxon constraints that were part of this ticket. The bulk addition of taxon constraints appears to include classes that clash with classes that genuinely overlap with *Mammalia*. See #2127 (comment). If applied, this commit will fix #2050.
Stemming from HumanCellAtlas/ontology#84. I scanned the Uberon Euarchontoglires subset to search for non-primate, non-rodent, non-rabbit terms. Note that this was a quick search and not meant to be exhaustive. Here are a few possible strategies to identify and filter out unwanted classes in a time-effective way. These would improve the subset quality and should make the subset more manageable (it currently has >20k classes):
Search for annotation property 'never in taxon' 'Homo sapiens' (e.g. 'ampullary gland', 'bone of reproductive organ'). Some terms may need double-checking in case they are found in non-human Euarchontoglires.
I spotted many fish-specific terms. Searching for created_by teleost_anatomy_curators should retrieve them all (e.g. 'fin fold pectoral fin bud'). Their IDs all start with UBERON:2.
Same for amphibians: search by created_by amphibian_anatomy_curators (e.g. 'pars amphibiorum'). Their IDs all start with UBERON:3.
Insect terms may be retrieved by searching for database_cross_reference contains FBbt (e.g. 'egg chorion'). This will also bring up a dozen terms whose label contains "insect" (e.g. 'insect ring gland').
Other random finds:
exoskeleton
shell
'open circulatory system'
feather
rhinarium?
honey
Investigate adding taxon constraints en masse (I recall there was a similar comment from Chris some time ago somewhere; found it: Human subset uberon 0 byte file? #1824 (comment))
Thanks,
Paola