[Ontologies] Find a reliable approach to manage ontology terms ID consistency #16

christian-oreilly · 2018-04-11T11:32:53Z

    from nat.treeData import getChildren
    print(list(getChildren("BIRNLEX:160").keys()))

produces

    ['BIRNLEX:421', 'BIRNLEX:266', 'NLXORG:20081201', 'BIRNLEX:498', 'BIRNLEX:160', 'BIRNLEX:710', 'BIRNLEX:202', 'BIRNLEX:211', 'BIRNLEX:254']

and

    print(list(getChildren("NIFORG:birnlex_160").keys()))

produces

    ['NIFORG:nlx_organ_20081201', 'NIFORG:birnlex_211', 'NIFORG:birnlex_266', 'NIFORG:birnlex_498', 'NIFORG:birnlex_421', 'NIFORG:birnlex_710', 'NIFORG:birnlex_202', 'NIFORG:birnlex_254', 'NIFORG:birnlex_160']

The terms identified by "BIRNLEX:160" and "NIFORG:birnlex_160" are identical. These alternative ways of formatting the IDs of ontological terms cause difficulties for basic operations such as asking "Is this model organism (e.g., Wistar rat) a subclass of another model organism (e.g., rodent)?" When checking whether 'NIFORG:birnlex_211' is a subclass of "BIRNLEX:160", we currently get False when we would expect True. This is because the given ID is compared against a list of subclass IDs, and that list does not contain all the alternative ways of writing each ID. We need to find a consistent and reliable way to check these equivalences. Most importantly, it has to be relatively efficient: for example, making a REST call for every equivalence check could quickly result in poor performance.
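One possible way to make the subclass check robust, sketched under assumptions: translate every ID to a single canonical spelling through an explicit lookup table before comparing. The `EQUIVALENT_IDS` table and the `canonical` and `is_subclass` helpers are hypothetical and not part of nat; the table lists only IDs mentioned in this issue, and the 211 pair is inferred from the two outputs above rather than confirmed.

```python
# Hypothetical table mapping alternative spellings to one canonical ID.
# Only IDs discussed in this issue are listed; a real table would be
# loaded from an explicit mapping rather than derived from prefixes.
EQUIVALENT_IDS = {
    "BIRNLEX:160": "NIFORG:birnlex_160",
    "BIRNLEX:211": "NIFORG:birnlex_211",  # assumed equivalent, not confirmed
}


def canonical(term_id):
    """Return the canonical spelling of term_id (unchanged if unknown)."""
    return EQUIVALENT_IDS.get(term_id, term_id)


def is_subclass(candidate_id, subclass_ids):
    """Check membership after canonicalizing both sides of the comparison."""
    canonical_subclasses = {canonical(i) for i in subclass_ids}
    return canonical(candidate_id) in canonical_subclasses


# A subset of the IDs returned by getChildren("BIRNLEX:160") above.
children = ["BIRNLEX:421", "BIRNLEX:266", "BIRNLEX:211", "BIRNLEX:160"]
print("NIFORG:birnlex_211" in children)             # naive comparison: False
print(is_subclass("NIFORG:birnlex_211", children))  # canonicalized: True
```

Because the lookups are local dictionary accesses, no REST call is needed per comparison.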


christian-oreilly · 2018-04-11T11:34:49Z

@tgbugs Do you have any insights on the structure of NIFSTD or the scigraph client that could help us with respect to this issue?

tgbugs · 2018-04-11T19:15:58Z

I think this is the result of the fact that we transitioned the ontology away from the ontology.neuinfo.org identifiers to the uri.neuinfo.org identifiers. See SciCrunch/NIF-Ontology@b268a6b for details. I do not load the mapping file into SciGraph, to avoid confusion, though in this case it seems to have caused some. I also have not tested whether SciGraph treats owl:sameAs correctly with regard to issuing queries against the graph, so there is a possibility that you would have to issue two SciGraph queries even if I did. I would suggest switching to the new URI scheme, but I totally understand the human readability needs. Therefore I suggest that you load the mapping file into a Python dict to do the translation, and it will be performant. I might insert a translation shim that switches the representation of those identifiers whenever a call is made in or out of SciGraph. We use an equivalent implementation to do the translations in nginx for the resolver. One note: you should not do this computationally by trying to replace prefixes, because there are exceptions.
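As a rough illustration of the dict-based translation suggested above, here is a minimal sketch. It assumes the mapping has been exported as tab-separated pairs of equivalent IDs, one pair per line; the actual format of the mapping file is not stated in this thread, so the layout, the sample rows, and the `load_id_mapping` helper are all hypothetical.

```python
import csv
import io


def load_id_mapping(lines):
    """Build forward and reverse lookup dicts from tab-separated ID pairs."""
    forward, reverse = {}, {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        left_id, right_id = (field.strip() for field in row)
        forward[left_id] = right_id
        reverse[right_id] = left_id
    return forward, reverse


# Stand-in for the real mapping file; the two-column layout is an assumption.
sample = io.StringIO(
    "BIRNLEX:160\tNIFORG:birnlex_160\n"
    "BIRNLEX:211\tNIFORG:birnlex_211\n"
)
forward, reverse = load_id_mapping(sample)

# Translate on the way into a query, and back on the way out.
query_id = forward.get("BIRNLEX:160", "BIRNLEX:160")
display_id = reverse.get(query_id, query_id)
print(query_id, display_id)
```

Every pair is stored explicitly, so exceptions to the prefix pattern are handled the same way as regular entries, and unknown IDs pass through unchanged.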

Also, the endpoint you have here https://github.com/BlueBrain/nat/blob/master/nat/treeData.py#L107 is no longer accessible, so I'm not entirely sure where that data is coming from. If you have hardcoded the IP to old matrix in your hosts file or something like that, then you are almost certainly getting stale data. If you want to switch to our maintained endpoint (which is now finally up), see the (newly added) note in the readme https://github.com/SciCrunch/NIF-Ontology#using-nifstd and switch your query to

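    import os
    import requests

    # root_id, direction, maxDepth and relationshipType come from the calling code.
    api_key = os.environ['SCICRUNCH_API_KEY']
    baseKS = "http://scicrunch.org/api/1"  # no trailing slash, to avoid "//" in the URL
    response = requests.get(baseKS + "/scigraph/graph/neighbors/" +
                            root_id + "?direction=" + direction +
                            "&depth=" + str(maxDepth) +
                            "&project=%2A&blankNodes=false&" + relationshipType +
                            "&key=" + api_key)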
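For readers who prefer not to concatenate the query string by hand, the same URL can be assembled with the standard library, as in the sketch below. It sends no request. The `neighbors_url` helper is hypothetical, the `relationshipType` parameter name is an assumption about what the `relationshipType` variable in the snippet above expands to, and the argument values are placeholders.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://scicrunch.org/api/1"  # same host and path as the snippet above


def neighbors_url(root_id, direction, max_depth, relationship_type, api_key):
    """Assemble the neighbors query URL without sending any request."""
    params = {
        "direction": direction,
        "depth": max_depth,
        "project": "*",  # urlencode turns this into %2A
        "blankNodes": "false",
        "relationshipType": relationship_type,  # assumed parameter name
        "key": api_key,
    }
    # Keep the colon of IDs such as "BIRNLEX:160" unescaped, as in the original.
    path = "/scigraph/graph/neighbors/" + quote(root_id, safe=":")
    return BASE_URL + path + "?" + urlencode(params)


# Placeholder values for illustration only.
print(neighbors_url("BIRNLEX:160", "INCOMING", 2, "subClassOf", "MY_KEY"))
```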

Please let me know if this addresses the issue. Best!

pafonta · 2018-04-12T08:44:44Z

NB: For the endpoint, we have an open issue (#11). Due to several things, it has not yet been fixed.

christian-oreilly · 2018-04-12T09:01:05Z

Thanks @tgbugs for the info. It is very useful. I'll use the mapping file you pointed us to in order to implement explicit equivalences, and avoid defining general rules on prefix equivalences because of the exceptions.

christian-oreilly · 2018-10-05T12:38:19Z

@tgbugs I'm back working on things related to this issue. I went to https://github.com/SciCrunch/NIF-Ontology#using-nifstd and tried to create a key for the API, but both https://scicrunch.org/register and https://scicrunch.org/account/developer are currently empty pages. Were you aware of that? Is that normal? When is the situation expected to be resolved?

tgbugs · 2018-10-05T13:13:46Z

Definitely not normal. It looks like the UCSD data center went down sometime overnight. It should be back up sometime later today PDT. I will take a look at it when I get in later today and let you know.

christian-oreilly · 2018-10-05T13:14:43Z

OK, so I'll resume this work on Monday then. Thanks for the feedback @tgbugs

christian-oreilly added the enhancement label Apr 11, 2018

christian-oreilly self-assigned this Apr 11, 2018
