Ontology release and snapshot seems to be out of phase with current state of ontology #95

Closed
krchristie opened this issue May 10, 2019 · 24 comments

Comments

@krchristie

Hi,

I am trying to add annotations to a new GO term that I committed two weeks ago (stanza from go-edit.obo included below)

The term was added on 4/26/19, but when I try to use it, either in the existing model that already contains the other annotations from the paper, or in a new one, either using the form or the graph editor (either as an individual or as a process term), it is not available in the autocomplete.

How long should I expect to wait for a new term to be available?

thanks,

-Karen

[Term]
id: GO:0120197
name: mucociliary clearance
namespace: biological_process
def: "The respiratory system process driven by motile cilia on epithelial cells of the respiratory tract by which mucus and associated inhaled particles and pathogens trapped within it are moved out of the airways." [GOC:krc, PMID:24119105, PMID:27864314]
synonym: "MCC" RELATED [PMID:24119105, PMID:27864314]
synonym: "MCT" RELATED [PMID:28289722]
synonym: "mucociliary transport" EXACT [PMID:28289722]
is_a: GO:0003016 ! respiratory system process
is_a: GO:0003351 ! epithelial cilium movement involved in extracellular fluid movement
created_by: kchris
creation_date: 2019-04-26T16:16:50Z
@kltm
Member

kltm commented May 10, 2019

@kltm
Member

kltm commented May 11, 2019

NEO rebuilt and deployed...but the term does not seem to be available.
@balhoff Would you happen to know the status of this term?

@krchristie
Author

Any progress on this?

I have added some other new terms that are also not available in Noctua (pull request:
geneontology/go-ontology@173040d)

GO:0120205 - photoreceptor proximal connecting cilium (CC)
creation_date: 2019-05-10T22:47:08Z

GO:0120206 - photoreceptor distal connecting cilium (CC)
creation_date: 2019-05-10T22:54:05Z

For good measure, I checked a term that was added by someone else and though I can see it in the ontology when I am up to date with origin master, I cannot use the term in Noctua. Here's the term and the relevant pull request:

GO:0140330 - xenobiotic detoxification by transmembrane export across the cell outer membrane (BP)
creation_date: 2019-05-03T10:35:57Z
geneontology/go-ontology@a371973

@balhoff
Member

balhoff commented May 20, 2019

@kltm getting new GO terms requires a Minerva restart, correct?

@kltm
Member

kltm commented May 20, 2019

@balhoff The restart was not the issue--it was restarted a week and a half ago with https://github.com/geneontology/noctua/issues/612#issuecomment-491466881, but the term mentioned was still not present in NEO.

I'll go with the model that there was some other issue upstream and try again today.

@balhoff
Member

balhoff commented May 20, 2019

I'm confused about looking in NEO; these are GO terms, right?

@kltm
Member

kltm commented May 20, 2019

The current load of NEO can be browsed here:
http://noctua-amigo.berkeleybop.org/amigo
http://noctua-amigo.berkeleybop.org/amigo/search/ontology (make sure to remove the GO filter)
It is a simple ontology of all entities that can be annotated to for GO-CAMs in Noctua, including about 100,000 non-GO items (GPs, etc.).

@balhoff
Member

balhoff commented May 20, 2019

I usually just call NEO the GPs, vs. go-lego which imports both NEO and GO. I think the issue is that 'mucociliary clearance' is not in the go-lego release, but it is in snapshot. However I do see it in the go-plus release. These should not be different. But besides that issue, should Noctua be loading go-lego snapshot rather than release?

@kltm
Member

kltm commented May 20, 2019

@balhoff Yes, I think you're on the right track here. Taking a quick look through the NEO repo (https://github.com/geneontology/neo), there is a lot wired for "current"; it wouldn't surprise me if that was true for its ontology use as well and would well explain the issues we're having.
@cmungall Would it be possible to get a quick audit/check to make sure that NEO is properly using snapshots?
(For your other question, I'd think that's generally true for any annotation system.)

@krchristie
Author

How often do go-lego snapshots get produced? If it's less frequently than daily, I think that's a problem for a curation tool.

@kltm
Member

kltm commented May 20, 2019

@krchristie It's daily. Currently, the NEO reload is semi-manual, with automation on the roadmap (#35).

@kltm
Member

kltm commented May 25, 2019

@balhoff Okay, digging in a little with help from @cmungall. I believe the issue is that my go-lego-based load is fixed on the release versions, rather than the snapshots. To solve this, I guess we'd either need to have better catalog control in a few places (there's a ticket...) or have a version of go-lego that used a snapshot instead.
For a third way, would you have a good mental model of what would happen if I just added the snapshot GO into owltools? There might be bits slightly out of sync (e.g. obsoleted terms), but maybe not too bad as an interim workaround? https://github.com/geneontology/pipeline/blob/issue-35-neo-test/Jenkinsfile#L53

@balhoff
Member

balhoff commented May 27, 2019

@kltm you can load go-lego from http://purl.obolibrary.org/obo/go/snapshot/extensions/go-lego.owl. This has all imports merged in, so it will not load GO from release.
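One way to confirm that a downloaded copy really has its imports merged in is to look for leftover `owl:imports` declarations. The following is a minimal sketch, assuming the file is RDF/XML and already saved locally; `declared_imports` is a hypothetical helper, not part of owltools or the pipeline.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def declared_imports(owl_path):
    """Return the IRIs named in owl:imports elements of an RDF/XML file."""
    imports = []
    with open(owl_path, "rb") as handle:
        # iterparse reads the file incrementally; clearing each element
        # after its end tag keeps memory use modest on large ontologies.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == f"{{{OWL}}}imports":
                iri = elem.get(f"{{{RDF}}}resource")
                if iri:
                    imports.append(iri)
            elem.clear()
    return imports
```

An empty list would be consistent with a fully merged file; any IRI returned is something a loader could still try to resolve elsewhere.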

kltm referenced this issue May 28, 2019
@kltm
Member

kltm commented May 28, 2019

@balhoff Great, thank you--I've added as above and will test.

@kltm
Member

kltm commented May 29, 2019

@balhoff Huh. I've done the run and deployed the product, but the term is still not available...
http://noctua-amigo.berkeleybop.org/amigo/search/ontology
Given what's being loaded as https://github.com/geneontology/pipeline/blob/issue-35-neo-test/Jenkinsfile#L53 , is it possible that they are clobbering each other?

@balhoff
Member

balhoff commented May 29, 2019

@kltm it doesn't make sense to me. I downloaded http://purl.obolibrary.org/obo/go/snapshot/extensions/go-lego.owl just now and I see GO_0120197 in there.
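For anyone repeating this check locally, here is a minimal sketch that scans a downloaded OWL file for a term's OBO PURL. It assumes the file is already on disk and that a plain substring match is good enough; `owl_mentions_term` is a hypothetical helper, and a match only shows that the IRI is mentioned somewhere, not that the class is declared.

```python
def owl_mentions_term(owl_path, go_id):
    """Return True if the file contains the OBO PURL for go_id.

    go_id is written in CURIE form, e.g. "GO:0120197", which maps to
    http://purl.obolibrary.org/obo/GO_0120197.
    """
    fragment = "http://purl.obolibrary.org/obo/" + go_id.replace(":", "_")
    # Read line by line so a large ontology file is never fully in memory.
    with open(owl_path, encoding="utf-8") as handle:
        return any(fragment in line for line in handle)
```

Running it against both the snapshot and release downloads of go-lego.owl would give a quick yes/no per file for each term of interest.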

@kltm
Member

kltm commented May 29, 2019

Well, it seems that the changes went into the build as expected:
https://build.geneontology.org/job/geneontology/job/pipeline/job/issue-35-neo-test/13/
3748d68
That said, http://noctua-amigo.berkeleybop.org/amigo/load_details gives us a rather odd line:
2019-05-28 | 2019-05-29 | http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
which indicates that it still went back to the released version for some reason.
The running command would be like:

java \
    -Xms$LOADER_MEM \
    -Xmx$LOADER_MEM \
    -DentityExpansionLimit=8172000 \
    -Djava.awt.headless=true \
    -jar /srv/amigo/java/lib/owltools-runner-all.jar  \
    $ONTOLOGIES \
    --log-info \
    --solr-config /srv/amigo/metadata/ont-config.yaml \
    --merge-support-ontologies \
    --merge-imports-closure \
    --remove-subset-entities upperlevel \
    --remove-disjoints \
    --silence-elk \
    --reasoner elk \
    --solr-taxon-subset-name amigo_grouping_subset \
    --solr-eco-subset-name go_groupings \
    --solr-url http://localhost:8080/solr/ \
    --solr-log /tmp/golr_timestamp.log \
    --solr-load-ontology \
    --solr-load-ontology-general  \
    --solr-optimize

The environment variable does indeed seem to be getting through to at least the outer layers:
[2019-05-28T21:41:34.298Z] GOLR_INPUT_ONTOLOGIES=http://purl.obolibrary.org/obo/go/snapshot/extensions/go-lego.owl http://purl.obolibrary.org/obo/eco.owl http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl http://purl.obolibrary.org/obo/cl/cl-basic.owl http://purl.obolibrary.org/obo/go/extensions/gorel.owl http://purl.obolibrary.org/obo/pato.owl http://purl.obolibrary.org/obo/po.owl http://purl.obolibrary.org/obo/chebi.owl http://purl.obolibrary.org/obo/uberon/basic.owl http://purl.obolibrary.org/obo/wbbt.owl http://purl.obolibrary.org/obo/go/extensions/go-modules-annotations.owl http://purl.obolibrary.org/obo/go/extensions/go-taxon-subsets.owl

Noting: ONTOLOGIES=${GOLR_INPUT_ONTOLOGIES:= \

@balhoff
Member

balhoff commented May 30, 2019

@kltm is that line printing the ontology IRI? If so, that is expected because the snapshot has the same ontology IRI as the release.
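Since the ontology IRI alone cannot tell the two builds apart, a load report would need some other field to distinguish them. The sketch below reads both the ontology IRI and the `owl:versionIRI` from an RDF/XML header. Whether the snapshot build actually carries a distinguishing version IRI is an assumption that would need checking, and `ontology_header` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def ontology_header(owl_path):
    """Return (ontology IRI, version IRI or None) from an RDF/XML file."""
    with open(owl_path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == f"{{{OWL}}}Ontology":
                # By the end event, the element's children are complete.
                version = elem.find(f"{{{OWL}}}versionIRI")
                version_iri = None
                if version is not None:
                    version_iri = version.get(f"{{{RDF}}}resource")
                return elem.get(f"{{{RDF}}}about"), version_iri
    return None, None
```

If two files return the same first value but different second values, printing only the ontology IRI (as the load details line appears to do) would hide which build was loaded.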

@kltm
Member

kltm commented May 31, 2019

I went through and compared the availability of terms across different releases that I have access to.
I'd note that the last release was 2019-05-09.

| term\release            | release | snapshot | neo |
|-------------------------+---------+----------+-----|
| GO:0120197 (2019-04-28) | N       | Y        | Y   |
| GO:0120205 (2019-05-12) | N       | N        | N   |
| GO:0120206 (2019-05-12) | N       | N        | N   |
| GO:0140330 (2019-05-09) | N       | Y        | Y   |

So @balhoff , the example of GO_0120197 may not be a good one for whatever reason.
This is disturbing in a couple of ways. The first is that GO:0120197 apparently does not show up in the release, even though it was a few days old. Maybe that's normal? I don't have a mechanism for that. On the happy side, snapshot and neo seem to be in sync with at least the availability of terms. Next, whatever is wrong with the neo/go-lego load seems to be the same problem in snapshot, a problem we didn't know we had before.
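A comparison like the table above can be regenerated mechanically once the set of term IDs in each source is known. This is a minimal sketch with hypothetical helper names; the example sets simply restate the table's Y/N values and how they would be gathered from each source is left open.

```python
def availability_rows(terms, sources):
    """Build a Y/N grid: one row per term, one column per source.

    terms   -- list of GO IDs, e.g. ["GO:0120197"]
    sources -- dict mapping a source name to the set of GO IDs found in it
    """
    names = list(sources)
    rows = [["term"] + names]
    for term in terms:
        rows.append([term] + ["Y" if term in sources[n] else "N" for n in names])
    return rows


def render(rows):
    """Format rows as a pipe-delimited table with padded columns."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "| " + " | ".join(cell.ljust(w) for cell, w in zip(row, widths)) + " |"
        for row in rows
    )


if __name__ == "__main__":
    seen = {"GO:0120197", "GO:0140330"}
    terms = ["GO:0120197", "GO:0120205", "GO:0120206", "GO:0140330"]
    print(render(availability_rows(
        terms, {"release": set(), "snapshot": seen, "neo": seen})))
```

Rerunning this after each release would make it easier to see when a term moves from snapshot into release.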

Given that this is now possibly a general snapshot problem and not just go-lego/neo problem, let's try some ideas for what is going wrong:

  • something is wrong with snapshot ontology distribution, possibly in CF
  • something is wrong with snapshot redirect URLs
  • something is wrong with how the snapshot "materialized" ontology is built
  • something is wrong with the build environment for the Solr index in owltools, possibly the environmental variables
  • something is wrong with owltools itself, maybe with how it brings in and merges ontologies (see https://github.com/geneontology/noctua/issues/612#issuecomment-497121664)
  • something is wrong with the way that products propagate out of the pipeline (upstream version of 1)

@balhoff , I can start on the fourth one there, and maybe start trying the fifth if we see nothing. Would you mind trying a few things around the first three? The release is tomorrow, so if you have other terms that you'd like to keep track of while they propagate, it might be good to mark them here now.

@kltm
Member

kltm commented Jun 3, 2019

Updated the table above with new numbers a couple of days after the 2019-06-01 release:

| term\release            | release | snapshot | neo |
|-------------------------+---------+----------+-----|
| GO:0120197 (2019-04-28) | Y       | Y        | Y   |
| GO:0120205 (2019-05-12) | N       | Y        | Y   |
| GO:0120206 (2019-05-12) | N       | Y        | Y   |
| GO:0140330 (2019-05-09) | Y       | Y        | Y   |

Okay, this is interesting and worrying. Without "proof", it looks like the old snapshot is finally in the new release, and what we expected to be in snapshot all along has finally gotten there.

@balhoff If something like that is correct, it would seem that we are not understanding something about ontology propagation within our system. If you're free at some point, I'd like to talk this over with you.
As another example, we can look at the obsoletion of a term: geneontology/go-ontology#17214

| term\release                | release | snapshot | neo |
|-----------------------------+---------+----------+-----|
| GO:0005395 (OBS 2019-05-22) | N       | Y        | Y   |

Again, it seems like release is hung up on a previous state, possibly until the next release?

@kltm
Member

kltm commented Jun 3, 2019

While we have technically "solved" this issue, I'm hijacking it for the general case.

@kltm kltm changed the title from "new term dated 4/26/19 not available in autocomplete" to "Ontology release and snapshot seems to be out of phase with current state of ontology" Jun 3, 2019
@kltm kltm transferred this issue from geneontology/noctua Jun 3, 2019
@kltm
Member

kltm commented Jun 3, 2019

Noting that #95 (comment) would also be consistent with an env/owltools issue--I'll start pulling that apart next.

@cmungall
Member

cmungall commented Jun 4, 2019

Nothing to do with owltools, as we don't use owltools for the merge

I have tracked to an issue with robot: ontodev/robot#493

balhoff added a commit to geneontology/go-ontology that referenced this issue Jun 4, 2019
@kltm
Member

kltm commented Jun 4, 2019

@cmungall To clarify, the theory was not that it was the owltools merge, but that the docker environment was not correctly picking up the external variable and was using fallbacks, which would then point to the last release...or something. The debugging that started was making the environment more verbose about what it actually contained. After that I would be back to owltools itself and water sprites if Jim had not found anything at his end. It sounds like it has worked out though :)
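The fallback theory can be illustrated with a small sketch that mimics the shell `${GOLR_INPUT_ONTOLOGIES:=...}` expansion and also reports where the value came from. This is not the pipeline's code: `resolve_ontologies` is a hypothetical helper, and the default used in the usage example is an assumption, since the real default is truncated in the thread.

```python
import os


def resolve_ontologies(env, default):
    """Mimic `${GOLR_INPUT_ONTOLOGIES:=default}`.

    The default applies when the variable is unset or empty. Returns
    (value, source) so a log line can state which one was used.
    """
    value = env.get("GOLR_INPUT_ONTOLOGIES", "")
    if value:
        return value, "environment"
    return default, "fallback"


if __name__ == "__main__":
    # Assumed default for illustration only.
    assumed_default = "http://purl.obolibrary.org/obo/go/extensions/go-lego.owl"
    value, source = resolve_ontologies(os.environ, assumed_default)
    print(f"ONTOLOGIES ({source}): {value}")
```

Logging the source alongside the value is one way to make the environment "more verbose about what it actually contained".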
