
Task 3 - API - add search models by species #290

Closed
2 of 3 tasks
lpalbou opened this issue Mar 18, 2020 · 9 comments

lpalbou commented Mar 18, 2020

Task requirement from Noctua Landing Page Project

This will be a three-step task:

  • Batch update of all current models to add taxon id of each gene
  • Update minerva so that anytime a model is added or updated, the taxon ids will be added to each gene of the model
  • Provide the API route for NLP UI

Also linked to #230

lpalbou changed the title from "API - add search by species" to "API - add search models by species" on Mar 18, 2020
lpalbou changed the title from "API - add search models by species" to "Task 3 - API - add search models by species" on Mar 18, 2020

lpalbou commented Mar 18, 2020

@goodb what is the status of the new neo? We can discuss this requirement further.

goodb commented Mar 22, 2020

@lpalbou the status for minerva is that I am about to put in a PR to the dev branch that provides search by taxon id, among several other things.

I accomplished this by (1) building a taxon-to-models map on server startup, (2) using that map to handle the search, and (3) updating the map when models are saved.

I didn't touch the model metadata but could of course do it that way as well.
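
A minimal sketch of that map-based approach, for illustration only (class and method names are hypothetical, not minerva's actual code):

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical taxon -> models index mirroring the three steps described above.
    public class TaxonModelIndex {

        // taxon IRI -> set of model IRIs
        private final Map<String, Set<String>> taxonToModels = new ConcurrentHashMap<>();

        // (1) populate the map once on server startup, one call per known model
        public void index(String modelIri, Collection<String> taxonIris) {
            for (String taxon : taxonIris) {
                taxonToModels.computeIfAbsent(taxon, t -> ConcurrentHashMap.newKeySet()).add(modelIri);
            }
        }

        // (2) answer a search-by-taxon request entirely from memory
        public Set<String> modelsForTaxon(String taxonIri) {
            return taxonToModels.getOrDefault(taxonIri, Collections.emptySet());
        }

        // (3) keep the map current whenever a model is saved
        public void onModelSaved(String modelIri, Collection<String> taxonIris) {
            index(modelIri, taxonIris);
        }
    }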

Still don't know what the NLP UI is or how it is intended to interact here. It seems my previous comment to that effect was lost in some kind of issue reshuffle.

lpalbou commented Mar 26, 2020

Update from recent discussions:

  • @tmushayahama is able to power both the search and the browse-models-by-species features

  • @vanaukenk is testing and it looks good at the moment

  • @goodb I like the taxon <-> model map solution you implemented, and as long as it gets updated whenever a model is saved, it should be fine and fast enough. However, one query is taking a long time (~15s):

http://barista-dev.berkeleybop.org/search?offset=0&limit=50&taxon=http://purl.obolibrary.org/obo/NCBITaxon_10090

Considering your elegant solution, I could not see any other cause than a SPARQL query in need of a redesign, but when looking at the response, I see a "sparql" field at the end that is cluttering the response. Could that be why this query is slow?

Sample from the response:

"sparql": "PREFIX owl: <http://www.w3.org/2002/07/owl#> \nPREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n#model metadata\nPREFIX metago: <http://model.geneontology.org/>\nPREFIX lego: <http://geneontology.org/lego/> \n#model data\nPREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>\nPREFIX occurs_in: <http://purl.obolibrary.org/obo/BFO_0000066>\nPREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>\nPREFIX has_input: <http://purl.obolibrary.org/obo/RO_0002233>\nPREFIX has_output: <http://purl.obolibrary.org/obo/RO_0002234>\nPREFIX causally_upstream_of: <http://purl.obolibrary.org/obo/RO_0002411>\nPREFIX provides_direct_input_for: <http://purl.obolibrary.org/obo/RO_0002413>\nPREFIX directly_positively_regulates: <http://purl.obolibrary.org/obo/RO_0002629>\n\nSELECT  ?id ?date ?title ?state  (GROUP_CONCAT(DISTINCT ?contributor;separator=\";\") AS ?contributors) (GROUP_CONCAT(DISTINCT ?group;separator=\";\") AS ?groups)    \nWHERE {\n  GRAPH ?id {  \n        ?id <http://purl.org/dc/elements/1.1/title> ?title ;\n           <http://purl.org/dc/elements/1.1/date> ?date ;\n           <http://purl.org/dc/elements/1.1/contributor> ?contributor ;   \n        optional{?id <http://purl.org/pav/providedBy> ?group } .   \n        optional{?id lego:modelstate ?state } .    \n       \n      \n      \n       \n      \n      \n      \n       VALUES ?id { \n<http://model.geneontology.org/SYNGO_1940> \n<http://model.geneontology.org/cec18c47-fdc7-49e2-b984-f18ff6e879f8> \n<http://model.geneontology.org/3050ee6a-25b5-4589-9fe1-403433c0a70b> \n<http://model.geneontology.org/SYNGO_1943> \n<http://model.geneontology.org/0cb2a12e-36d5-4be9-838b-f3c52938768b> \n<http://model.geneontology.org/3c017263-064d-4ea4-9982-4bf5ad754a81> \n<http://model.geneontology.org/313091b5-f5be-4be4-b814-4b2cc462be74> \n<http://model.geneontology.org/3c977124-f610-4db7-bfa4-e04f0d505cf9> \n<http://model.geneontology.org/78e3156d-3d80-4ba2-8556-76c3b186dc5a> \n<http://model.geneontology.org/13942cf0-359b-4ec9-9091-9e67c23a353b> \n<http://model.geneontology.org/b6043995-b203-494c-8d84-883669765dd9> \n<http://model.geneontology.org/ec3ba64b-34ee-4f61-bcc1-99cd0ce252cc> \n<http://model.geneontology.org/3abb0e36-6ba2-4548-a37f-6f105407874e> \n<http://model.geneontology.org/db3f468e-ab8d-41df-8049-2151b14af94b> \n<http://model.geneontology.org/8d539789-349d-4d5f-8be9-9b761b499ae0> \n<http://model.geneontology.org/160a7be8-43f1-4b6b-9edd-116bee206837> \n<http://model.geneontology.org/7759d242-bb8d-4f83-8406-67f7770f7d60> \n<http://model.geneontology.org/ce10473b-df09-4744-8774-17545a78c446> \n<http://model.geneontology.org/9712a11d-60f1-4b41-a04a-4131b95e2176> \n<http://model.geneontology.org/b36dee6e-4b7c-460c-8c6e-197e3a321fe0> \n<http://model.geneontology.org/SYNGO_1931> \n<http://model.geneontology.org/SYNGO_1930> \n<http://model.geneontology.org/6acc8709-5e45-4792-8112-a90b9cc76b2e> \n<http://model.geneontology.org/1cb9ff3f-b2a5-44e0-b38d-2fcf68488046> \n<http://model.geneontology.org/SYNGO_1932> \n<

goodb added a commit that referenced this issue Mar 28, 2020
…lag - work on #290

Surprisingly, this doesn't seem to impact response time for #290.
It appears this may be related to #249, as I am intermittently seeing the same error for large responses:

SEVERE: An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: com.google.gson.JsonIOException: org.eclipse.jetty.io.EofException
	at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:67)
	at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)

Notably, response time is very fast for the Master model collection (a couple thousand models) but starts lagging for the dev collection (tens of thousands).

goodb commented Mar 28, 2020

@lpalbou surprisingly the sparql information doesn't seem to be the root of the problem. I made that an optional parameter anyway, to clear up the response. Now you need to add &debug to see it.

Still investigating; it seems to be triggering another problem in the server that I've come across elsewhere.

Apart from this, I'm not opposed to adding the taxon information to the model metadata. But I think that should probably be done (if desired) as part of a small overhaul of all the desired model metadata, e.g. created/modified date, shex=valid, etc.

goodb commented Mar 28, 2020

@lpalbou notably, this is not an issue at all in the Master repo; scaling up to the dev collection triggers it.

goodb added a commit that referenced this issue Mar 29, 2020
This includes a method to update all models in a given journal with taxon information as metadata on the models.  It will need to be run on the input database for the taxa-related search features to work.  Models saved using this build will include the taxon data.  Notable performance improvements over last incarnation.

Batch update all taxon metadata:

minerva-cli.sh --add-taxon-metadata \
  -j blazegraph-gocam-db.jnl \
  -ontojournal /tmp/blazegraph.jnl

goodb commented Mar 30, 2020

@lpalbou the latest PR has things mostly set up as you wrote in the ticket. I couldn't get the first approach to work at the scale of the dev model collection, so I am now adding the taxon as an annotation on the models themselves (same level as the title); this seems to work much faster. It will take some coordination to switch over e.g. @dustine32's generator and to add the information to existing models, but it's not hard.
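
For illustration, a model-level taxon annotation added with the OWL API could look roughly like the sketch below; the annotation property IRI is a placeholder assumption, not necessarily the one minerva actually writes:

    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;

    // Hypothetical sketch: annotate a model ontology with its taxon, at the same level as dc:title.
    public class AddTaxonAnnotationSketch {
        public static void main(String[] args) throws OWLOntologyCreationException {
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLDataFactory df = manager.getOWLDataFactory();
            OWLOntology model = manager.createOntology(
                    IRI.create("http://model.geneontology.org/example-model"));

            // Placeholder property IRI; the real annotation property used by minerva may differ.
            OWLAnnotationProperty inTaxon = df.getOWLAnnotationProperty(
                    IRI.create("https://w3id.org/biolink/vocab/in_taxon"));
            OWLAnnotation taxonAnnotation = df.getOWLAnnotation(
                    inTaxon, IRI.create("http://purl.obolibrary.org/obo/NCBITaxon_10090"));

            // An ontology-level annotation, i.e. model metadata rather than data inside the model.
            manager.applyChange(new AddOntologyAnnotation(model, taxonAnnotation));
        }
    }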

lpalbou commented Apr 1, 2020

That's indeed what I was guessing... although your map (model -> taxon), updated by minerva on every model save, should have worked? I wonder if it could be related to #291. Anyhow, thanks; adding the taxon to the model directly would work too. I agree we'll have to revisit the metadata we want to include in the model.

goodb commented Apr 2, 2020

It did work, just not well at scale: it results in a very large query in the second step. Yes, the same pattern could be related to the slowdown when searching by ontology terms (e.g. MF with expand), as that again results in a very large query.

It may be more efficient to try a more direct integration of the ontology graph with the model graph. That would require more time to figure out, as I suspect a lot of things assume the model graph has nothing else in it.

The dev server appears to be working with the new model now, e.g. http://noctua-dev.berkeleybop.org:6800/search/?taxon=10090&limit=100000.
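
A quick client sketch against that route using the Java 11+ HttpClient; the URL and parameters are exactly those above, and the timing printout is only there to spot the kind of lag discussed earlier:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Minimal client for the search-by-taxon route on the dev server.
    public class SearchByTaxonClient {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://noctua-dev.berkeleybop.org:6800/search/?taxon=10090&limit=100000"))
                    .GET()
                    .build();

            long start = System.currentTimeMillis();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            long elapsed = System.currentTimeMillis() - start;

            // Report status, payload size and round-trip time.
            System.out.println("HTTP " + response.statusCode() + ", "
                    + response.body().length() + " chars in " + elapsed + " ms");
        }
    }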

goodb commented Apr 18, 2020

Code here is working. Will need to update the production and other incoming models with the taxon field.

goodb closed this as completed Apr 18, 2020