Task 3 - API - add search models by species #290
Comments
@goodb what is the status of the new neo? We can discuss this requirement further.
@lpalbou The status for minerva is that I am about to open a PR into the dev branch that provides search by taxon ID, among several other things. I accomplished this by (1) building a taxon-to-models map on server startup, (2) using that map to handle the search, and (3) updating the map when models are saved. I didn't touch the model metadata, but I could of course do it that way as well. Still don't know what the NLP UI is or how it is intended to interact here; it seems my previous comment to that effect was lost in some kind of issue reshuffle.
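The three-part approach described above (build on startup, consult for search, update on save) could be sketched like this. This is a minimal illustrative sketch, not minerva's actual code; `TaxonModelIndex` and its method names are hypothetical:

```python
from collections import defaultdict

class TaxonModelIndex:
    """Hypothetical in-memory index mapping taxon IDs to model IDs."""

    def __init__(self):
        self._taxon_to_models = defaultdict(set)

    def build(self, models):
        """(1) Build the taxon -> models map on server startup.

        `models` is an iterable of (model_id, taxon_id) pairs.
        """
        for model_id, taxon_id in models:
            self._taxon_to_models[taxon_id].add(model_id)

    def search(self, taxon_id):
        """(2) Use the map to answer a search-by-taxon request."""
        return sorted(self._taxon_to_models.get(taxon_id, set()))

    def on_model_saved(self, model_id, taxon_id):
        """(3) Keep the map current whenever a model is saved."""
        self._taxon_to_models[taxon_id].add(model_id)

# Example: index two models at startup, save a third, then search.
index = TaxonModelIndex()
index.build([("gomodel:1", "10090"), ("gomodel:2", "9606")])
index.on_model_saved("gomodel:3", "10090")
print(index.search("10090"))  # -> ['gomodel:1', 'gomodel:3']
```

The appeal of this design is that searches never touch the triplestore; the trade-off, as discussed below in the thread, is how it behaves at scale.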
Update from recent discussions:
Considering your elegant solution, I could not see any reason other than a SPARQL query to redesign, but when looking at the response, I see a "sparql" field at the end that is bloating it. Could that be why this query is slow? Sample from the response:
…lag - work on #290

Surprisingly, this doesn't seem to impact response time for #290. It appears this may be related to #249, as I am intermittently seeing the same error for large responses:

```
SEVERE: An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: com.google.gson.JsonIOException: org.eclipse.jetty.io.EofException
    at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:67)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
```

Notably, response time is very fast for the master model collection (a couple thousand models) but starts lagging for the dev collection (tens of thousands).
@lpalbou Surprisingly, the sparql information doesn't seem to be the root of the problem. I made it an optional parameter anyway, to clean up the response; now you need to add `&debug` to see it. Investigating... it seems to be triggering another problem in the server that I've come across elsewhere. Apart from this, I'm not opposed to adding the taxon information into the model metadata. But I think that should probably be done (if desired) as part of a small overhaul of all the desired model metadata, e.g. created/modified date, shex=valid, etc.
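The optional-parameter behavior could be shaped roughly as follows. This is an illustrative sketch under assumptions, not minerva's actual implementation; `build_search_response` is a hypothetical name:

```python
def build_search_response(models, sparql_query, debug=False):
    """Return the search payload, attaching the raw SPARQL text only on request.

    Keeping the (potentially large) query string out of the default
    response keeps the payload clean; clients pass &debug to see it.
    """
    response = {"models": models, "n": len(models)}
    if debug:
        response["sparql"] = sparql_query
    return response

slim = build_search_response(["gomodel:1"], "SELECT ...")
full = build_search_response(["gomodel:1"], "SELECT ...", debug=True)
print("sparql" in slim, "sparql" in full)  # -> False True
```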
@lpalbou Notably, this is not an issue at all in the master repo. Scaling up to the dev collection triggers it.
This includes a method to update all models in a given journal with taxon information as metadata on the models. It will need to be run on the input database for the taxa-related search features to work. Models saved using this build will include the taxon data. Notable performance improvements over the last incarnation. To batch-update all taxon metadata:

```
minerva-cli.sh --add-taxon-metadata -j blazegraph-gocam-db.jnl -ontojournal /tmp/blazegraph.jnl
```
@lpalbou The latest PR has things mostly set up as you wrote in the ticket. I couldn't get the first approach to work at the scale of the dev model collection, so I am adding taxon as an annotation on the models themselves (at the same level as the title). This seems to work much faster. It will take some coordination to switch, e.g. @dustine32's generator, and to add the information to existing models, but that's not hard.
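A sketch of what model-level metadata might look like once taxon is annotated at the same level as the title. The field names and the example ID are assumptions for illustration, not minerva's actual schema:

```python
# Hypothetical model metadata after the change: taxon sits beside title.
model_metadata = {
    "id": "gomodel:example0001",          # hypothetical model ID
    "title": "Example GO-CAM model",
    "taxon": "NCBITaxon:10090",           # new annotation, same level as title
}

# A search by species then reduces to a simple per-model metadata filter,
# instead of the very large two-step query the map-based approach produced.
def matches_taxon(meta, taxon):
    return meta.get("taxon") == taxon

print(matches_taxon(model_metadata, "NCBITaxon:10090"))  # -> True
```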
That's indeed what I was guessing... although your map (model -> taxon), updated by minerva on every model save, should have worked? I wonder if it could be related to #291. Anyhow, thanks; adding the taxon to the model directly would work too. I agree we'll have to revisit the metadata we want to include in the model.
It did work, just not well at scale: it results in a very large query in the second step. Yes, the same pattern could be related to the slowdown when searching by all ontology terms (e.g. via MF with expand), as that again results in a very large query. It may be more efficient to try a more direct integration of the ontology graph with the model graph. That would require more time to figure out, as I suspect a lot of things assume the model graph has nothing else in it. The dev server appears to be working with the new model now, e.g. http://noctua-dev.berkeleybop.org:6800/search/?taxon=10090&limit=100000.
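Given the dev-server URL above, a client-side query like the example could be composed as follows (a small sketch; `search_url` is a hypothetical helper, and the endpoint is taken verbatim from the comment):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://noctua-dev.berkeleybop.org:6800/search/"

def search_url(taxon, limit):
    """Compose a taxon search URL like the dev-server example above."""
    return BASE + "?" + urlencode({"taxon": taxon, "limit": limit})

url = search_url("10090", 100000)
print(url)  # -> http://noctua-dev.berkeleybop.org:6800/search/?taxon=10090&limit=100000

# Round-trip check: parse the parameters back out.
params = parse_qs(urlsplit(url).query)
assert params == {"taxon": ["10090"], "limit": ["100000"]}
```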
Code here is working. Will need to update the production and other incoming models with the taxon field. |
Task requirement from Noctua Landing Page Project
This will be a three-step task:
Also linked to #230