Bulk access to documents using bulk input #69
Similarly, there has been a user who expressed interest in the possibility of creating term ID lists: http://jira.geneontology.org/browse/GO-360
I think this would be better done with batching on the server side--more robust, and we wouldn't have to deal with sync, as well as giving easier access in a more traditional way (bulk remote clients and bots). I'll have to check if the perl API is up for it. I would also have to consider whether the linking library in the perl API is sufficient.
…ible; wired in bulk search stubs modelled on live search (menus, etc.); #69
Looks like at least part of the functionality is going to be frustrated by berkeleybop/bbop-js#17.
Creating something that makes sense in the context of AmiGO 2 is taking rather more work than expected, mostly due to a digression with berkeleybop/bbop-js#17, which itself is more work than expected. Thinking about bumping this to its own milestone and getting what's in there out as fast as possible.
[Will edit this list in place.]
After some discussion with @cmungall and @hdietze, will explore (again?) whether batching is really necessary. And the perl client? To get this out the door faster, it may be that we should just switch to POST and set the bar fairly low: put all of the responsibility on the JS client to compose a large query, then set the limits to something reasonable through experimentation.
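A minimal sketch of what the low-bar POST approach might look like from a client. This is an illustration only: the field name `annotation_class` and the row limit are assumptions, not the actual GOlr schema or API.

```python
import json

# Large ID lists overflow GET URL length limits, so compose the whole
# query client-side and ship it in a POST body instead.
ids = ["GO:%07d" % n for n in range(1, 501)]
body = {
    "q": "*:*",
    # One big boolean OR filter over all submitted IDs.
    "fq": "annotation_class:(%s)" % " OR ".join('"%s"' % i for i in ids),
    # A cap to be tuned experimentally, per the comment above.
    "rows": 1000,
}
payload = json.dumps(body)  # would be POSTed to the server
```

The limit-setting then becomes a question of how many IDs the server can comfortably chew through in one filter, rather than of URL length.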
Well, looking at the issues and requests above again, I think I can explain the rationale better this time around. One set of issues that users are having seems to be along the lines of: I want to put in a bunch of term IDs and get a listing of those terms I can work with (link, download, etc.). This is essentially the way the current bulk interface prototype is heading, and would probably be workable for non-huge numbers by composing large queries (it will require some new stuff in the manager, but probably not too bad). That said, this is not the actual issue open here.

The other set of issues (this issue) is along the lines of: I want to put in random gps and get annotation data; I want to put in terms and get gp (annotation) data. This would require the Solr equivalent of an RDB join, and I think it can only practically be done with an initial query to get the "key ids" from one doctype and then another query to get the wanted data from the target doctype. In large or complicated cases, this is maybe not something that one would want to handle in a single pass from the client (time), and breaking it up (a la the matrix tool) is rather unwieldy in practice.

Reading this thread though, it seems my final deciding factor for wanting to do it on the server was to make it easier to directly create links and kick-ins for the service, so that these bulk pages could be treated in much the same way that term and gp pages are currently treated. Shelving those reasons for the moment though, if one was willing to sacrifice easy kick-ins and the single-step satisfaction of going straight from gp symbols to terms, a possible workable interface would be:
One thing that would be lost in an interface like this, at least in the beginning, is filtering on the second step (e.g. with these term IDs, give me all annotations with this evidence); but you can imagine either extending this or feeding it into itself (we'd need kick-ins to get things like TE results links to work), so that somebody could take multiple steps through the different document types, filtering and joining with the next one. Sort of a shopping cart of the moment. If this makes sense, I think this might be a way to go for now: we'd get some parts running immediately, and could grow it out into other needed functionality (at the cost of single steps). It would also mean that the perl bits could be ignored for a while longer (possibly allowing us to stall long enough to get rid of them completely). If it sounds right, I'll add another issue for basic bulk search, and let this be the second step.
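As a sketch, the two-step "join" described above amounts to composing a boolean filter from one doctype's results and feeding it into the next query. The field names (`bioentity`, `annotation_class`) and the FlyBase IDs here are illustrative assumptions, not the actual schema:

```python
def compose_or_filter(field, ids):
    """Build a Solr-style boolean OR filter over a list of IDs."""
    return "%s:(%s)" % (field, " OR ".join('"%s"' % i for i in ids))

# Step 1: query one doctype for the "key ids" (e.g. terms associated
# with some gene products)...
gp_filter = compose_or_filter("bioentity",
                              ["FB:FBgn0000014", "FB:FBgn0000015"])

# Step 2: ...then use those key ids to fetch the wanted data from the
# target doctype (pretend step 1 returned these two term IDs).
term_filter = compose_or_filter("annotation_class",
                                ["GO:0005634", "GO:0003700"])
```

The "shopping cart" idea would just be iterating this: each step's result set becomes the filter for the next doctype.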
Re: joins - the golr documents are denormalized and should not require joins for non-boutique queries. E.g. fetching annotations or entities by term ids would be achieved via the closure field. It may be the case that performance would be poor, but no join required.
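As a hedged illustration of the point above (the field name `isa_partof_closure` follows the usual GOlr convention, but check the actual schema), fetching annotations under a term without any join could be a single filter query:

```python
from urllib.parse import urlencode

# Each annotation document is denormalized: it carries its term's
# ancestors in a closure field, so "all annotations at or under
# GO:0005634" is one filter query, with no join.
params = {
    "q": "*:*",
    "fq": 'isa_partof_closure:"GO:0005634"',
    "fl": "bioentity,annotation_class,evidence_type",
    "wt": "json",
}
query_string = urlencode(params)
```

This trades storage (every document repeats its closure) for query simplicity, which is exactly the denormalization being described.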
Talked again; the current plan is to 1) get the bulk download working and 2) get more gp information into the association doctype (which should meet most of the needs for most of the users without needing complicated "join" code). Will change title to reflect this. |
Addresses some parts of #69. Removed dead link to wiki (in any case, all help should be inline). Changed text to remove reference to personalities. Still need to better explain how this works.
Some comments on current status of bulk search (not sure if this is redundant with some of the extensive info above).

GO ID use case

Gene use case

Genericity
It's great that this is driven by the schema metadata... from a CS perspective. But I think the experience will be alienating for a user here. Perhaps a more fruitful long-term approach would be grebe with the ability to enter lists?

Breaking resolution into a separate step
There is no way for the user to see which subset of entered IDs resolve. Really, resolution and bulk query are separate concerns. There are many cases where we might want to plug in the resolution part (e.g. TE) and many places where we want to include the bulk query part. Text annotation can be seen as a special case of resolution. For example, try the default query here: you should end up with a box that says "35 terms found" and a button to search with those 35. Another scenario is where the user enters IDs one at a time, e.g. Both text annotation (for terms or genes) and autocomplete-based list building are equally useful for GO.
(The plain ID is just "bioentity"--it's being driven off of the metadata for other displays.) There is a bit of a question of scope here. There is the unarguable power-tool aspect--this obviously needs inside knowledge--but it does possibly fill a niche as is. It was relatively easy to create given that it is bootstrapped from the metadata, but it is obviously not particularly useful to most users (hence it never graduating from labs). I think it would be good to determine if the current tool, or something similar, has any prospects (it was made for an initial narrow use case) and then try to work out from there. If a grebe-ified version would be more useful to users, we should probably just branch it off and try again now that the troubling parts work better. Once label and ID resolution is brought in, I think we should scrap it and try for a new tool--there is just so much packed in that there is unlikely to be much that could be salvaged from here. The cart chugging along again would be so nice...
Still interest here: http://jira.geneontology.org/browse/GO-1224
There could be some simplification added (e.g. GAF download on the annotation bulk download), and certainly better explanatory text for the different fields, but fundamentally the reason we essentially get these bulk widgets and pages for "free" is that they run right along the software patterns that are used underneath. |
There is still interest in this feature: |
Talking to @hattrill: for the annotation search we should default to checked boxes for going from gene names/ids to terms.
Note that these exist in a hidden "live" form: |
Nice! Can we add links from the 'Search' menu?
Although you might want to remove 'This should not be displayed (bioentity_internal_id)' ... |
@pgaudet This feature is still in development--this is still the same as shown at the meeting. |
did not work with input |
Hm. Looking at just "SPBC359.05", I couldn't find it in the "normal" AmiGO either. Is it possible we currently don't have those identifiers? Could you give me an ID for "SPBC359.05"? |
Yes, thinking about it, I didn't see yeast either when I was looking for C. elegans, so it was probably the same issue...
[edited by @kltm ] Testing: Gene/product/bioentity: http://amigo.geneontology.org/amigo/bulk_search/bioentity
@ukemi to make it easier, I've numbered your list for reference.
Okay, you've found a lot of interesting quirks that may or may not be showstoppers depending on the use case we're attacking.
The bulk download of annotations with gene product input is an occasionally used feature of AmiGO 1.x that I suspect might be missed.
The easiest implementation would be a series of queries to the GOlr server, with the results hashed on the perl side as we go to get rid of dupes.
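The hash-based dedupe could be as simple as keeping the first occurrence of each document ID across batches (sketched in Python rather than perl, purely for illustration):

```python
def merge_unique(batches, key="id"):
    """Fold successive result batches together, keeping only the
    first occurrence of each document as identified by `key`."""
    seen = {}
    for batch in batches:
        for doc in batch:
            # setdefault acts as the hash check: later duplicates
            # of an already-seen key are dropped.
            seen.setdefault(doc[key], doc)
    return list(seen.values())

# Overlapping batches, as a series of GOlr queries might return them.
batches = [
    [{"id": "GO:0005634"}, {"id": "GO:0003700"}],
    [{"id": "GO:0003700"}, {"id": "GO:0008150"}],
]
docs = merge_unique(batches)  # duplicates collapsed as we go
```

Because the hash is built incrementally, memory stays proportional to the number of unique documents rather than the total rows fetched.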