prefetch-only Index classes and/or remote servers? #2229
Comments
incidentally, one way to jerry-rig this directly into our current API setup is to have the prefetch server return precisely two pieces of information: the number of shared hashes, and the md5sum (or other unique key). These can then be used as picklists for more detailed analyses of signatures. This is quite different from the greyhound idea, which is to use massive parallelism to search non-overlapping subsets of databases.
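A rough sketch of what that two-field reply could look like, and how it might feed a picklist. Everything here is hypothetical and for illustration only: the `PrefetchHit` record, the `prefetch` and `picklist_from_hits` helpers, and the database layout (a plain dict of md5 to hash set) are assumptions, not the sourmash API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class PrefetchHit:
    """Hypothetical reply record: only the two fields discussed above."""
    md5: str
    n_shared: int


def prefetch(query_hashes: Set[int], database: Dict[str, Set[int]]) -> List[PrefetchHit]:
    """Return every database entry that shares at least one hash with the query.

    `database` is an assumed stand-in for whatever the server holds:
    md5 -> set of hashes.
    """
    hits = []
    for md5, hashes in database.items():
        n_shared = len(query_hashes & hashes)
        if n_shared > 0:
            hits.append(PrefetchHit(md5, n_shared))
    # Largest overlap first; md5 breaks ties so the order is deterministic.
    hits.sort(key=lambda h: (-h.n_shared, h.md5))
    return hits


def picklist_from_hits(hits: List[PrefetchHit], min_shared: int = 1) -> Set[str]:
    """Turn prefetch hits into a set of md5s to select signatures by."""
    return {h.md5 for h in hits if h.n_shared >= min_shared}


database = {"aaa": {1, 2, 3}, "bbb": {3, 4, 5}, "ccc": {7}}
hits = prefetch({1, 2, 3, 4}, database)
picklist = picklist_from_hits(hits, min_shared=3)
```

The point of the sketch is that the server never has to ship signatures: the client gets md5s plus overlap counts and decides locally which signatures are worth loading.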
It is the same dilemma from the LCA index: the signatures are not explicitly present, but you can recompute them. SBTs have both index and sigs explicitly in the distribution. In
👍
Technically this is what
One thing that #1943 is doing is splitting |
an update, nearly two years later:
So in this sense they are in fact excellent examples of prefetch-only |
@luizirber and I chatted a bit over slack about the new mastiff service he built, which allows ~realtime search of the SRA public metagenomes (!!)
This, in turn, enables other things like realtime JavaScript dashboards for genome inclusion in metagenomes, etc. So that's cool.
One of the things that stuck with me is that there is an increasingly useful distinction between "prefetch" on databases and then further triage and reporting. Here, prefetch is our internal term for "give me all the overlaps that exist for this query", and it can be turned into containment searches or Jaccard similarity searches or other things easily; see #1392 for some background here.
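To make the "prefetch first, then triage" split concrete, here is a minimal sketch of how an overlap count can be turned into containment or Jaccard afterwards. It assumes the sketch sizes are available alongside the overlap, which is an assumption on my part; the function names are illustrative, not sourmash calls.

```python
def containment(n_shared: int, query_size: int) -> float:
    """Fraction of the query's hashes that are found in the match."""
    if query_size == 0:
        return 0.0
    return n_shared / query_size


def jaccard(n_shared: int, query_size: int, match_size: int) -> float:
    """Jaccard similarity from counts: |A & B| / |A | B|."""
    union_size = query_size + match_size - n_shared
    if union_size == 0:
        return 0.0
    return n_shared / union_size


# Plain sets of ints stand in for sketches here.
query = {1, 2, 3, 4}
match = {3, 4, 5, 6, 7, 8}
n_shared = len(query & match)  # this is all that "prefetch" has to compute

c = containment(n_shared, len(query))
j = jaccard(n_shared, len(query), len(match))
```

Both numbers come from the same overlap, so the expensive step (finding overlaps) only has to run once per query.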
For mastiff, luiz has, I think, primarily sped up this prefetch functionality. Actual Index-like functionality that requires access to the signatures is completely distinct, and would be hard to implement on top of mastiff directly, without providing access to the signatures.
So maybe there is a useful distinction here for future API development:
Storage-like, now that I think of it?)
Index class that combines the two to provide full Index class services that enable all the good things.
ref RPC more generally, #1644
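One way the split proposed above might look in code, reading "the two" as a prefetch-only component plus a Storage-like signature store (that reading is my interpretation). All class and method names below are hypothetical and are not existing sourmash classes; the in-memory implementations just stand in for a remote prefetch server and a real signature store.

```python
from typing import Dict, FrozenSet, List, Protocol, Tuple


class Prefetcher(Protocol):
    """Anything that can report (md5, n_shared) overlaps for a query."""
    def prefetch(self, query: FrozenSet[int]) -> List[Tuple[str, int]]: ...


class SignatureStore(Protocol):
    """Anything that can hand back a sketch given its md5."""
    def load(self, md5: str) -> FrozenSet[int]: ...


class InMemoryPrefetcher:
    def __init__(self, sketches: Dict[str, FrozenSet[int]]):
        self._sketches = sketches

    def prefetch(self, query):
        hits = [(md5, len(query & h)) for md5, h in self._sketches.items()]
        # Keep only real overlaps, largest first, md5 as tie-breaker.
        return sorted((h for h in hits if h[1] > 0), key=lambda h: (-h[1], h[0]))


class InMemoryStore:
    def __init__(self, sketches: Dict[str, FrozenSet[int]]):
        self._sketches = sketches

    def load(self, md5):
        return self._sketches[md5]


class CombinedIndex:
    """Composes a prefetcher and a store to offer a fuller search."""
    def __init__(self, prefetcher: Prefetcher, store: SignatureStore):
        self.prefetcher = prefetcher
        self.store = store

    def search_containment(self, query, threshold=0.0):
        results = []
        if not query:
            return results
        for md5, n_shared in self.prefetcher.prefetch(query):
            score = n_shared / len(query)
            if score >= threshold:
                # Only now do we need the signature itself.
                results.append((md5, score, self.store.load(md5)))
        return results


sketches = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({3, 4}),
    "c": frozenset({9}),
}
index = CombinedIndex(InMemoryPrefetcher(sketches), InMemoryStore(sketches))
found = index.search_containment(frozenset({1, 2, 3, 4}), threshold=0.6)
```

With this shape, a mastiff-style service would only need to satisfy the prefetch half, and signature access could be supplied separately or left out entirely.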