How to properly gather using an SBT as input? #1089
Interesting, the first one should have worked, now that you mention it :). Good catch.
Thinking about this a bit more, and also about sig selection (#1072): would it be useful to enable signature selection by name/identifier via an input file (e.g. pass an sbt via …)? Use case: store all dataset sigs within an SBT (don't keep the individual sigs). If you need to gather a subset of sigs against another database, you could select just the genomes/samples of interest rather than gathering all sigs. Choosing a completely off-the-wall example: say we have all GTDB sigs calculated. Can we pass in the full database SBTs and select signatures from within them? I suppose this would rely on either 1) a standard naming scheme that includes a genome accession/identifier, or 2) a file mapping accessions/experimental identifiers to a sourmash signature identifier, e.g. the md5sum.
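To make the two options concrete, here is a minimal Python sketch of the selection logic. It assumes the signatures have already been loaded from the SBT into memory. `SigRecord`, `md5s_from_mapping`, `select_by_md5`, `select_by_name`, the `accession`/`md5` CSV columns, and the "accession, then a space" naming convention are all hypothetical. None of them are part of the sourmash API. The sketch only illustrates filtering by identifier, not how sourmash loads or indexes signatures.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SigRecord:
    """Hypothetical stand-in for a loaded signature: just its name and md5sum."""
    name: str
    md5: str


def md5s_from_mapping(mapping_csv: str, wanted: Set[str]) -> Set[str]:
    """Option 2: look up the md5sums of the wanted accessions in a mapping CSV.

    Assumes hypothetical 'accession' and 'md5' columns."""
    reader = csv.DictReader(io.StringIO(mapping_csv))
    return {row["md5"] for row in reader if row["accession"] in wanted}


def select_by_md5(sigs: Iterable[SigRecord], md5s: Set[str]) -> List[SigRecord]:
    """Keep only signatures whose md5sum is in the requested set."""
    return [s for s in sigs if s.md5 in md5s]


def select_by_name(sigs: Iterable[SigRecord], wanted: Set[str]) -> List[SigRecord]:
    """Option 1: assumes each name starts with the accession, then a space."""
    return [s for s in sigs if s.name.split(" ", 1)[0] in wanted]


# Made-up example data: three "signatures" and a mapping for two of them.
sigs = [
    SigRecord("ACC001 sample A", "aaa111"),
    SigRecord("ACC002 sample B", "bbb222"),
    SigRecord("ACC003 sample C", "ccc333"),
]
mapping = "accession,md5\nACC001,aaa111\nACC003,ccc333\n"
wanted = {"ACC001", "ACC003"}

by_md5 = select_by_md5(sigs, md5s_from_mapping(mapping, wanted))
by_name = select_by_name(sigs, wanted)
print([s.name for s in by_md5])
```

Either route still walks every loaded signature. It does not avoid the cost of loading all sigs from a large SBT or LCA database in the first place.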
You can already do this with --query-md5 for a single signature. You're talking about doing this for multigather, I think? Intriguing... One problem is that it is not currently very fast to load/select from 25k sigs in an SBT or an LCA database, so this would not necessarily be efficient.
How does "not that fast" compare with the 30 min to 2 hr Snakemake DAG load times? And is that something that could be sped up, or is it likely to stay slow? I might be exaggerating a little (I haven't explicitly timed all my runs), but you get the idea. I suppose I could get around this by keeping the signature files and using …
:rolls eyes a little bit: fine
Side note: I now have an implementation for using the SBT via …
See #1090.
Some other notes from looking at this issue:
Using the test data in tests/test-data/prot:
sourmash multigather --query protein.sbt.zip --db protein.sbt.zip --threshold-bp=0 --protein
Or, trying --query-from-file:
sourmash multigather --query-from-file protein.sbt.zip --db protein.sbt.zip --threshold-bp=0