MRG: add abund estimation to manysearch
#302
Comparing this PR's abund Small test dataset:
built and run by @AnneliektH via: https://github.com/AnneliektH/2024-binning-kmer/blob/main/workflow/notebooks/notes_sourmash.ipynb (both run with single thread)
wow, that's a fantastic speed improvement 😆
…water into manysearch-add-abund
per @bluegenes -
I will dig in!
manysearch
manysearch
@ctb We also had a couple other thoughts going through this. I think they were:
Yes! If you get to it before I review and merge, great. If not, I'll create an issue 😉
Sure! Sensitive default => better here.
ran https://hackmd.io/tKpLr1ISR9mqHHmZbEvcow?view with both executed:
and then loaded both CSVs in and explored. The numbers look the same, so 🎉 !! A few interesting differences -
I want to take a gander at the ANI values, but if those are fine (which I expect they will be :glare:) then I will merge, unless there is a last minute objection.
🎉 they are identical. merging!
Adds abundance columns `average_abund`, `median_abund`, `std_abund` to `manysearch` output.

Changed behavior:
Changed `manysearch` to only build & write a result if we find an overlap between the query and search sigs. This was to avoid issues with abundance-related columns (average, median, std deviation calculations), but I think generally this:

To Do:

Unfortunately, I can only estimate abundance info from non-rocksdb `manysearch`. Abund estimation for rocksdb databases would require us to estimate these values w/in sourmash, since we don't have access to which hashes match.

Note: for `mgmanysearch` type applications, the default threshold, `0.01`, is far too stringent. Recommend setting threshold to 0 for these applications.
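
For illustration only, here is a rough Python sketch of the behavior described above; it is not the plugin's actual code. It assumes the abundances come from a plain `hash -> abundance` mapping, that `std_abund` is a population standard deviation, and that the threshold is compared against the fraction of query hashes found. The names `abund_stats` and `passes_threshold` are hypothetical.

```python
from statistics import mean, median, pstdev


def abund_stats(query_hashes, against_abunds):
    """Summarise abundances of the hashes shared by a query and an
    abundance-tracking sketch.

    query_hashes: set of integer hashes (hypothetical input shape).
    against_abunds: dict mapping hash -> abundance (hypothetical input shape).
    Returns None when there is no overlap, so no result row is built.
    """
    shared = [against_abunds[h] for h in query_hashes if h in against_abunds]
    if not shared:
        # No overlap: skip the result instead of averaging an empty list.
        return None
    return {
        "n_overlap": len(shared),
        "average_abund": mean(shared),
        "median_abund": median(shared),
        # Assumption: population std dev; the real column may differ.
        "std_abund": pstdev(shared),
    }


def passes_threshold(n_overlap, n_query_hashes, threshold=0.01):
    """Assumed filter: keep a match when the fraction of query hashes
    found is at least `threshold`."""
    if n_query_hashes == 0:
        return False
    return n_overlap / n_query_hashes >= threshold
```

Under these assumptions, a match that covers 5 of 1000 query hashes is dropped at the default `0.01` but kept when the threshold is set to 0, which is the situation the note above warns about.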