MRG: avoid clones by using new Signature::try_into() -> KmerMinHash
#471
Merged
…water into ctb_misc_cleanup
…water into ctb_misc2
…water into ctb_misc2
…water into ctb_misc_cleanup
#434) * preliminary victory * compiles and mostly runs * cleanup, split to new module * cleanup and comment * more cleanup of diff * cargo fmt * fix fmt * restore n_failed * comment failing test * cleanup and de-vec * create module/submodule structure * comment for later * get rid of vec * beg for help * cleanup and doc
…water into ctb_misc2
ctb changed the title from "WIP: avoid clones by using new Signature::try_into() -> KmerMinHash" to "MRG: avoid clones by using new Signature::try_into() -> KmerMinHash" Oct 13, 2024
This was referenced Oct 13, 2024
luizirber approved these changes Oct 15, 2024
ctb added a commit to sourmash-bio/sourmash that referenced this pull request Oct 15, 2024
…ing (#3352) This PR builds on the refactoring in #3342 to do less downsampling and also avoids doing intersections twice (per #3196). Benchmarks in sourmash-bio/sourmash_plugin_branchwater#471 are pretty astonishing... Fixes #3196 --------- Co-authored-by: Luiz Irber <luizirber@users.noreply.github.com>
…water into avoid_clones
This was referenced Oct 15, 2024
Merged
NOTE: PR into #430. Includes #467.
gather comparison
Comparing latest MultiCollection benchmarks from #430 to this PR, for SRR1976948, we see:
manysearch comparison
Comparing to manysearch benchmarking results in #463, we see:
* this run used sig.zip files and manifest CSVs in the benchmarking.
So this is a major improvement in both time and memory over, really, everything since v0.8.6!
The last line shows that there is now a major benefit to using lists of .sig.zip files and/or manifest CSVs that point at zip files: basically, only the relevant sketch (k=21/31/51 or whatever) is being loaded, and there's no double-loading to generate the manifest (as introduced in 0.9.0). This was supposed to be one of the major benefits of #430 so I'm very happy about these results showing that's the case!! The extra memory usage is presumably because so much more of the time is being spent in calculations vs loading.
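For readers unfamiliar with the pattern named in the PR title, here is a minimal sketch of a consuming `try_into()` conversion that hands back an owned sketch without cloning it. The `Signature`, `KmerMinHash`, and `ConvertError` types below are simplified stand-ins written for this illustration; they are not the sourmash types, and the actual implementation in sourmash may differ in its fields, error type, and handling of multiple sketches.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical stand-in for a MinHash sketch (not the real sourmash type).
#[derive(Debug, PartialEq)]
struct KmerMinHash {
    ksize: u32,
    mins: Vec<u64>,
}

/// Hypothetical stand-in for a signature that may hold several sketches.
#[derive(Debug)]
struct Signature {
    name: String,
    sketches: Vec<KmerMinHash>,
}

/// Hypothetical error type for this illustration.
#[derive(Debug, PartialEq)]
enum ConvertError {
    NoSketch,
    MultipleSketches(usize),
}

impl TryFrom<Signature> for KmerMinHash {
    type Error = ConvertError;

    /// Takes the signature by value, so the single sketch can be moved out
    /// instead of cloned. The hash vector's heap buffer is reused as-is.
    fn try_from(mut sig: Signature) -> Result<Self, Self::Error> {
        match sig.sketches.len() {
            0 => Err(ConvertError::NoSketch),
            1 => Ok(sig.sketches.pop().expect("length checked above")),
            n => Err(ConvertError::MultipleSketches(n)),
        }
    }
}

fn main() {
    let sig = Signature {
        name: "example".to_string(),
        sketches: vec![KmerMinHash { ksize: 31, mins: vec![1, 2, 3] }],
    };
    println!("converting signature '{}'", sig.name);

    // `sig` is consumed here; no copy of the hash vector is made.
    let mh: KmerMinHash = sig.try_into().expect("exactly one sketch");
    println!("got {} hashes at k={}", mh.mins.len(), mh.ksize);
}
```

The design point is that a borrowing accessor would force callers who need an owned sketch to call `.clone()`, whereas a by-value conversion transfers ownership of the existing allocation. Returning an error for zero or multiple sketches is an assumption of this sketch, chosen so that the conversion is unambiguous.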