MRG: fix performance regression in `manysearch` by removing unnecessary downsampling #464

ctb · 2024-10-02T13:07:13Z

Tackles #463

This PR adjusts `manysearch` so that sketches are downsampled only when actually needed. Previously, the `downsample_max_hash`/`downsample_scaled` code ran on every sketch, even when no downsampling was required.
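To illustrate the "only when needed" idea, here is a minimal toy sketch. It is not the code in this PR: `ToySketch`, `downsample_always`, and `downsample_if_needed` are hypothetical names, and the model assumes a sketch is a sorted hash list with a `scaled` value whose cutoff is `u64::MAX / scaled`.

```rust
use std::borrow::Cow;

/// Toy stand-in for a scaled MinHash sketch; NOT the sourmash `KmerMinHash` type.
#[derive(Clone, Debug, PartialEq)]
struct ToySketch {
    scaled: u64,
    hashes: Vec<u64>, // sorted ascending
}

impl ToySketch {
    /// Largest hash value retained at a given `scaled` (assumed model).
    fn max_hash(scaled: u64) -> u64 {
        u64::MAX / scaled
    }

    /// Always allocates a new sketch, even when `scaled` already matches.
    fn downsample_always(&self, scaled: u64) -> ToySketch {
        assert!(scaled >= self.scaled, "cannot upsample a sketch");
        let limit = Self::max_hash(scaled);
        ToySketch {
            scaled,
            hashes: self.hashes.iter().copied().filter(|h| *h <= limit).collect(),
        }
    }

    /// Borrows the sketch unchanged when no downsampling is required,
    /// and only copies when the target `scaled` differs.
    fn downsample_if_needed(&self, scaled: u64) -> Cow<'_, ToySketch> {
        if self.scaled == scaled {
            Cow::Borrowed(self)
        } else {
            Cow::Owned(self.downsample_always(scaled))
        }
    }
}

fn main() {
    let sketch = ToySketch {
        scaled: 1,
        hashes: vec![10, u64::MAX / 2 + 5, u64::MAX - 1],
    };

    // Same scaled: no copy is made.
    let same = sketch.downsample_if_needed(1);
    assert!(matches!(same, Cow::Borrowed(_)));

    // Different scaled: a filtered copy is built.
    let smaller = sketch.downsample_if_needed(2);
    assert!(matches!(smaller, Cow::Owned(_)));
    assert_eq!(smaller.hashes, vec![10]);
}
```

In this toy version, returning a `Cow` lets callers treat both cases the same way while paying for the copy only when the `scaled` values differ.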

This PR also adds `--ignore-abundance` to `manysearch`, which optionally turns off the potentially expensive abundance estimation (in practice it is apparently not that expensive; see #463).

Fixes #466

ctb · 2024-10-06T01:07:27Z

@bluegenes ready for review & merge!

ctb · 2024-10-07T23:15:57Z

bump :)

bluegenes · 2024-10-08T18:08:56Z

src/manysearch.rs

+                            // avoid calculating details unless there is overlap
+                            let overlap = query
+                                .minhash
+                                .count_common(against_mh, true)


ah, `count_common` handles downsampling only if needed. Didn't realize `downsample_scaled` always downsampled, even if not needed!
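For readers following along, the pattern in this hunk (count the overlap first, do the detailed work only if it is nonzero) can be sketched roughly as below. This is a toy model under stated assumptions, not the sourmash implementation: `ToySketch`, this standalone `count_common`, and `containment` are hypothetical, and the cutoff `u64::MAX / scaled` is an assumed simplification.

```rust
use std::cmp::Ordering;

/// Toy stand-in for a scaled MinHash sketch; NOT the sourmash `KmerMinHash` type.
struct ToySketch {
    scaled: u64,
    hashes: Vec<u64>, // sorted ascending, no duplicates
}

/// Count shared hashes. With `downsample == true`, sketches with different
/// `scaled` values are compared at the coarser resolution without copying.
fn count_common(a: &ToySketch, b: &ToySketch, downsample: bool) -> Result<u64, String> {
    if a.scaled != b.scaled && !downsample {
        return Err(format!("mismatched scaled: {} vs {}", a.scaled, b.scaled));
    }
    // Hashes above this cutoff would be dropped at the coarser resolution.
    let limit = u64::MAX / a.scaled.max(b.scaled);
    let (mut i, mut j, mut common) = (0, 0, 0u64);
    while i < a.hashes.len() && j < b.hashes.len() {
        let (x, y) = (a.hashes[i], b.hashes[j]);
        if x > limit || y > limit {
            break; // lists are sorted, so no further matches fall under the cutoff
        }
        match x.cmp(&y) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                common += 1;
                i += 1;
                j += 1;
            }
        }
    }
    Ok(common)
}

/// Containment of `query` in `against`, or `None` when nothing overlaps.
fn containment(query: &ToySketch, against: &ToySketch) -> Option<f64> {
    let overlap = count_common(query, against, true).ok()?;
    if overlap == 0 {
        return None; // skip the detailed calculations entirely
    }
    let limit = u64::MAX / query.scaled.max(against.scaled);
    let query_size = query.hashes.iter().filter(|h| **h <= limit).count();
    Some(overlap as f64 / query_size as f64)
}

fn main() {
    let query = ToySketch { scaled: 1, hashes: vec![1, 2, 3, u64::MAX - 1] };
    let hit = ToySketch { scaled: 2, hashes: vec![2, 3, 4] };
    let miss = ToySketch { scaled: 2, hashes: vec![7, 8] };

    assert_eq!(count_common(&query, &hit, true), Ok(2));
    assert!(count_common(&query, &hit, false).is_err());
    assert_eq!(containment(&query, &miss), None);
    println!("{:?}", containment(&query, &hit));
}
```

The point of the guard is that non-overlapping pairs, which are typically the vast majority in a many-by-many search, return before any per-match statistics are computed.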

bluegenes

lgtm

ctb added 2 commits October 2, 2024 06:05

add support for ignoring abundance

b510e8e

cargo fmt

0993b39

ctb mentioned this pull request Oct 2, 2024

10-fold manysearch performance slowdown between v0.9.5 and v0.9.6 #463

Open

ctb added 8 commits October 4, 2024 06:22

avoid downsampling until we know there is overlap

ac82fb3

change downsample to true; add panic assertion

7ea9a40

move downsampling side guard

03b9da0

eliminate redundant overlap check

b954daa

move calc_abund_stats

b0bcc66

extract abundance code into own function; avoid downsampling if poss

a2871c0

cleanup

d853ef3

fmt

453f943

ctb changed the title ~~WIP: add support for ignoring abundance in manysearch~~ WIP: fix performance regression in manysearch Oct 6, 2024

ctb changed the title ~~WIP: fix performance regression in manysearch~~ MRG: fix performance regression in manysearch by removing unnecessary downsampling Oct 6, 2024

This was referenced Oct 6, 2024

downsample_* functions in minhash.rs _always_ downsample, even when downsampling is not necessary sourmash-bio/sourmash#3343

Closed

MRG: add generic support for any type of sketch collection as query or database #430

Merged

bluegenes reviewed Oct 8, 2024

bluegenes approved these changes Oct 8, 2024

ctb merged commit 88e406f into main Oct 8, 2024
1 check passed

ctb deleted the toggle_manysearch_abund branch October 8, 2024 18:30

This was referenced Oct 13, 2024

MRG: avoid clones by using new Signature::try_into() -> KmerMinHash #471

Merged

MRG: update to v0.9.8 #475

Merged
