-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] subtantial refactoring of CounterGather
and related Index
code.
#2116
Conversation
…o refactor/counter_gather_tests
…o refactor/counter_gather_tests
Codecov Report
@@ Coverage Diff @@
## latest #2116 +/- ##
==========================================
+ Coverage 84.30% 91.69% +7.38%
==========================================
Files 130 99 -31
Lines 15278 11002 -4276
Branches 2171 2166 -5
==========================================
- Hits 12880 10088 -2792
+ Misses 2095 611 -1484
Partials 303 303
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
…urmash-bio/sourmash into refactor/counter_gather_tests
CounterGather
CounterGather
and related Index
code.
@mr-eyes ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments/questions -- otherwise I think this looks good to me. cleaner and better descriptions, for sure
Note: PR into #2111
This PR does a bunch of cleanup and refactoring of gather related things:
search.make_gather_query
tosearch.make_containment_query
best_only
inIndex.prefetch
Index.gather
toIndex.best_containment
, and changes method to returnNone
or anIndexSearchResult
, rather than a list => much confusion reduction.search.py
into a single function 🎉flatten_and_*
utility functions insourmash.minhash
.CounterGather
classes to takeSourmashSignature
instead ofMinHash
object, in line with Better conceptual definition on minhash/signature/query #616CounterGather.siglist
andlocations
to use dictionaries instead of listsCounterGather.signatures()
CounterGather_LCA()
test implementationThe net result is a lot of code reduction in the Python codebase plus (IMO) more readable code.
A note to
CounterGather
aficionadosThe change to using dictionaries instead of lists to store signatures and locations in
CounterGather
means that only one signature per md5sum will be stored. For the defaultCounterGather
test, this will be the last one added.See test
test_index.py:test_counter_gather_identical_md5sum
for the specific implementation test forCounterGather
, andtest_index_protocol.py::test_counter_gather_multiple_identical_matches
for the general protocol test.TODO
CounterGather
technique to provide list of locations, list of signatures??CounterGather
classes take a list of index databases to prefetch on, instead of current creation API?Index.gather
?