-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize bitmaps for more memory-efficiency #267
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a couple of question, but otherwise seems fine.
Is it really worth the trouble though? The reduction of the size numbers in the tests doesn't seem overwhelming...
I see your point that the effect seems small for the test data set if one looks only at storage consumption. There the static overhead mostly hides the effect because of the few sequences. Although, looking at the |
…st numerous symbol is deleted
851e241
to
1c8e3b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
const auto deleted_symbol = current_position.getDeletedSymbol(); | ||
if (deleted_symbol.has_value() && symbol != *deleted_symbol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different from !current_position.isSymbolDeleted(symbol)
?
bitmap.runOptimize(); | ||
bitmap.shrinkToFit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know whether it's good to hide this side effect in a getSomething
method.
for (const uint32_t idx : *filter) { | ||
const roaring::Roaring& n_bitmap = sequence_store_partition.missing_symbol_bitmaps[idx]; | ||
if (n_bitmap.contains(position)) { | ||
count_of_mutations_per_position[*deleted_symbol][position] -= 1; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's inline this into addPositionToMutationCountsForMixedBitmaps
and get rid of the separate correct...
method.
…here Position invariant was broken because of 'missing' symbols
048f5d1
to
c2a8439
Compare
The very memory efficient mode is now implemented:
The most numerous bitmap is deleted instead of flipped. When looking up values, all other bitmaps are considered instead.
A good next step would be a hybrid mode, where the deletion is only preferred over flipping, if there is substantial memory gains, as the performance impact is quite high