Optimize bitmaps for more memory-efficiency #267

Taepper · 2024-01-30T11:42:08Z

The very memory-efficient mode is now implemented: the most numerous bitmap is deleted instead of flipped. When looking up values for the deleted symbol, all other bitmaps are considered instead.
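To illustrate the idea, here is a minimal sketch of deleting the largest bitmap and answering lookups for the deleted symbol from the remaining ones. It is not the code from this PR: `PositionSketch`, `deleteMostNumerousBitmap` and `lookup` are hypothetical names, `std::set<uint32_t>` stands in for a roaring bitmap, and 'missing' symbols are ignored for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <set>

// Hypothetical stand-in for one sequence position; std::set<uint32_t> plays
// the role of a roaring bitmap of sequence ids.
struct PositionSketch {
   std::map<char, std::set<uint32_t>> bitmaps;  // no entry for the deleted symbol
   std::optional<char> deleted_symbol;
   uint32_t sequence_count = 0;
};

// Drops the bitmap with the most entries and remembers which symbol it was.
void deleteMostNumerousBitmap(PositionSketch& position) {
   assert(!position.deleted_symbol.has_value());
   std::optional<char> largest;
   std::size_t largest_size = 0;
   for (const auto& [symbol, ids] : position.bitmaps) {
      if (!largest.has_value() || ids.size() > largest_size) {
         largest = symbol;
         largest_size = ids.size();
      }
   }
   if (largest.has_value()) {
      position.bitmaps.erase(*largest);
      position.deleted_symbol = largest;
   }
}

// Returns the ids holding `symbol`. For the deleted symbol this is every id
// that appears in none of the remaining bitmaps.
std::set<uint32_t> lookup(const PositionSketch& position, char symbol) {
   if (position.deleted_symbol != symbol) {
      const auto found = position.bitmaps.find(symbol);
      return found == position.bitmaps.end() ? std::set<uint32_t>{} : found->second;
   }
   std::set<uint32_t> result;
   for (uint32_t id = 0; id < position.sequence_count; ++id) {
      result.insert(id);
   }
   for (const auto& [other_symbol, ids] : position.bitmaps) {
      for (const uint32_t id : ids) {
         result.erase(id);
      }
   }
   return result;
}
```

In this sketch, a lookup for the deleted symbol has to touch every remaining bitmap, which is where the performance cost mentioned in the description comes from.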

A good next step would be a hybrid mode, where deletion is preferred over flipping only if there are substantial memory gains, as the performance impact is quite high.

fengelniederhammer

I just have a couple of questions, but otherwise it seems fine.

Is it really worth the trouble though? The reduction of the size numbers in the tests doesn't seem overwhelming...

include/silo/common/aa_symbols.h

src/silo/storage/database_partition.cpp

src/silo/storage/position.cpp

src/silo/storage/position.test.cpp

src/silo/query_engine/filter_expressions/or.cpp

src/silo/query_engine/actions/mutations.cpp

src/silo/query_engine/actions/fasta_aligned.cpp

Taepper · 2024-01-30T18:05:10Z

I see your point that the effect seems small for the test dataset if one looks only at storage consumption. There, the static overhead mostly hides the effect because there are so few sequences. However, the values_store_in_X_container numbers already show rather big savings. It will be much more apparent for bigger datasets, where the effect will be closer to 90%.


fengelniederhammer

LGTM

fengelniederhammer · 2024-01-31T09:01:22Z

src/silo/query_engine/actions/mutations.cpp

+         const auto deleted_symbol = current_position.getDeletedSymbol();
+         if (deleted_symbol.has_value() && symbol != *deleted_symbol) {


Is this different from !current_position.isSymbolDeleted(symbol)?
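A small sketch of where the two conditions could diverge. It assumes that `isSymbolDeleted(symbol)` simply compares `symbol` against the stored optional; that body is an assumption, not taken from the PR, and `PositionSketch`, `conditionFromDiff` and `conditionFromQuestion` are hypothetical names.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the Position class.
struct PositionSketch {
   std::optional<char> deleted_symbol;

   std::optional<char> getDeletedSymbol() const { return deleted_symbol; }

   // Assumed semantics: true only if a symbol is deleted and it equals `symbol`.
   bool isSymbolDeleted(char symbol) const { return deleted_symbol == symbol; }
};

// The condition as written in the diff.
bool conditionFromDiff(const PositionSketch& position, char symbol) {
   const auto deleted_symbol = position.getDeletedSymbol();
   return deleted_symbol.has_value() && symbol != *deleted_symbol;
}

// The condition proposed in the review question.
bool conditionFromQuestion(const PositionSketch& position, char symbol) {
   return !position.isSymbolDeleted(symbol);
}
```

Under that assumption, the two agree whenever a symbol is deleted at the position. They differ when no symbol is deleted: the condition from the diff is false, while the negated `isSymbolDeleted` is true.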

fengelniederhammer · 2024-01-31T09:08:13Z

src/silo/storage/position.cpp

+      bitmap.runOptimize();
+      bitmap.shrinkToFit();


I don't know whether it's good to hide this side effect in a getSomething method.

fengelniederhammer · 2024-01-31T09:51:37Z

src/silo/query_engine/actions/mutations.cpp

+      for (const uint32_t idx : *filter) {
+         const roaring::Roaring& n_bitmap = sequence_store_partition.missing_symbol_bitmaps[idx];
+         if (n_bitmap.contains(position)) {
+            count_of_mutations_per_position[*deleted_symbol][position] -= 1;
+         }
+      }


As discussed, let's inline this into addPositionToMutationCountsForMixedBitmaps and get rid of the separate correct... method.
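One way the inlined version could look, as a rough sketch: the count for the deleted symbol starts at the filter size, and every filtered sequence that is either covered by another symbol's bitmap or 'missing' at this position is subtracted in the same pass. The helper `countDeletedSymbol` and its parameters are hypothetical, and `std::set<uint32_t>` stands in for a roaring bitmap.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical helper; names do not come from the PR.
// other_symbol_bitmaps: retained bitmaps at this position (deleted symbol absent)
// missing_at_position: ids whose symbol at this position is 'missing'
uint32_t countDeletedSymbol(
   const std::set<uint32_t>& filter,
   const std::map<char, std::set<uint32_t>>& other_symbol_bitmaps,
   const std::set<uint32_t>& missing_at_position
) {
   uint32_t count = static_cast<uint32_t>(filter.size());
   for (const uint32_t id : filter) {
      // A sequence does not carry the deleted symbol if it is 'missing' here
      // or if any retained bitmap contains it.
      bool accounted_for = missing_at_position.count(id) > 0;
      for (const auto& [symbol, ids] : other_symbol_bitmaps) {
         accounted_for = accounted_for || ids.count(id) > 0;
      }
      if (accounted_for) {
         count -= 1;
      }
   }
   return count;
}
```

Leaving out the 'missing' check would attribute those sequences to the deleted symbol, which matches the bug described in the later fix commit.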


Taepper mentioned this pull request Jan 30, 2024

Make memory-efficient mode configurable #268

Open

fengelniederhammer reviewed Jan 30, 2024


feat: introduce new storage type for Sequence Positions, where the mo…

6e15204

…st numerous symbol is deleted

Taepper force-pushed the optimize-bitmaps branch from 851e241 to 1c8e3b9 Compare January 30, 2024 20:25

fengelniederhammer approved these changes Jan 31, 2024


Taepper added 4 commits January 31, 2024 11:11

fix: also consider 'missing' symbols in the mutation action. Bugfix w…

fab72a6

…here Position invariant was broken because of 'missing' symbols

fix: change test to reflect new optimisations

cb63010

feat: optimize bitmaps before finishing partition

5b06d58

feat: load table lazily. Unaligned Sequences do not need to load the …

c2a8439

…table

Taepper force-pushed the optimize-bitmaps branch from 048f5d1 to c2a8439 Compare January 31, 2024 10:11

Taepper merged commit a340b2c into main Jan 31, 2024
2 of 6 checks passed

Taepper deleted the optimize-bitmaps branch January 31, 2024 10:11
