Optimize Prefixes and Merges #5124

Kerollmops · 2024-12-04T15:18:52Z

In this PR, we plan to optimize the reads of LMDB by reading the entries in lexicographic order and making better use of the memory-mapped OS cache:

Optimize the prefix generation for word position docids (@ManyTheFish)
Optimize the parallel merging of the caches by sorting the entries before merging them (@Kerollmops)
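
For illustration, the idea of sorting cache entries before merging can be sketched as follows. This is a minimal sketch, not the code from this PR: the `Cache` alias and the `merge_sorted` function are hypothetical names, and plain `Vec<u32>` values stand in for whatever docids structure the indexer really uses. The point is that the merged entries come out in byte-wise key order, so a later pass over the database touches neighbouring pages in sequence.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one per-thread cache: key bytes -> docids.
type Cache = HashMap<Vec<u8>, Vec<u32>>;

/// Merges several caches and returns the entries sorted by key,
/// in byte-wise lexicographic order.
fn merge_sorted(caches: Vec<Cache>) -> Vec<(Vec<u8>, Vec<u32>)> {
    // Consume every map into one flat Vec of (key, docids) entries.
    let mut entries: Vec<(Vec<u8>, Vec<u32>)> = caches.into_iter().flatten().collect();

    // Sort by key so that equal keys become adjacent and the output is ordered.
    entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));

    // Fold adjacent entries that share the same key.
    let mut merged: Vec<(Vec<u8>, Vec<u32>)> = Vec::with_capacity(entries.len());
    for (key, mut docids) in entries {
        match merged.last_mut() {
            Some((last_key, last_docids)) if *last_key == key => {
                last_docids.append(&mut docids);
            }
            _ => merged.push((key, docids)),
        }
    }

    // Normalize each docids list: sorted, without duplicates.
    for (_, docids) in merged.iter_mut() {
        docids.sort_unstable();
        docids.dedup();
    }
    merged
}

fn main() {
    let a: Cache = HashMap::from([(b"world".to_vec(), vec![3]), (b"hello".to_vec(), vec![1])]);
    let b: Cache = HashMap::from([(b"hello".to_vec(), vec![2, 1])]);
    for (key, docids) in merge_sorted(vec![a, b]) {
        println!("{} -> {:?}", String::from_utf8_lossy(&key), docids);
    }
}
```

Sorting once up front costs O(n log n) in memory, which is cheap compared to random page faults on a memory-mapped file that does not fit in RAM.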

Benchmarks on 1cpu 2gb gpo3 (5k IOps)

Before, on the tag meilisearch-v1.12.0-rc.3:

word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s

After, on this branch, with the whole HashMaps sorted into a Vec:

word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s

Kerollmops · 2024-12-04T15:38:31Z

/bench workloads/hackernews-add-new-documents.json workloads/hackernews-modify-*

Kerollmops · 2024-12-04T16:22:02Z

/bench workloads/movies.json workloads/hackernews.json

meili-bot · 2024-12-04T16:31:37Z

☀️ Benchmark invocation completed, please find the results for your workloads below:

Kerollmops · 2024-12-04T17:15:05Z

bors merge

Kerollmops · 2024-12-04T17:17:49Z

bors merge

meili-bors · 2024-12-04T17:17:52Z

Already running a review

Kerollmops · 2024-12-04T17:18:49Z

I want to change the name of the merge_alt function and perform more benchmarks on larger machines.

Kerollmops · 2024-12-04T17:19:13Z

bors cancel

meili-bors · 2024-12-04T17:19:17Z

Canceled.

ManyTheFish

bors merge

Kerollmops · 2024-12-05T09:27:41Z

bors merge

meili-bors · 2024-12-05T09:27:45Z

Already running a review

meili-bors · 2024-12-05T10:11:19Z

Build succeeded:

5125: Change the default max memory usage to 5% of the total memory r=ManyTheFish a=Kerollmops

After thorough testing, we found that giving 5% of the total available memory to allocate resident memory (caches and channels) is the best approach. The main reason is that the new indexer is highly memory-map oriented, with LMDB, and reads the database while performing the indexation. So, by allowing the maximum amount of memory available to LMDB and the OS, it will perform the key-value store reads and all other indexation operations faster by keeping more pages hot in the cache.

In #5124, we also sorted the entries to merge to improve the read speed of LMDB. This is common in database management systems: Reading stuff on the disk is much faster when done in lexicographic order (the default sorted order of key values). The entries have a great chance of already being in the OS memory cache, as they were loaded in a previous read, and reading stuff on the disk is very slow compared to reading memory.

Co-authored-by: Kerollmops <clement@meilisearch.com>
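
As a rough illustration of the 5% rule described above, the budget can be computed like this. The function name `default_indexing_budget` is hypothetical and is not taken from the Meilisearch code base; it only shows the arithmetic.

```rust
/// Hypothetical helper: returns 5% of the total memory, in bytes.
/// Dividing by 20 avoids any intermediate overflow.
fn default_indexing_budget(total_memory_bytes: u64) -> u64 {
    total_memory_bytes / 20
}

fn main() {
    // The 2 GiB benchmark machine mentioned in this PR, used as an example input.
    let total = 2 * 1024 * 1024 * 1024u64;
    println!("budget: {} bytes", default_indexing_budget(total));
}
```

On a 2 GiB machine this leaves roughly 100 MiB for caches and channels, and the rest for the OS page cache that backs the LMDB memory map.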

Replace HashSets by BTreeSets for the prefixes

739c52a
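
The effect of swapping a `HashSet` for a `BTreeSet` can be sketched with the standard library alone. The `sorted_prefixes` helper below is a hypothetical name, not a function from this PR; it only shows that a `BTreeSet` deduplicates like a `HashSet` but iterates in lexicographic order, whereas `HashSet` iteration order is unspecified.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collects modified prefixes and returns them
/// deduplicated and in lexicographic order.
fn sorted_prefixes<'a>(modified: impl IntoIterator<Item = &'a str>) -> Vec<&'a str> {
    // A BTreeSet keeps its elements ordered, so iterating it yields sorted keys.
    let set: BTreeSet<&'a str> = modified.into_iter().collect();
    set.into_iter().collect()
}

fn main() {
    for prefix in sorted_prefixes(["he", "a", "hel", "a"]) {
        println!("{prefix}");
    }
}
```

Iterating the prefixes in this order means the corresponding database lookups also happen in key order.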

Kerollmops requested a review from ManyTheFish December 4, 2024 15:18

Kerollmops added 2 commits December 4, 2024 16:33

Introduce a new semi ordered merge function

29ef164

Use the merge_caches_alt function in the docids merging

be41143

ManyTheFish previously approved these changes Dec 4, 2024

Consume vec instead of draining

cb99ac6

Kerollmops dismissed ManyTheFish’s stale review via cb99ac6 December 4, 2024 16:00

Lexicographically sort all the map to merge

2e32d04

ManyTheFish previously approved these changes Dec 4, 2024

Kerollmops added this to the v1.12.0 milestone Dec 4, 2024

Kerollmops marked this pull request as ready for review December 4, 2024 17:17

Kerollmops marked this pull request as draft December 4, 2024 17:19

Clean up and remove the non-sorted merge_caches function

5284312

Kerollmops dismissed ManyTheFish’s stale review via 5284312 December 5, 2024 09:03

ManyTheFish approved these changes Dec 5, 2024

Kerollmops mentioned this pull request Dec 5, 2024

Change the default max memory usage to 5% of the total memory #5125

Merged

Kerollmops marked this pull request as ready for review December 5, 2024 09:27

meili-bors bot merged commit cac355b into release-v1.12.0 Dec 5, 2024
10 checks passed

meili-bors bot deleted the optimize-prefixes-and-merges branch December 5, 2024 10:11

meili-bot added the v1.12.0 PRs/issues solved in v1.12.0 released on 2024-12-23 label Dec 23, 2024
