Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Remove useless grenad merging #456

Merged
merged 2 commits into from
Mar 1, 2022
Merged

Remove useless grenad merging #456

merged 2 commits into from
Mar 1, 2022

Conversation

Kerollmops
Copy link
Member

@Kerollmops Kerollmops commented Feb 28, 2022

This PR must be merged after #454.

This PR removes the part of code that was merging all of the grenad Readers merging that we don't need as the indexer should have merged them and, therefore, we should only have one final grenad Reader. We reduce the amount of CPU usage and memory pressure we were doing uselessly.

@ManyTheFish are you sure I can skip merging the word_docids database?

Here is the benchmark comparison with the previously merged PR #454:

group                                              indexing_reintroduce-appending-sorted-values_c05e42a8    indexing_remove-useless-grenad-merging_d5b8b5a2
-----                                              -----------------------------------------------------    -----------------------------------------------
indexing/Indexing movies with default settings     1.06      16.6±1.04s        ? ?/sec                      1.00      15.7±0.93s        ? ?/sec
indexing/Indexing songs with default settings      1.16      60.1±7.07s        ? ?/sec                      1.00      51.7±5.98s        ? ?/sec
indexing/Indexing songs without faceted numbers    1.06      55.4±6.14s        ? ?/sec                      1.00      52.2±4.13s        ? ?/sec

And the comparison with multi-batch indexing before #436, we can see that we gain time for benchmarks that index datasets in multiple batches but there is so much variance that it's not clear.

group                                                             indexing_benchmark-multi-batch-indexing-before-speed-up_45f52620    indexing_remove-useless-grenad-merging_d5b8b5a2
-----                                                             ----------------------------------------------------------------    -----------------------------------------------
indexing/Indexing geo_point                                       1.07       6.6±0.08s        ? ?/sec                                 1.00       6.2±0.11s        ? ?/sec
indexing/Indexing songs in three batches with default settings    1.12      57.7±2.14s        ? ?/sec                                 1.00      51.5±3.80s        ? ?/sec
indexing/Indexing songs with default settings                     1.00      47.5±2.52s        ? ?/sec                                 1.09      51.7±5.98s        ? ?/sec
indexing/Indexing songs without any facets                        1.00      43.5±1.43s        ? ?/sec                                 1.12      48.8±3.73s        ? ?/sec
indexing/Indexing songs without faceted numbers                   1.00      47.1±2.23s        ? ?/sec                                 1.11      52.2±4.13s        ? ?/sec
indexing/Indexing wiki                                            1.00    917.3±30.01s        ? ?/sec                                 1.09    998.7±38.92s        ? ?/sec
indexing/Indexing wiki in three batches                           1.09   1091.2±32.73s        ? ?/sec                                 1.00    996.5±15.70s        ? ?/sec

What do you think @irevoire? Should we change the benchmarks to make them do more runs?

@Kerollmops Kerollmops marked this pull request as ready for review February 28, 2022 11:36
@ManyTheFish
Copy link
Member

ManyTheFish commented Feb 28, 2022

@Kerollmops

@ManyTheFish are you sure I can skip merging the word_docids database?

yes, you can! Below the code that merges this database:
https://github.com/meilisearch/milli/blob/main/milli/src/update/index_documents/extract/mod.rs#L89-L97

ManyTheFish
ManyTheFish previously approved these changes Feb 28, 2022
@Kerollmops
Copy link
Member Author

bors merge

@bors
Copy link
Contributor

bors bot commented Mar 1, 2022

@bors bors bot merged commit 51cf44d into main Mar 1, 2022
@bors bors bot deleted the remove-useless-grenad-merging branch March 1, 2022 16:57
@irevoire irevoire added the performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption label Jul 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants