Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Fix hard-deletion of an external id that was soft-deleted and then reimported - main #750

Merged
merged 1 commit into from
Dec 20, 2022

Conversation

loiclec
Copy link
Contributor

@loiclec loiclec commented Dec 19, 2022

Pull Request

Related issue

Fixes (when merged into meilisearch) meilisearch/meilisearch#3021

What does this PR do?

There was a bug happening when:

  1. Documents were added
  2. Some of these documents were replaced using soft-deletion
  3. A deletion of another non-replaced document takes place and triggers a hard-deletion
  4. Documents with the same identifiers as the replaced documents are added again

Then, search results would return duplicate documents. No crash would happen at any time (this is the reason it wasn't caught by the previous fuzz test. I have updated the new one such that it also checks the result of a placeholder search request, which then finds the bug immediately).

The cause of the bug is:

  1. When a hard-deletion is triggered, we try to retrieve the external document id associated with each soft-deleted document id.
  2. Then, we take this list of external document ids and remove each of them from the ExternalDocumentsIds structure.
  3. However, this is not correct in case an existing (non-deleted) document shares the external id of a soft-deleted document.

Implementation of the fix

  1. Before we process a permanent deletion, we update the list of soft-deleted document ids.
  2. Then, the permanent deletion's job is to remove the soft-deleted documents from all data structures. Therefore, to update ExternalDocumentsIds, we can simply call the delete_soft_deleted_documents_ids_from_fsts method, which is faster and simpler.

Correctness

A unit test was added to reproduce the bug. The new fuzz test, when adjusted to check the correctness of a placeholder search, could also instantly reproduce the bug, but now does not find any other problem.

@loiclec loiclec added bug Something isn't working no breaking The related changes are not breaking (DB nor API) labels Dec 19, 2022
@loiclec loiclec requested a review from irevoire December 19, 2022 15:45
@loiclec loiclec marked this pull request as draft December 19, 2022 15:47
irevoire
irevoire previously approved these changes Dec 19, 2022
Copy link
Member

@irevoire irevoire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good to go 👍

But I believe you'll need to rewrite some of your tests since I merged this PR: #747

That update the way the index.index_documents_config works sorry 😬

milli/src/external_documents_ids.rs Outdated Show resolved Hide resolved
milli/src/external_documents_ids.rs Outdated Show resolved Hide resolved
@loiclec loiclec changed the title Fix hard-deletion of an external id that was soft-deleted Fix hard-deletion of an external id that was soft-deleted and then reinserted Dec 20, 2022
@loiclec loiclec changed the title Fix hard-deletion of an external id that was soft-deleted and then reinserted Fix hard-deletion of an external id that was soft-deleted and then reimported Dec 20, 2022
@loiclec loiclec marked this pull request as ready for review December 20, 2022 14:21
@loiclec loiclec changed the title Fix hard-deletion of an external id that was soft-deleted and then reimported Fix hard-deletion of an external id that was soft-deleted and then reimported -main Dec 20, 2022
@loiclec loiclec changed the title Fix hard-deletion of an external id that was soft-deleted and then reimported -main Fix hard-deletion of an external id that was soft-deleted and then reimported - main Dec 20, 2022
@loiclec loiclec requested a review from irevoire December 20, 2022 14:42
Copy link
Member

@irevoire irevoire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot; that's way better than the first implementation we thought of initially!

bors merge

@bors
Copy link
Contributor

bors bot commented Dec 20, 2022

@bors bors bot merged commit 249e051 into main Dec 20, 2022
@bors bors bot deleted the fix-bug-3021-again branch December 20, 2022 16:32
bors bot added a commit that referenced this pull request Dec 20, 2022
754: Fix hard-deletion of an external id that was soft-deleted and then reimported - release-v0.37.5 r=irevoire a=loiclec

This is a PR that cherry-picks the commit from #750 onto `release-v0.37.5`

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working no breaking The related changes are not breaking (DB nor API)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants