Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix race between manifest error recovery and file ingestion (#12871)
Summary: This PR fixes an assertion failure in `DBImpl::ResumeImpl` - `assert(!versions_->descriptor_log_)`. In `VersionSet`, `descriptor_log_` has a pointer to the current MANIFEST writer. When there's an error updating the manifest, `descriptor_log_` is reset, and the error recovery thread checks `io_status()` in `VersionSet` and attempts to write a new MANIFEST. If another DB manipulation happens at the same time (like external file ingestion, column family manipulation etc), it calls `LogAndApply`, which also attempts to write a new MANIFEST. The assertion in `ResumeImpl` might fail in this case since the other MANIFEST writer may have updated `descriptor_log_`. To prevent the assertion, this fix updates both `io_status_` and `descriptor_log_` while holding the DB mutex. The other option would have been to simply remove the assert. But I think its important to have it to ensure the invariant that `io_status_` is cleared if the MANIFEST is written successfully, and this fix makes things easier to reason about. Pull Request resolved: #12871 Test Plan: Existing tests and crash test Reviewed By: hx235 Differential Revision: D59926947 Pulled By: anand1976 fbshipit-source-id: af9ad18da3e29fc62c7ec2e30e0738aa33d4e5f1
- Loading branch information