-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: merges can deadlock #27442
Labels
A-kv-distribution
Relating to rebalancing and leasing.
Comments
benesch
added a commit
to benesch/cockroach
that referenced
this issue
Jul 12, 2018
The chaos added in 2ed1515 to force lease renewals while a merge is in progress is too disruptive and makes the test occasionally time out. See issue cockroachdb#27442 for an analysis. Skip the test for now. Release note: None
craig bot
pushed a commit
that referenced
this issue
Jul 12, 2018
27441: storage: introduce a min idle time in the replica scanner r=spencerkimball a=spencerkimball This prevents busy looping in the scanner in the event there are more replicas on a store than can be processed in the scanner target interval (default = `10s`). This change also adjusts the default max idle time to `1s`; the previous value of `100ms` was unnecessarily short. Release note (performance improvement): on stores with 100s of thousands of replicas, this limits the scanner from running incessantly. 27449: storage: skip TestStoreRangeMergeWithData again r=bdarnell,a-robinson a=benesch The chaos added in 2ed1515 to force lease renewals while a merge is in progress is too disruptive and makes the test occasionally time out. See issue #27442 for an analysis. Skip the test for now. Release note: None Co-authored-by: Spencer Kimball <spencer.kimball@gmail.com> Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
benesch
added a commit
to benesch/cockroach
that referenced
this issue
Jul 13, 2018
Merges could deadlock if the lease on the right-hand range was transferred or renewed during the merge transaction and the new leaseholder received any request before the GetSnapshotForMerge request. See cockroachdb#27442 and the comments within for details. Avoid the deadlock by moving the responsibility for waiting for a merge to complete from the replica to the store. Add a unit test that runs several get requests concurrently with several merges to reliably test for regressions. The problem previously only presented when stressing TestStoreRangeMergeWithData. Fix cockroachdb#27442. Release note: None
benesch
added a commit
to benesch/cockroach
that referenced
this issue
Jul 15, 2018
Merges could deadlock if the lease on the right-hand range was transferred or renewed during the merge transaction and the new leaseholder received any request before the GetSnapshotForMerge request. See cockroachdb#27442 and the comments within for details. Avoid the deadlock by moving the responsibility for waiting for a merge to complete from the replica to the store. Add a unit test that runs several get requests concurrently with several merges to reliably test for regressions. The problem previously only presented when stressing TestStoreRangeMergeWithData. Fix cockroachdb#27442. Release note: None
craig bot
pushed a commit
that referenced
this issue
Jul 16, 2018
27454: storage: avoid deadlock during merges r=bdarnell a=benesch Merges could deadlock if the lease on the right-hand range was transferred or renewed during the merge transaction and the new leaseholder received any request before the GetSnapshotForMerge request. See #27442 and the comments within for details. Avoid the deadlock by moving the responsibility for waiting for a merge to complete from the replica to the store. Add a unit test that runs several get requests concurrently with several merges to reliably test for regressions. The problem previously only presented when stressing TestStoreRangeMergeWithData. Fix #27442. Fix #27454. Release note: None 27519: lint: speed up TestCrlfmt r=tschottdorf a=benesch Pass the -fast flag to crlfmt, which instructs it to skip running goimports/gofmt. It takes over 90s to run crlfmt without the -fast flag; with the flag, it takes only 2s. Note that this patch does not reduce lint coverage. The linter already runs gofmt in a separate test. Goimports does not enforce any particular ordering on the imports, beyond the sorting that gofmt mandates, so there's nothing to be lost by skipping it. Release note: None Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Consider the following sequence of events:
@bdarnell has suggested the following solution. Instead of blocking on the mergeCompleteCh in the replica, return a MergeInProgressError. Handle this error in the store by reaching into the replica for its mergeCompleteCh, and block there. Once the channel closes, retry the request. This neatly avoids deadlock by allowing blocked commands to exit the command queue without requiring the caller to retry them.
The text was updated successfully, but these errors were encountered: