stopper: Ensure that Closers and Ctxs are always cleaned up #27920

nvanbenschoten · 2018-07-25T17:30:20Z

Before this change, the Stopper made no guarantee that Closers
registered by the AddCloser method or Contexts tied to
the Stopper with WithCancelOn{Quiesce,Stop} were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

The PR also correctly synchronizes access to the sCancels map to
fix #27908.

Release note: None

Fixes cockroachdb#27908. Release note: None

Before this change the Stopper made no guarantee that `Closers` registered by the `AddCloser` method or `Contexts` tied to the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly cleaned up. In cases where these methods were called concurrently with the Stopper stopping, it was possible for the resources to leak. This change fixes this by handling cases where these methods are called concurrently with or after the Stopper has stopped. This allows them to provide stronger external guarantees. Release note: None

cockroach-teamcity · 2018-07-25T17:30:28Z

This change is

benesch

Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

nvanbenschoten · 2018-07-25T18:20:28Z

bors r=benesch

27829: kv,storage: disable pipelining in merge transaction r=nvanbenschoten a=benesch It turns out that pipelining is fundamentally incompatible with merge transactions. The extra QueryIntent requests generated by pipelining can get routed to a RHS that is currently blocked for merging and wedge the merge. Consider the following sequence of events: 1. The merge transaction begins and lays down intents on the meta2 and local copy of the RHS range descriptor using asynchronous commits. 2. The RHS loses its lease. 3. The merge transaction sends a GetSnapshotForMergeRequest. The pipeliner injects two QueryIntent requests, one to the meta2 range descriptor and one to the local range descriptor on the RHS, as non-transactional requests always flush the pipeline. 4. The QueryIntent request hits the RHS and triggers a lease acquisition. The lease acquisition notices that the local range descriptor has a deletion intent, infers that a merge is in progress, and blocks all traffic, except for GetSnapshotForMerge requests. GetSnapshotForMerge requests are allowed to proceed even when the range is blocked for merging but QueryIntent requests are not. We can't, in general, allow QueryIntent requests through, as we might return stale data if the merge has already committed and the LHS is processing new writes. We could, I think, allow QueryIntents for the local range descriptor to proceed, but that's a very-specific corner case and hard to reason about. Just disable pipelining in the merge transaction to sidestep the issue. One other option: we could allow the RHS to serve reads up until the commit timestamp of the merge transaction. That would allow the GetSnapshotForMerge and QueryIntent requests from the merge transaction through, as well as any other read traffic executing beneath the merge transaction's timestamp. The downside is that the new leaseholder would have to wait out the full clock offset after the merge txn committed to serve any writes. (Just like the lease stasis period.) It was decided that this option was too disruptive and complex to pursue for the time being. Fix #27820. 27920: stopper: Ensure that Closers and Ctxs are always cleaned up r=benesch a=nvanbenschoten Before this change, the Stopper made no guarantee that `Closers` registered by the `AddCloser` method or `Contexts` tied to the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly cleaned up. In cases where these methods were called concurrently with the Stopper stopping, it was possible for the resources to leak. This change fixes this by handling cases where these methods are called concurrently with or after the Stopper has stopped. This allows them to provide stronger external guarantees. The PR also correctly synchronizes access to the `sCancels` map to fix #27908. Release note: None Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

craig · 2018-07-25T18:50:24Z

Build succeeded

GitHub CI (Cockroach)

nvanbenschoten added 2 commits July 25, 2018 13:24

stopper: synchronize access to sCancels functions

2018e93

Fixes cockroachdb#27908. Release note: None

nvanbenschoten requested a review from tbg July 25, 2018 17:30

benesch approved these changes Jul 25, 2018

View reviewed changes

nvanbenschoten mentioned this pull request Jul 25, 2018

backport-2.0: storage: prevent unbounded raft log growth without quorum #27868

Merged

craig bot merged commit 297a916 into cockroachdb:master Jul 25, 2018

nvanbenschoten deleted the nvanbenschoten/fixStopper branch July 26, 2018 15:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stopper: Ensure that Closers and Ctxs are always cleaned up #27920

stopper: Ensure that Closers and Ctxs are always cleaned up #27920

nvanbenschoten commented Jul 25, 2018

cockroach-teamcity commented Jul 25, 2018

benesch left a comment

nvanbenschoten commented Jul 25, 2018

craig bot commented Jul 25, 2018

stopper: Ensure that Closers and Ctxs are always cleaned up #27920

stopper: Ensure that Closers and Ctxs are always cleaned up #27920

Conversation

nvanbenschoten commented Jul 25, 2018

cockroach-teamcity commented Jul 25, 2018

benesch left a comment

Choose a reason for hiding this comment

nvanbenschoten commented Jul 25, 2018

craig bot commented Jul 25, 2018

Build succeeded