Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stopper: Ensure that Closers and Ctxs are always cleaned up #27920

Merged
merged 2 commits into from
Jul 25, 2018

Conversation

nvanbenschoten
Copy link
Member

Before this change, the Stopper made no guarantee that Closers
registered by the AddCloser method or Contexts tied to
the Stopper with WithCancelOn{Quiesce,Stop} were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

The PR also correctly synchronizes access to the sCancels map to
fix #27908.

Release note: None

Before this change the Stopper made no guarantee that `Closers`
registered by the `AddCloser` method or `Contexts` tied to
the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

Release note: None
@nvanbenschoten nvanbenschoten requested a review from tbg July 25, 2018 17:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@nvanbenschoten
Copy link
Member Author

bors r=benesch

craig bot pushed a commit that referenced this pull request Jul 25, 2018
27829: kv,storage: disable pipelining in merge transaction r=nvanbenschoten a=benesch

It turns out that pipelining is fundamentally incompatible with merge
transactions. The extra QueryIntent requests generated by pipelining can
get routed to a RHS that is currently blocked for merging and wedge the
merge. Consider the following sequence of events:

  1. The merge transaction begins and lays down intents on the meta2 and
     local copy of the RHS range descriptor using asynchronous commits.

  2. The RHS loses its lease.

  3. The merge transaction sends a GetSnapshotForMergeRequest. The
     pipeliner injects two QueryIntent requests, one to the meta2 range
     descriptor and one to the local range descriptor on the RHS, as
     non-transactional requests always flush the pipeline.

  4. The QueryIntent request hits the RHS and triggers a lease
     acquisition. The lease acquisition notices that the local range
     descriptor has a deletion intent, infers that a merge is in
     progress, and blocks all traffic, except for GetSnapshotForMerge
     requests.

GetSnapshotForMerge requests are allowed to proceed even when the range
is blocked for merging but QueryIntent requests are not. We can't, in
general, allow QueryIntent requests through, as we might return stale
data if the merge has already committed and the LHS is processing
new writes. We could, I think, allow QueryIntents for the local range
descriptor to proceed, but that's a very-specific corner case and hard
to reason about.

Just disable pipelining in the merge transaction to sidestep the issue.

One other option: we could allow the RHS to serve reads up until the
commit timestamp of the merge transaction. That would allow the
GetSnapshotForMerge and QueryIntent requests from the merge transaction
through, as well as any other read traffic executing beneath the merge
transaction's timestamp. The downside is that the new leaseholder would
have to wait out the full clock offset after the merge txn committed to
serve any writes. (Just like the lease stasis period.) It was decided
that this option was too disruptive and complex to pursue for the time
being.

Fix #27820.

27920: stopper: Ensure that Closers and Ctxs are always cleaned up r=benesch a=nvanbenschoten

Before this change, the Stopper made no guarantee that `Closers`
registered by the `AddCloser` method or `Contexts` tied to
the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

The PR also correctly synchronizes access to the `sCancels` map to
fix #27908.

Release note: None

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jul 25, 2018

Build succeeded

@craig craig bot merged commit 297a916 into cockroachdb:master Jul 25, 2018
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/fixStopper branch July 26, 2018 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: TestStoreGossipSystemData failed under stress
3 participants