Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver,kvclient: disambiguate aborted txn case #53156

Merged

Conversation

andreimatei
Copy link
Contributor

This patch eliminates an AmbiguousResultError case. When a committer
finds an intent to be missing after having STAGED the txn, it tries to
figure out if the transaction has been committed or aborted behind its
back. It could be either: another actor might have performed recovery
and found the txn to be implicitly committed, or the recovery might
succeed in preventing an intent from being written and thus the
transaction will be aborted.
The commiter thus reads the txn record and tries to figure out what the
deal is. The case in which a COMMITed txn record is found is
unambiguous. The case when the txn record is not found is ambiguous: it
could have been GCed, and the status is lost. This case stays ambiguous.
The case where the transaction record is found to be ABORTED used to be
treated as ambiguous, but this patch makes it unambiguous. It used to be
ambiguous for a bad reason: the QueryTxn request used to read the txn
record conflated an ABORTED record with a missing record that cannot be
created anymore because of the tscache protections.

This patch adds more output to QueryTxn, specifying if the record has
been found or if it has been synthesized. This allows the DistSender to
eliminate the ambiguity when the record is found.

This patch relies on the fact that nobody writes an ABORTED record (or,
any record) after the record is GC'ed.

Fixes #52566
Fixes #52186

One of the issues above is a TPCC failure with an ambiguous error. I'm
not sure that this was the ambiguity in question, but it seems likely.

Release note (general change): Some cases where AmbiguousResultErrors
were produced under high load now return non-ambiguous, retriable
errors.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 1045 at r1 (raw file):

		// the two cases, and so we're forced to return an ambiguous error.
		// On the other hand, if the record exists, then we know that the transaction
		// did not commit.

Mention that's because a committed record can't be GC'ed and then recreated as ABORTED.


pkg/kv/kvserver/batcheval/cmd_query_txn.go, line 68 at r1 (raw file):

	// Fetch transaction record; if missing, attempt to synthesize one.
	ok, err := storage.MVCCGetProto(

nit: curious why you went away from:

if ok, err := ...; err != nil {

} else if ok {

} else {

}

No strong preferences from me, just curious.

@andreimatei andreimatei force-pushed the kv.nonambiguous-aborted-record branch from b5ad922 to 5db91e7 Compare August 21, 2020 01:07
This patch eliminates an AmbiguousResultError case. When a committer
finds an intent to be missing after having STAGED the txn, it tries to
figure out if the transaction has been committed or aborted behind its
back. It could be either: another actor might have performed recovery
and found the txn to be implicitly committed, or the recovery might
succeed in preventing an intent from being written and thus the
transaction will be aborted.
The commiter thus reads the txn record and tries to figure out what the
deal is. The case in which a COMMITed txn record is found is
unambiguous. The case when the txn record is not found is ambiguous: it
could have been GCed, and the status is lost. This case stays ambiguous.
The case where the transaction record is found to be ABORTED used to be
treated as ambiguous, but this patch makes it unambiguous. It used to be
ambiguous for a bad reason: the QueryTxn request used to read the txn
record conflated an ABORTED record with a missing record that cannot be
created anymore because of the tscache protections.

This patch adds more output to QueryTxn, specifying if the record has
been found or if it has been synthesized. This allows the DistSender to
eliminate the ambiguity when the record is found.

This patch relies on the fact that nobody writes an ABORTED record (or,
any record) after the record is GC'ed.

Fixes cockroachdb#52566
Fixes cockroachdb#52186

One of the issues above is a TPCC failure with an ambiguous error. I'm
not sure that this was the ambiguity in question, but it seems likely.

Release note (general change): Some cases where AmbiguousResultErrors
were produced under high load now return non-ambiguous, retriable
errors.
@andreimatei andreimatei force-pushed the kv.nonambiguous-aborted-record branch from 5db91e7 to 3dcb6f1 Compare August 21, 2020 01:25
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 1045 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Mention that's because a committed record can't be GC'ed and then recreated as ABORTED.

done


pkg/kv/kvserver/batcheval/cmd_query_txn.go, line 68 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: curious why you went away from:

if ok, err := ...; err != nil {

} else if ok {

} else {

}

No strong preferences from me, just curious.

no particular reason, but I prefer the !err path to not be inside an else

@craig
Copy link
Contributor

craig bot commented Aug 21, 2020

Build succeeded:

@craig craig bot merged commit 4ab67e2 into cockroachdb:master Aug 21, 2020
@andreimatei andreimatei deleted the kv.nonambiguous-aborted-record branch August 21, 2020 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants