-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver,kvclient: disambiguate aborted txn case #53156
kvserver,kvclient: disambiguate aborted txn case #53156
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/kvclient/kvcoord/dist_sender.go, line 1045 at r1 (raw file):
// the two cases, and so we're forced to return an ambiguous error. // On the other hand, if the record exists, then we know that the transaction // did not commit.
Mention that's because a committed record can't be GC'ed and then recreated as ABORTED.
pkg/kv/kvserver/batcheval/cmd_query_txn.go, line 68 at r1 (raw file):
// Fetch transaction record; if missing, attempt to synthesize one. ok, err := storage.MVCCGetProto(
nit: curious why you went away from:
if ok, err := ...; err != nil {
} else if ok {
} else {
}
No strong preferences from me, just curious.
b5ad922
to
5db91e7
Compare
This patch eliminates an AmbiguousResultError case. When a committer finds an intent to be missing after having STAGED the txn, it tries to figure out if the transaction has been committed or aborted behind its back. It could be either: another actor might have performed recovery and found the txn to be implicitly committed, or the recovery might succeed in preventing an intent from being written and thus the transaction will be aborted. The commiter thus reads the txn record and tries to figure out what the deal is. The case in which a COMMITed txn record is found is unambiguous. The case when the txn record is not found is ambiguous: it could have been GCed, and the status is lost. This case stays ambiguous. The case where the transaction record is found to be ABORTED used to be treated as ambiguous, but this patch makes it unambiguous. It used to be ambiguous for a bad reason: the QueryTxn request used to read the txn record conflated an ABORTED record with a missing record that cannot be created anymore because of the tscache protections. This patch adds more output to QueryTxn, specifying if the record has been found or if it has been synthesized. This allows the DistSender to eliminate the ambiguity when the record is found. This patch relies on the fact that nobody writes an ABORTED record (or, any record) after the record is GC'ed. Fixes cockroachdb#52566 Fixes cockroachdb#52186 One of the issues above is a TPCC failure with an ambiguous error. I'm not sure that this was the ambiguity in question, but it seems likely. Release note (general change): Some cases where AmbiguousResultErrors were produced under high load now return non-ambiguous, retriable errors.
5db91e7
to
3dcb6f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go, line 1045 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Mention that's because a committed record can't be GC'ed and then recreated as ABORTED.
done
pkg/kv/kvserver/batcheval/cmd_query_txn.go, line 68 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: curious why you went away from:
if ok, err := ...; err != nil { } else if ok { } else { }
No strong preferences from me, just curious.
no particular reason, but I prefer the !err
path to not be inside an else
Build succeeded: |
This patch eliminates an AmbiguousResultError case. When a committer
finds an intent to be missing after having STAGED the txn, it tries to
figure out if the transaction has been committed or aborted behind its
back. It could be either: another actor might have performed recovery
and found the txn to be implicitly committed, or the recovery might
succeed in preventing an intent from being written and thus the
transaction will be aborted.
The commiter thus reads the txn record and tries to figure out what the
deal is. The case in which a COMMITed txn record is found is
unambiguous. The case when the txn record is not found is ambiguous: it
could have been GCed, and the status is lost. This case stays ambiguous.
The case where the transaction record is found to be ABORTED used to be
treated as ambiguous, but this patch makes it unambiguous. It used to be
ambiguous for a bad reason: the QueryTxn request used to read the txn
record conflated an ABORTED record with a missing record that cannot be
created anymore because of the tscache protections.
This patch adds more output to QueryTxn, specifying if the record has
been found or if it has been synthesized. This allows the DistSender to
eliminate the ambiguity when the record is found.
This patch relies on the fact that nobody writes an ABORTED record (or,
any record) after the record is GC'ed.
Fixes #52566
Fixes #52186
One of the issues above is a TPCC failure with an ambiguous error. I'm
not sure that this was the ambiguity in question, but it seems likely.
Release note (general change): Some cases where AmbiguousResultErrors
were produced under high load now return non-ambiguous, retriable
errors.