-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VerifyTxConsistency to backend. #17359
Conversation
0e15998
to
0719a26
Compare
/retest |
cc @serathius @ahrtr |
Also let's simplify the verification just at one place (e.g. at the end of each etcd transaction) |
535ad95
to
d20f507
Compare
Doing it in applyAll is essentially at the end of each etcd transaction. |
933ae64
to
cbe08e1
Compare
if lg != nil { | ||
lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket)) | ||
} | ||
b.BatchTx().LockOutsideApply() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we should use LockOutsideApply? We call this function in applyAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, LockInsideApply would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use LockOutsideApply
. This is just verification code, obviously we don't need to execute the txPostLockInsideApplyHook.
cbe08e1
to
32c4858
Compare
dataFromReadTxn[string(k)] = string(v) | ||
return nil | ||
}) | ||
if diff := cmp.Diff(dataFromWriteTxn, dataFromReadTxn); diff != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you confirmed that this would detect all the recently discovered problems?
When I proposed #17126 I was thinking that we should implement a stub for backend that would implement the same methods but without bbolt and keep the data only in memory. With validate enabled, we would send all request to bbolt to both backend and the stub and compare results.
I think this would be a better long term approach, however your solution should be acceptable assuming it can reproduce previous issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that we should implement a stub for backend that would implement the same methods but without bbolt and keep the data only in memory
Sound like a good idea. We clear/reset the stub backend before each test, and compare the real backend with the stub at the end of each test. But when the member restarts, all the in-memory data will be lost, unless we also support persistent storage for the stub backend. It's a separate topic (This PR just verifies the consistency between read TXN and write TXN.), please feel free to raise a separate ticket to track it if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But when the member restarts, all the in-memory data will be lost, unless we also support persistent storage for the stub backend
You are right that it's impossible to validate cross restart without keeping own persistent storage, however we don't need to test durability, just the runtime behavior. If we validate the data before member is restarted, we should trust the data when backend is loaded back after restart.
/retest |
if b == nil { | ||
return | ||
} | ||
if lg != nil { | ||
lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment: suggest to remove the ugly nil check. We require the callers to pass in valid parameters.
if b == nil { | |
return | |
} | |
if lg != nil { | |
lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket)) | |
} | |
lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to be an umbrella check. There are existing tests that have nil srv backend or nil logger. I don't want to fail those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to fail those tests.
Which tests are "those tests"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestProcessDuplicatedAppRespMessage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Actually we shouldn't enable & run the verification in unit test at all. They are only supposed to be executed in e2e and integration tests.
We may want to remove the following setting.
Line 48 in 5d45a88
export ETCD_VERIFY=all |
Since it's unrelated to this PR, so I won't insist on it. It can be discussed separately.
Please rebase this PR |
32c4858
to
6b1c9e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to test that performance in the follow-up when uses enables CorruptCheckTime
. It seems it's too expensive to read all the keys into memory.
The verification is only executed in test. Usually there isn't too much data in test. |
Thanks for the comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I just merged another PR, which updated the go.mod files. Please rebase this PR. Sorry for the inconvenience. |
Signed-off-by: Siyuan Zhang <sizhang@google.com> Update server/storage/backend/verify.go Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com> Update server/storage/backend/verify.go Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com>
6b1c9e2
to
3565a82
Compare
rebased. |
Followup test verification for #17158
Should help us find all places impacted by readbuf inconsistency.
Similar to #17127