Add VerifyTxConsistency to backend. #17359

siyuanfoundation · 2024-02-02T06:29:34Z

Follow-up test verification for #17158.

This should help us find all places affected by the readbuf inconsistency.
Similar to #17127.

siyuanfoundation · 2024-02-02T18:34:34Z

/retest

siyuanfoundation · 2024-02-02T19:08:29Z

cc @serathius @ahrtr

server/storage/backend/backend.go

ahrtr · 2024-02-04T14:03:11Z

Also, let's simplify the verification by doing it in just one place (e.g. at the end of each etcd transaction).

siyuanfoundation · 2024-02-05T18:02:06Z

Also, let's simplify the verification by doing it in just one place (e.g. at the end of each etcd transaction).

Doing it in applyAll is essentially at the end of each etcd transaction.
Please see #17127 for the read tx level approach and why it does not work.
Another good place would be func (t *batchTxBuffered) Unlock(), because t.buf.writeback needs to run before the check, but I cannot call schema.AllBuckets there because of a cyclic dependency.
As for "wrap backend.ReadTx() and backend.BatchTx() and intercept UnsafeRead, UnsafePut calls": this would involve some non-trivial refactoring, which may carry more risk than benefit for the purpose of verification.


serathius · 2024-02-06T18:41:12Z

server/storage/backend/verify.go

+		if lg != nil {
+			lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket))
+		}
+		b.BatchTx().LockOutsideApply()


Are we sure we should use LockOutsideApply? We call this function in applyAll.

Yes, LockInsideApply would fail.

We should use LockOutsideApply. This is just verification code, obviously we don't need to execute the txPostLockInsideApplyHook.
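As a rough illustration of the distinction described here, the toy type below (hypothetical, not etcd's batchTx) runs txPostLockInsideApplyHook only from LockInsideApply, so a read-only verification pass that takes LockOutsideApply never triggers it. It models only the hook, nothing else the real methods do.

```go
package main

import (
	"fmt"
	"sync"
)

// toyTx is a hypothetical model of the two lock entry points discussed above.
type toyTx struct {
	mu                        sync.Mutex
	txPostLockInsideApplyHook func()
}

// LockInsideApply takes the lock and then runs the post-lock hook, if any.
func (t *toyTx) LockInsideApply() {
	t.mu.Lock()
	if t.txPostLockInsideApplyHook != nil {
		t.txPostLockInsideApplyHook()
	}
}

// LockOutsideApply only takes the lock; the hook is never run.
func (t *toyTx) LockOutsideApply() { t.mu.Lock() }

func (t *toyTx) Unlock() { t.mu.Unlock() }

func main() {
	hookRuns := 0
	tx := &toyTx{txPostLockInsideApplyHook: func() { hookRuns++ }}

	// A verification pass only reads, so it takes the lock without the hook.
	tx.LockOutsideApply()
	tx.Unlock()

	tx.LockInsideApply()
	tx.Unlock()

	fmt.Println("hook runs:", hookRuns) // hook runs: 1
}
```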

serathius · 2024-02-06T18:47:17Z

server/storage/backend/verify.go

+		dataFromReadTxn[string(k)] = string(v)
+		return nil
+	})
+	if diff := cmp.Diff(dataFromWriteTxn, dataFromReadTxn); diff != "" {


Have you confirmed that this would detect all the recently discovered problems?

When I proposed #17126, I was thinking that we should implement a stub backend that implements the same methods but keeps the data only in memory, without bbolt. With validation enabled, we would send every request to both the bbolt backend and the stub and compare the results.

I think this would be a better long-term approach; however, your solution should be acceptable, assuming it can reproduce the previous issues.

It can detect #17158.
For #17263, since writing multiple keys in one transaction is rare, there are no existing issues for it to detect.
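The verification in the snippet above amounts to reading every key in each bucket through both the write transaction and the read transaction and comparing the two views. Below is a minimal stdlib-only sketch, assuming a hypothetical bucketReader interface and map-backed toy readers rather than etcd's backend API. It uses reflect.DeepEqual where the PR snippet uses cmp.Diff.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// bucketReader is a hypothetical stand-in for one side of a backend
// transaction: it visits every key/value pair stored in a named bucket.
type bucketReader interface {
	ForEach(bucket string, visit func(k, v []byte) error) error
}

// toyReader is an in-memory bucketReader used only for illustration.
type toyReader map[string]map[string]string

func (r toyReader) ForEach(bucket string, visit func(k, v []byte) error) error {
	b := r[bucket]
	keys := make([]string, 0, len(b))
	for k := range b {
		keys = append(keys, k)
	}
	sort.Strings(keys) // visit in key order, as an ordered store would
	for _, k := range keys {
		if err := visit([]byte(k), []byte(b[k])); err != nil {
			return err
		}
	}
	return nil
}

// snapshot copies every key/value pair of a bucket into a map.
func snapshot(r bucketReader, bucket string) (map[string]string, error) {
	out := map[string]string{}
	err := r.ForEach(bucket, func(k, v []byte) error {
		out[string(k)] = string(v)
		return nil
	})
	return out, err
}

// verifyTxConsistency returns the buckets whose contents differ between the
// write-transaction view and the read-transaction view.
func verifyTxConsistency(writeTx, readTx bucketReader, buckets []string) ([]string, error) {
	var mismatched []string
	for _, b := range buckets {
		fromWrite, err := snapshot(writeTx, b)
		if err != nil {
			return nil, err
		}
		fromRead, err := snapshot(readTx, b)
		if err != nil {
			return nil, err
		}
		if !reflect.DeepEqual(fromWrite, fromRead) {
			mismatched = append(mismatched, b)
		}
	}
	return mismatched, nil
}

func main() {
	write := toyReader{"key": {"foo": "bar"}, "meta": {"rev": "2"}}
	read := toyReader{"key": {"foo": "bar"}, "meta": {"rev": "1"}} // stale read buffer
	bad, err := verifyTxConsistency(write, read, []string{"key", "meta"})
	fmt.Println(bad, err) // [meta] <nil>
}
```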

I was thinking that we should implement a stub for backend that would implement the same methods but without bbolt and keep the data only in memory

Sounds like a good idea. We could clear/reset the stub backend before each test and compare the real backend with the stub at the end of each test. But when the member restarts, all the in-memory data will be lost, unless we also support persistent storage for the stub backend. That is a separate topic (this PR just verifies the consistency between the read TXN and the write TXN); please feel free to raise a separate ticket to track it if you want.

But when the member restarts, all the in-memory data will be lost, unless we also support persistent storage for the stub backend

You are right that it's impossible to validate across restarts without keeping our own persistent storage; however, we don't need to test durability, just the runtime behavior. If we validate the data before the member is restarted, we can trust the data when the backend is loaded back after the restart.
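A minimal sketch of the stub-backend idea from this thread, assuming a hypothetical three-method kv interface rather than etcd's real backend API: every write goes to both the real store and an in-memory stub, and every read is answered by the real store but cross-checked against the stub. staleOnDelete is an invented buggy backend used only to show a mismatch being caught. As noted above, the stub holds nothing across a restart.

```go
package main

import "fmt"

// kv is a hypothetical minimal store interface standing in for the backend.
type kv interface {
	Put(k, v string)
	Delete(k string)
	Get(k string) (string, bool)
}

// memStub keeps everything in memory; it plays the role of the stub backend.
type memStub map[string]string

func (m memStub) Put(k, v string) { m[k] = v }
func (m memStub) Delete(k string) { delete(m, k) }
func (m memStub) Get(k string) (string, bool) {
	v, ok := m[k]
	return v, ok
}

// staleOnDelete is an invented buggy backend: it silently drops deletes.
type staleOnDelete struct{ memStub }

func (staleOnDelete) Delete(string) {}

// validatingKV sends every request to both the real store and the stub and
// records keys whose reads disagree.
type validatingKV struct {
	real, stub kv
	mismatches []string
}

func (v *validatingKV) Put(k, val string) {
	v.real.Put(k, val)
	v.stub.Put(k, val)
}

func (v *validatingKV) Delete(k string) {
	v.real.Delete(k)
	v.stub.Delete(k)
}

func (v *validatingKV) Get(k string) (string, bool) {
	rv, rok := v.real.Get(k)
	sv, sok := v.stub.Get(k)
	if rv != sv || rok != sok {
		v.mismatches = append(v.mismatches, k)
	}
	return rv, rok // callers still get the real backend's answer
}

func main() {
	v := &validatingKV{real: staleOnDelete{memStub{}}, stub: memStub{}}
	v.Put("foo", "bar")
	v.Delete("foo")
	v.Get("foo")
	fmt.Println(v.mismatches) // [foo]
}
```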

siyuanfoundation · 2024-02-06T19:32:03Z

/retest

ahrtr · 2024-02-06T19:38:51Z

server/storage/backend/verify.go

+		if b == nil {
+			return
+		}
+		if lg != nil {
+			lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket))
+		}


Minor comment: I suggest removing the ugly nil checks. We require callers to pass in valid parameters.

Suggested change

-		if b == nil {
-			return
-		}
-		if lg != nil {
-			lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket))
-		}
+		lg.Debug("verifyBackendConsistency", zap.Bool("skipSafeRangeBucket", skipSafeRangeBucket))

This is supposed to be an umbrella check. There are existing tests that have a nil srv backend or a nil logger. I don't want to fail those tests.

I don't want to fail those tests.

Which tests are "those tests"?

TestProcessDuplicatedAppRespMessage

OK. Actually, we shouldn't enable and run the verification in unit tests at all. It is only supposed to be executed in e2e and integration tests.

We may want to remove the following setting (etcd/scripts/test.sh, line 48 in 5d45a88):

export ETCD_VERIFY=all

Since it's unrelated to this PR, I won't insist on it. It can be discussed separately.

ahrtr · 2024-02-19T10:07:40Z

Please rebase this PR

ahrtr

lgtm

Thanks

cc @fuweid @jmhbnz @serathius

fuweid

I think we need to test the performance in a follow-up for when users enable CorruptCheckTime. It seems too expensive to read all the keys into memory.
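If the memory cost ever mattered outside tests, one option (not what this PR does) is to compare a constant-size digest of each view instead of two full in-memory maps. The sketch below sums per-pair FNV-1a hashes (Go's hash/fnv) so the result does not depend on iteration order. It still visits every key; it only avoids holding all of them at once. Equal contents always produce equal digests, but a hash collision could in principle hide a difference, so a matching digest is weaker evidence than a full diff. Function names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pairHash hashes one key/value pair. Each field is length-prefixed so that
// ("ab", "c") and ("a", "bc") do not trivially produce the same input.
func pairHash(k, v []byte) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s|%d:%s", len(k), k, len(v), v)
	return h.Sum64()
}

// bucketDigest folds every pair into one order-independent value by summing
// the per-pair hashes (wrapping mod 2^64), using constant memory.
func bucketDigest(forEach func(visit func(k, v []byte))) uint64 {
	var sum uint64
	forEach(func(k, v []byte) { sum += pairHash(k, v) })
	return sum
}

// fromMap adapts a Go map (iterated in random order) to the visitor shape above.
func fromMap(m map[string]string) func(func(k, v []byte)) {
	return func(visit func(k, v []byte)) {
		for k, v := range m {
			visit([]byte(k), []byte(v))
		}
	}
}

func main() {
	write := map[string]string{"a": "1", "b": "2"}
	read := map[string]string{"b": "2", "a": "1"}
	stale := map[string]string{"a": "1", "b": "1"}
	fmt.Println(bucketDigest(fromMap(write)) == bucketDigest(fromMap(read)))  // true
	fmt.Println(bucketDigest(fromMap(write)) == bucketDigest(fromMap(stale))) // false
}
```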

ahrtr · 2024-02-22T07:14:22Z

I think we need to test the performance in a follow-up for when users enable CorruptCheckTime. It seems too expensive to read all the keys into memory.

The verification is only executed in test. Usually there isn't too much data in test.

fuweid · 2024-02-22T15:51:33Z

The verification is only executed in test.

Thanks for the comment.

fuweid

LGTM

ahrtr · 2024-02-22T18:15:26Z

I just merged another PR, which updated the go.mod files. Please rebase this PR. Sorry for the inconvenience.


siyuanfoundation · 2024-02-22T19:31:57Z

rebased.

siyuanfoundation marked this pull request as draft February 2, 2024 06:29

k8s-ci-robot added the do-not-merge/work-in-progress label Feb 2, 2024

siyuanfoundation force-pushed the verify-test branch from 0e15998 to 0719a26 February 2, 2024 17:47

siyuanfoundation marked this pull request as ready for review February 2, 2024 17:55

k8s-ci-robot removed the do-not-merge/work-in-progress label Feb 2, 2024

siyuanfoundation requested a review from serathius February 2, 2024 19:06

ahrtr reviewed Feb 4, 2024

server/storage/backend/backend.go (outdated, resolved)

siyuanfoundation force-pushed the verify-test branch 2 times, most recently from 535ad95 to d20f507 February 5, 2024 18:00

siyuanfoundation requested a review from ahrtr February 6, 2024 05:09

ahrtr reviewed Feb 6, 2024

server/storage/backend/verify.go (outdated, resolved)

server/storage/backend/verify.go (outdated, resolved)

siyuanfoundation force-pushed the verify-test branch from 933ae64 to cbe08e1 February 6, 2024 17:30

serathius reviewed Feb 6, 2024

siyuanfoundation force-pushed the verify-test branch from cbe08e1 to 32c4858 February 6, 2024 18:42

serathius reviewed Feb 6, 2024

ahrtr reviewed Feb 6, 2024

siyuanfoundation force-pushed the verify-test branch from 32c4858 to 6b1c9e2 February 20, 2024 20:19

ahrtr approved these changes Feb 21, 2024

fuweid reviewed Feb 22, 2024

serathius approved these changes Feb 22, 2024

fuweid approved these changes Feb 22, 2024

Add VerifyTxConsistency to backend.

3565a82

Signed-off-by: Siyuan Zhang <sizhang@google.com>
Update server/storage/backend/verify.go
Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com>
Update server/storage/backend/verify.go
Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com>

siyuanfoundation force-pushed the verify-test branch from 6b1c9e2 to 3565a82 February 22, 2024 19:31

ahrtr merged commit 8c7f911 into etcd-io:main Feb 22, 2024
39 checks passed
