Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: keep tombstone if revision == compactAtRev #18274

Merged
merged 2 commits into from
Aug 16, 2024

Conversation

fuweid
Copy link
Member

@fuweid fuweid commented Jul 3, 2024

Before this patch, the tombstone can be deleted if its revision is equal compacted revision. It causes that the watch subscriber won't get this DELETE event. Based on Compact API [1], we should keep tombstone revision if it's not less than the compaction revision.

CompactionRequest compacts the key-value store up to a given revision.
All superseded keys with a revision less than the compaction revision
will be removed.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.


To Reviewers:

During writing tests, I found that the compact doesn't always remove smaller revision data.
For example,

➜  bin/etcdctl put hello world -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}

➜  bin/etcdctl put hello-v2 world -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2}}

➜  bin/etcdctl put hello world-v2 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":4,"raft_term":2}}

➜  bin/etcdctl get --rev 2 hello
hello
world

➜  bin/etcdctl compact 3 --physical=true
compacted revision 3

➜  bin/etcdctl get --rev 2 hello
{"level":"warn","ts":"2024-07-03T20:12:39.086991+0800","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002105a0/127.0.0.1:2379","method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = OutOfRange desc = etcdserver: mvcc: required revision has been compacted"}
Error: etcdserver: mvcc: required revision has been compacted

➜  bin/tools/etcd-dump-db iterate-bucket ./default.etcd/member/snap/db key --decode
rev={Revision:{Main:4 Sub:0} tombstone:false}, value=[key "hello" | val "world-v2" | created 2 | mod 4 | ver 2]
rev={Revision:{Main:3 Sub:0} tombstone:false}, value=[key "hello-v2" | val "world" | created 3 | mod 3 | ver 1]
rev={Revision:{Main:2 Sub:0} tombstone:false}, value=[key "hello" | val "world" | created 2 | mod 2 | ver 1]

Not sure we should fix it in the follow-up. Just want to highlight it.

@k8s-ci-robot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@fuweid
Copy link
Member Author

fuweid commented Jul 3, 2024

ping @ahrtr @serathius @siyuanfoundation @chaochn47 ~ PTAL, thanks

@ahrtr ahrtr self-requested a review July 3, 2024 12:48
@ahrtr
Copy link
Member

ahrtr commented Jul 3, 2024

➜  bin/tools/etcd-dump-db iterate-bucket ./default.etcd/member/snap/db key --decode
rev={Revision:{Main:4 Sub:0} tombstone:false}, value=[key "hello" | val "world-v2" | created 2 | mod 4 | ver 2]
rev={Revision:{Main:3 Sub:0} tombstone:false}, value=[key "hello-v2" | val "world" | created 3 | mod 3 | ver 1]
rev={Revision:{Main:2 Sub:0} tombstone:false}, value=[key "hello" | val "world" | created 2 | mod 2 | ver 1]

Not sure we should fix it in the follow-up. Just want to highlight it.

I think it's expected behavior. Compaction always keeps the latest revision K, which is <= compaction_rev.

@ahrtr
Copy link
Member

ahrtr commented Jul 3, 2024

The e2e test cases, especially the mix-version test cases, as mentioned in #18162 (comment) are important. What's the plan to implement the e2e test cases?

@fuweid
Copy link
Member Author

fuweid commented Jul 3, 2024

I think it's expected behavior. Compaction always keeps the latest revision K, which is <= compaction_rev.

ETCD server prevents requests from fetching revisions which are smaller than compacted revision. It looks fine to me.
However, based on the compact API, if the revision is not latest and smaller than the compaction revision, that revision should be deleted.

The e2e test cases, especially the mix-version test cases, as mentioned in #18162 (comment) are important.

For the mix-version part, it's hard to finish it in one request. I'm trying to use patch mode in this pull request, like https://github.com/etcd-io/etcd/blob/main/tests/robustness/patches/beforeSendWatchResponse/watch.patch.

@ahrtr
Copy link
Member

ahrtr commented Jul 3, 2024

ETCD server prevents requests from fetching revisions which are smaller than compacted revision. It looks fine to me.
However, based on the compact API, if the revision is not latest and smaller than the compaction revision, that revision should be deleted

It sounds reasonable, but we still have two options, either keep it as it's or fix it.

The rev, which is smaller than compaction rev, isn't accessible via the get/range API, but it's still in the db file. I agree it's an issue, but it seems harmless. Fixing it may break user experience as well.

If we decide to fix it (let's go for this direction if there is NO objection),

  • we should fix it in one PR instead of two.
  • evaluate the effort & impact.

For the mix-version part, it's hard to finish it in one request.

It's OK to do it in followup PR, but we should manually verify it before we merge this PR. Please also feedback if you need help from others.

server/storage/mvcc/key_index.go Outdated Show resolved Hide resolved
server/storage/mvcc/key_index.go Outdated Show resolved Hide resolved
server/storage/mvcc/key_index.go Outdated Show resolved Hide resolved
@fuweid fuweid force-pushed the fix-event-loss-caused-by-compact branch 4 times, most recently from abdda43 to 2f0a9e1 Compare July 4, 2024 10:01
@fuweid
Copy link
Member Author

fuweid commented Jul 4, 2024

ping @ahrtr I updated the pull request with new e2e cases. The TestVerifyHashKVAfterCompactMixVersion is to test this change in a mix-version cluster (1 x main process + 2 x v3.5.14 process). One case will cause inconsistent hash value and I left comment in the case. Please take a look when you have time. Thanks

client/pkg/testutil/leak.go Outdated Show resolved Hide resolved
server/storage/mvcc/key_index.go Outdated Show resolved Hide resolved
server/storage/mvcc/key_index.go Show resolved Hide resolved
@fuweid fuweid force-pushed the fix-event-loss-caused-by-compact branch from 1143e2a to c19beea Compare July 9, 2024 15:15
tests/e2e/watch_test.go Outdated Show resolved Hide resolved
tests/e2e/watch_test.go Outdated Show resolved Hide resolved
tests/e2e/watch_test.go Outdated Show resolved Hide resolved
tests/e2e/watch_test.go Outdated Show resolved Hide resolved
tests/e2e/watch_test.go Outdated Show resolved Hide resolved
tests/e2e/watch_test.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Jul 10, 2024

Please put all hashKV related e2e cases into a separate file, e.g. hashkv_test.go

@ahrtr
Copy link
Member

ahrtr commented Jul 11, 2024

Discussed with @fuweid today. The existing implementation of HashByRev and Compaction are tightly coupled, and also they don't meet the expected behavior (see below).

The draft expected behavior is recorded in the doc. See summary below as well. Please comment in the doc to ensure we have consensus.

HashByRev (rev)

  • All revisions up to $(rev) should be included/calculated. For example, HashByRev(9), all revisions, which <=9, should be included/calculated.
  • All members should always return the same hash, no matter if it's a mix-version cluster or not.

Concern
Compatibility is a big concern. it's hard to ensure all members always return the same hash in a mix-version cluster.

Solution: use cluster-level feature-gate to resolve this concern. The fix/change will not be applied until all members have enabled the feature.

Compaction

When compacting revision X,

  • compact all history up to X (exclusive), but we always keep the revision X.
  • For all revisions which are < X, always keep the latest non-tombstone revision for each key.

It means that users can start to read Revision X and watch starting from X after the compaction. Reading or Watching X-1 will get an ErrCompacted error.

For example, compact (9),

  • Revision 9 will be kept;
  • For all revisions, which < 9
    • If it’s the latest non-tombstone revision for a key, then keep it;
    • Otherwise, compact the revisions

Next step

  • Resolve *: keep tombstone if revision == compactAtRev #18274 (comment) in a separate PR;
  • Ensure we have consensus on the expected behavior. Please comment on the doc;
  • Make change per the agreed expected behavior, but it should can be enabled or disabled by a flag. Once the cluster-level feature is ready, migrate the flag to the cluster-level feature.

@ahrtr
Copy link
Member

ahrtr commented Aug 7, 2024

Overall looks good to me.

@ivanvc @jmhbnz @serathius @siyuanfoundation Please take a look.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@fuweid fuweid force-pushed the fix-event-loss-caused-by-compact branch from ac157d2 to 08b8b5b Compare August 7, 2024 13:39
Before this patch, the tombstone can be deleted if its revision is equal
compacted revision. It causes that the watch subscriber won't get this
DELETE event. Based on Compact API[1], we should keep tombstone revision
if it's not less than the compaction revision.

> CompactionRequest compacts the key-value store up to a given revision.
> All superseded keys with a revision less than the compaction revision
> will be removed.

[1]: https://etcd.io/docs/latest/dev-guide/api_reference_v3/

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@fuweid fuweid force-pushed the fix-event-loss-caused-by-compact branch from 08b8b5b to bbdc941 Compare August 7, 2024 13:56
@fuweid
Copy link
Member Author

fuweid commented Aug 7, 2024

/retest

@serathius
Copy link
Member

Can we close/mark as addressed all the comments opened about the code?

@fuweid
Copy link
Member Author

fuweid commented Aug 9, 2024

Can we close/mark as addressed all the comments opened about the code?

Done. PTAL,thanks

})
}

func testStartWatcherFromCompactedRevision(t *testing.T, performCompactOnTombstone bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function would benefit from splitting it up and cleanup, but let's not block fix on test.

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Nice work! @fuweid

@ahrtr ahrtr merged commit 1d0a6b3 into etcd-io:main Aug 16, 2024
45 checks passed
@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, fuweid, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahrtr
Copy link
Member

ahrtr commented Aug 16, 2024

@fuweid could you backport #18369 and this PR to 3.5 and 3.4? thx

@fuweid
Copy link
Member Author

fuweid commented Aug 16, 2024

@fuweid could you backport #18369 and this PR to 3.5 and 3.4? thx

@ahrtr Yes, I will do the backport. Thanks for the review! @ahrtr @serathius

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants