Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.4] allow downgrade from 3.5 #17330

Merged
merged 4 commits into from
Feb 20, 2024

Conversation

siyuanfoundation
Copy link
Contributor

@siyuanfoundation siyuanfoundation commented Jan 27, 2024

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

Part of #15878 (comment)
Tested downgrade from 3.5 cluster locally:

  1. Start 3.5 cluster
bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
  1. start putting data
go run tools/benchmark/main.go put --total=10000000 --key-size=64 --val-size=512`

bin/etcdctl --endpoints=http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379 endpoint status -w table
bin/etcdctl auth status
  1. kill infra3
  2. restart infra3 with 3.4 binary
../etcd-3.4/bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr --data-dir infra3.etcd --accept-next-storage-version

bin/etcdctl auth status
  1. kill and restart infra2 with 3.4 binary
../etcd-3.4/bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr --data-dir infra2.etcd --accept-next-storage-version
  1. kill and restart infra1 with 3.4 binary
../etcd-3.4/bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --enable-pprof --logger=zap --log-outputs=stderr --data-dir infra1.etcd --accept-next-storage-version

@siyuanfoundation siyuanfoundation force-pushed the 3.4-downgrade branch 2 times, most recently from f5c17ba to ebbda41 Compare January 27, 2024 00:23
@siyuanfoundation
Copy link
Contributor Author

cc @serathius @ahrtr

@serathius
Copy link
Member

Not sure how this PR refers to dev work described in https://docs.google.com/document/d/1siSo-tA_xLRo5li5wK8KqWsOiq34PIgtmuSDcyxxwsY/edit?usp=sharing

In document you described that downgrade will gated behind "--allow-downgrade" flag and will include implementation of AuthStatusRequest handling.

mvcc/kvstore.go Outdated Show resolved Hide resolved
@siyuanfoundation
Copy link
Contributor Author

Not sure how this PR refers to dev work described in https://docs.google.com/document/d/1siSo-tA_xLRo5li5wK8KqWsOiq34PIgtmuSDcyxxwsY/edit?usp=sharing

In document you described that downgrade will gated behind "--allow-downgrade" flag and will include implementation of AuthStatusRequest handling.

With the etcdutil migrate and online schema check, "--allow-downgrade" flag is kind of redundant.
But I see your point about snapshot.

Added "--allow-downgrade" flag and implementation of AuthStatusRequest handling.

@siyuanfoundation
Copy link
Contributor Author

/retest

@siyuanfoundation
Copy link
Contributor Author

/retest

@siyuanfoundation
Copy link
Contributor Author

cc @lavacat

@siyuanfoundation siyuanfoundation force-pushed the 3.4-downgrade branch 3 times, most recently from 04fa384 to 325ec09 Compare February 6, 2024 22:15
// UnsafeDetectSchemaVersion returns version of storage schema. Returned value depends on etcd version that created the backend. For
// * v3.6 and newer will return storage version.
// * v3.5 will return it's version if it includes any storage fields added in v3.5.
// * v3.4 will return it's version if it doesn't include any storage fields added in v3.5.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 3.3, it will also return v3_4. So it will not impact the case of upgrading 3.3 to 3.4. We need to clarify this at least in comment and verify it in test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment. As for test, I could not find any difference in the data schemas of 3.3 and 3.4, not sure about what to test.

etcdserver/api/membership/cluster.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Feb 14, 2024

Overall looks good. We need to add a couple of e2e test a well,

  • The normal upgrade from 3.3 to 3.4. We need to ensure the normal upgrading from 3.3 to 3.4 isn't impacted by this PR.
  • Downgrade from 3.5 to 3.4
    • next-cluster-version-compatible = false
    • next-cluster-version-compatible = true

It can be addressed in a followup, but please manually verify the cases.

@siyuanfoundation
Copy link
Contributor Author

siyuanfoundation commented Feb 15, 2024

Tested 3.3 -> 3.4 -> 3.5 ->3.3 sequence manually with:

  1. start 3.3 cluster
../etcd-3.3/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new

../etcd-3.3/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new

../etcd-3.3/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new
  1. verify status
go run tools/benchmark/main.go put --total=10000000 --key-size=64 --val-size=512`

bin/etcdctl --endpoints=http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379 endpoint status -w table
  1. upgrade server one by one to 3.4
../etcd-3.4/bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --data-dir infra3.etcd 

../etcd-3.4/bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --data-dir infra2.etcd

../etcd-3.4/bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --data-dir infra1.etcd
  1. verify status
bin/etcdctl --endpoints=http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379 endpoint status -w table
  1. upgrade server one by one to 3.5
  2. verify status
bin/etcdctl --endpoints=http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379 endpoint status -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  http://127.0.0.1:2379 | 8211f1d0f64f3269 |  3.5.12 |  377 MB |     false |      false |        17 |     454917 |             454917 |        |
| http://127.0.0.1:22379 | 91bc3c398fb3c146 |  3.5.12 |  377 MB |     false |      false |        17 |     454917 |             454917 |        |
| http://127.0.0.1:32379 | fd422379fda50e48 |  3.5.12 |  377 MB |      true |      false |        17 |     454917 |             454917 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

bin/etcdctl auth status
>> Authentication Status: false
>> AuthRevision: 1

  1. downgrade to 3.4 without the --next-cluster-version-compatible flag
../etcd-3.4/bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --data-dir infra3.etcd

# fails with
>> etcdserver/membership: invalid downgrade; (current version: 3.4.29 is lower than determined cluster version: 3.5).
  1. downgrade to 3.4 the --next-cluster-version-compatible flag
../etcd-3.4/bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state existing --data-dir infra3.etcd --next-cluster-version-compatible
  1. verify status
  2. finish downgrading the other 2 servers, with enough writes when some server is off to trigger snapshot restore.
  3. verify status
bin/etcdctl --endpoints=http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379 endpoint status -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  http://127.0.0.1:2379 | 8211f1d0f64f3269 |  3.4.29 |  527 MB |     false |      false |        20 |     635804 |             635804 |        |
| http://127.0.0.1:22379 | 91bc3c398fb3c146 |  3.4.29 |  527 MB |     false |      false |        20 |     635804 |             635804 |        |
| http://127.0.0.1:32379 | fd422379fda50e48 |  3.4.29 |  528 MB |      true |      false |        20 |     635804 |             635804 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

mvcc/downgrade.go Outdated Show resolved Hide resolved
mvcc/downgrade.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Feb 15, 2024

cc @fuweid to take a look. This PR should be also important to AKS.

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good. But it seems that this patch needs go mod tidy.

@ahrtr
Copy link
Member

ahrtr commented Feb 19, 2024

But it seems that this patch needs go mod tidy.

This PR doesn't change anything for go.mod and go.sum files. I do not see any issue when running "go mod tidy" command. Please ensure you use exactly the same go version as https://github.com/etcd-io/etcd/blob/release-3.4/.go-version#L1

@fuweid
Copy link
Member

fuweid commented Feb 19, 2024

This PR doesn't change anything for go.mod and go.sum files. I do not see any issue when running "go mod tidy" command.

If you pull this pull request in your local, you will see that. This pull request doesn't rebase so it shows diff.
It's fine. It's not blocking comment.

@ahrtr
Copy link
Member

ahrtr commented Feb 19, 2024

If you pull this pull request in your local, you will see that. This pull request doesn't rebase so it shows diff.

Yes, it's true. Curious why the workflow did not fail.

Anyway, please rebase this PR. @siyuanfoundation

Bogdan Kanivets and others added 4 commits February 20, 2024 08:55
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
@siyuanfoundation
Copy link
Contributor Author

rebased. the workflow still passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants