
Downgrade support from 3.5 to 3.4 #15878

Open
logicalhan opened this issue May 11, 2023 · 36 comments · May be fixed by #15994

@logicalhan

What would you like to be added?

I would like to be able to safely downgrade from 3.5 to 3.4, and then safely upgrade back to 3.5.

Why is this needed?

Given the vast number of data correctness issues we've unearthed in etcd 3.5 (many of them fixed by @ahrtr and @serathius), I have personal reservations about upgrading my k8s clusters to use 3.5. If there were a working (and tested) rollback strategy, I would be much more inclined to update my etcds to a more recent version.

@serathius (Member)

I think this could easily be added to the etcdutl migrate command, allowing for safe offline downgrade and upgrade operations.
Code: https://github.com/etcd-io/etcd/blob/main/etcdutl/etcdutl/migrate_command.go

This would also help with kubernetes/kubernetes#117906 and the cleanup of the kubernetes migrate script for etcd.

@lavacat commented May 11, 2023

Please assign this to me; we already have a minimal internal patch to address this. In its current form, it's a 3.4 patch that allows 3.4 to be deployed within a 3.5 cluster to avoid downtime and perform a rolling downgrade.
It works by hacking the version check and removing the confState and term keys.
It would be great to make this part of migrate and add more testing around it.
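
For illustration, here is a minimal sketch of the key-removal half of such a patch, written against the bbolt library as a standalone tool rather than as the actual internal patch. It assumes the v3.5-only fields live under the confState and term keys in the meta bucket of member/snap/db (as discussed later in this thread) and does not cover the version-check side.

// Sketch only: strip the v3.5-only "confState" and "term" keys from the "meta"
// bucket of an etcd bolt db, so a later re-upgrade to 3.5 cannot pick up stale
// values. Run against member/snap/db while the member is stopped.
package main

import (
	"log"
	"os"

	bolt "go.etcd.io/bbolt"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: strip-v35-meta <path-to-member/snap/db>")
	}

	db, err := bolt.Open(os.Args[1], 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.Update(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return nil // nothing to clean up
		}
		for _, k := range [][]byte{[]byte("confState"), []byte("term")} {
			if err := meta.Delete(k); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}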

lavacat self-assigned this May 11, 2023
@serathius (Member) commented May 11, 2023

Just a note: support for a rolling update is out of scope for now. Let's start with the migrate script.

@lavacat commented May 27, 2023

Quick update: trying to get a POC to work. The idea is to run etcdutl migrate --data-dir data-3.5 --target-version 3.4 and get a data dir that etcd 3.4 can be started with.
My understanding is that migrate currently only updates the MetaStorageVersionName key, which was added in 3.6. It won't update ClusterClusterVersionKeyName or the version in v2store.

At the moment, running into

etcdserver/membership: cluster cannot be downgraded (current version: 3.4.26 is lower than determined cluster version: 3.5).

because of the v2store version.

@lavacat commented May 30, 2023

For reference, I tried running etcdctl downgrade from an etcd 3.6 build targeting a 3.5 cluster, but it didn't work.

Related design docs:
  • etcd Downgrades Design
  • etcd storage versioning

$ ./bin/etcdctl downgrade validate 3.4
Downgrade validate success, cluster version 3.5.0

$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-05-30T01:12:41.770844-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

etcd 3.5 one node cluster log

{"level":"info","ts":"2023-05-30T01:12:36.794018-0700","caller":"membership/cluster.go:890","msg":"The server is ready to downgrade","target-version":"3.4.0","server-version":"3.5.9"}
{"level":"warn","ts":"2023-05-30T01:12:36.88595-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}
{"level":"warn","ts":"2023-05-30T01:12:41.77082-0700","caller":"etcdserver/v3_server.go:1047","msg":"reject downgrade request","error":"etcdserver: request timed out"}
{"level":"warn","ts":"2023-05-30T01:12:41.770895-0700","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-05-30T01:12:36.773319-0700","time spent":"4.997554901s","remote":"127.0.0.1:62022","response type":"/etcdserverpb.Maintenance/Downgrade","request count":-1,"request size":-1,"response count":-1,"response size":-1,"request content":""}
{"level":"warn","ts":"2023-05-30T01:12:41.886266-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}

Going to debug this more.

@jpbetz (Contributor) commented Jun 1, 2023

Did we ever decouple the etcd version from the data storage version? I vaguely recall multiple people pointing out that it is sort of silly that you can't automatically downgrade from 3.5 to 3.4 given that the file format of the persisted data is identical, and that if we just gave data files a format version and only incremented it when we actually change how data is written to the file, downgrades could be simpler.

@lavacat commented Jun 1, 2023

Version logic is a bit different between 3.4, 3.5 and 3.6.
In 3.4 the version is first decided in decideClusterVersion based on version.Version and then saved to v2store. In Recover we rely only on the version recorded in v2store; see clusterVersionFromStore. The version is also saved to the backend under cluster/clusterVersion, but it's never read.

3.5 added clusterVersionFromBackend, but I think the v2store path is still used by default. 3.5 also added downgradeInfoFromBackend. I don't fully understand downgrade, but I think the workflow is described here.

3.6 uses ClusterVersionFromBackend by default. It also added the meta/storageVersion key that's used in migrate.
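
To see which of these version markers a given data dir actually carries, a small read-only check along these lines can help. It assumes the key names mentioned above (cluster/clusterVersion and meta/storageVersion); bucket and key names should be verified against the specific release.

// Read-only inspection sketch: print the backend version markers discussed above
// from an etcd bolt db (typically member/snap/db).
package main

import (
	"fmt"
	"log"
	"os"

	bolt "go.etcd.io/bbolt"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: show-version-keys <path-to-member/snap/db>")
	}

	db, err := bolt.Open(os.Args[1], 0400, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		get := func(bucket, key string) string {
			b := tx.Bucket([]byte(bucket))
			if b == nil {
				return "<no bucket>"
			}
			return string(b.Get([]byte(key)))
		}
		// Empty output means the key is absent in this data dir.
		fmt.Printf("cluster/clusterVersion = %q\n", get("cluster", "clusterVersion"))
		fmt.Printf("meta/storageVersion    = %q\n", get("meta", "storageVersion"))
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}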

@lavacat commented Jun 1, 2023

What do you think about adding a special flag to 3.4 to control version checks? See #15990. This would also allow a rolling downgrade.

Another option is to snapshot using etcdctl 3.5, then stop the cluster and restore using etcdctl 3.4. Here are the steps I've used to test this:
3.5 cluster

bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

snapshot

./bin/etcdctl snapshot save snap-3.5

stop all nodes, remove infra dirs and restore:

./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra1 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra2 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra3 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'

then start cluster using 3.4 binary:

bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

@lavacat commented Jun 1, 2023

@serathius I saw your comment on the PR; duplicating my question here. migrate will only help with removing confState and term, correct? v2store will still have the 3.5 version. What is the process to complete the downgrade? The only way I've found is using a snapshot, and it requires stopping all nodes.

I've also tried the downgrade enable workflow, which still requires using a snapshot, but I was hoping there would be no need to stop the cluster. It didn't work for me.

@serathius (Member) commented Jun 1, 2023

@serathius I saw your comment on the PR; duplicating my #15990 (comment) here. migrate will only help with removing confState and term, correct? v2store will still have the 3.5 version. What is the process to complete the downgrade? The only way I've found is using #15878 (comment), and it requires stopping all nodes.

This is exactly what we need to support downgrades: remove the confState and term fields. This is also exactly what downgrade enable does in v3.6, but there it also coordinates the change between members in a live cluster. We don't want to backport the coordination logic.

To make it clear, removing the confState and term fields is crucial for downgrades and etcd correctness. You are right that etcd v3.4 will just start from v3.5 data. However, have you thought about what will happen with the confState and term fields? Etcd v3.4 is unaware of those fields, so they will remain unchanged and ignored; then you decide to upgrade back to v3.5 and it goes BOOM. Etcd v3.5 starts, finds those fields, assumes they come from a previous v3.5 run, and tries to use the outdated confState and term. See #13514.

One thing we can add in v3.4 is a safeguard for those fields: have etcd v3.4.27 reject the db file if it finds fields from v3.5. It should make it clear to the user that just loading v3.5 data in v3.4 is unsupported and will break their cluster, maybe not immediately, but later.
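
A minimal sketch of what such a safeguard could look like, assuming the v3.5-only fields are the confState and term keys in the meta bucket; this is only an illustration of the check, not the actual v3.4 change:

package main

import (
	"fmt"
	"log"
	"os"

	bolt "go.etcd.io/bbolt"
)

// rejectV35Fields refuses a bolt db that still carries v3.5-only keys
// ("confState" and "term" in the "meta" bucket), instead of silently ignoring
// them and corrupting a later re-upgrade to 3.5.
func rejectV35Fields(path string) error {
	db, err := bolt.Open(path, 0400, &bolt.Options{ReadOnly: true})
	if err != nil {
		return err
	}
	defer db.Close()

	return db.View(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return nil
		}
		for _, k := range []string{"confState", "term"} {
			if meta.Get([]byte(k)) != nil {
				return fmt.Errorf("db contains v3.5 field %q; running v3.4 on this data dir is unsupported", k)
			}
		}
		return nil
	})
}

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: check-v35-fields <path-to-member/snap/db>")
	}
	if err := rejectV35Fields(os.Args[1]); err != nil {
		log.Fatal(err)
	}
	fmt.Println("no v3.5-only meta fields found")
}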

lavacat linked a pull request Jun 1, 2023 that will close this issue
@lavacat commented Jun 1, 2023

You are right that etcd v3.4 will just start from v3.5 data.

That was actually my main problem: without restoring from a snapshot, v3.4 will fail to start if you just point it at a 3.5 data dir.

I've added the fields to migrate in PR #15994.

@lavacat commented Jun 2, 2023

@serathius, I updated PR #15994; I think it's ready for review. But I'd like to clarify a couple of things.

To make it clear, removing the confState and term fields is crucial for downgrades and etcd correctness.

v3.4 PR #15990 does this; see downgradeMetaBucket.
Maybe I'm overthinking this, but operationally, having a 3.4 version that an SRE team can downgrade to without any other manipulation would be most desirable.
The problem is that this PR adds "code smell".

Assuming we are going with migrate, I'd like to document the steps for a downgrade. Just pointing 3.4 at the 3.5 data-dir didn't work. I was able to perform the downgrade using a snapshot, and I had to stop the cluster. Am I missing something here? I can retest the procedure.

@serathius (Member)

cc @ahrtr @ptabor to get feedback about adding downgrade support.

@serathius (Member)

Maybe I'm overthinking this, but operationally, having a 3.4 version that an SRE team can downgrade to without any other manipulation would be most desirable.
The problem is that this PR adds "code smell".

Don't understand the statement. What is the code smell you see?

Assuming we are going with migrate, I'd like to document the steps for a downgrade. Just pointing 3.4 at the 3.5 data-dir didn't work. I was able to perform the downgrade using #15878 (comment), and I had to stop the cluster. Am I missing something here? I can retest the procedure.

We should make it work, though. Can you provide logs so I can understand the problem you are facing?

@ahrtr (Member) commented Jun 5, 2023

I am not sure whether we should support downgrading 3.5 to 3.4.

Public Cloud

  • EKS seems to have already upgraded to 3.5. cc @chaochn47 to double confirm
  • Is AKS still using 3.4? cc @fuweid to double confirm
  • Is GKE still using 3.4? It seemed so a couple of months back. cc @serathius to double confirm.

Private Cloud

  • Is OpenShift still using 3.4? cc @tjungblu to double confirm
  • TKG isn't using 3.4 anymore. All current TKG versions are using 3.5.

Non-K8s use cases?

Any feedback please?

Online and offline migration

If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration. The offline approach is to backport and enhance the etcdutl migrate command in 3.5, as @serathius mentioned in #15878 (comment). But it seems that the etcdutl migrate implementation in the main branch doesn't update ClusterClusterVersionKeyName and ClusterDowngradeKeyName when migrating from 3.6 to 3.5?

The high-level workflow of online downgrading:
[image: downgrade_process]

@lavacat commented Jun 5, 2023

@serathius

Don't understand the statement. What is the code smell you see

Adding the 3.5.0 capability and downgradeMetaBucket in mvcc seems like a hack. But maybe that's just my personal perception :)

Here is an example of the error when starting 3.4 with a 3.5 data-dir:

$ bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

{"level":"fatal","ts":"2023-06-05T01:01:52.222568-0700","caller":"membership/cluster.go:795","msg":"invalid downgrade; server version is lower than determined cluster version","current-server-version":"3.4.26","determined-cluster-version":"3.5","stacktrace":"go.etcd.io/etcd/etcdserver/api/membership.mustDetectDowngrade\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:795\ngo.etcd.io/etcd/etcdserver/api/membership.(*RaftCluster).SetVersion\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:570\ngo.etcd.io/etcd/etcdserver.(*applierV2store).Put\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:97\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyV2Request\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:128\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntryNormal\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2237\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).apply\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2178\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntries\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1412\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyAll\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1136\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).run.func8\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1072\ngo.etcd.io/etcd/pkg/schedule.(*fifo).run\n\t/Users/bk/github/etcd-release-3-5/pkg/schedule/schedule.go:157"}

To remove this error, we need to remove mustDetectDowngrade.
etcd v3.4 will then start, but requests will fail with:

$ ./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
{"level":"warn","ts":"2023-06-05T01:07:07.103655-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ca000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
Error: etcdserver: not capable

That's because we are missing the 3.5.0 capability.
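
For illustration only, this is the shape of that gap: etcd enables client-facing capabilities from a map keyed by the recorded cluster version, and an unpatched 3.4 server has no entry for "3.5.0". The self-contained sketch below mimics that lookup; the real fix in a patched 3.4 would be an extra entry in etcdserver/api/capability.go, which is an assumption to verify against the actual code.

package main

import "fmt"

// Capability mirrors the idea of etcd's per-version capability gating; the type
// and entries here are illustrative, not copied from the etcd source.
type Capability string

const (
	AuthCapability  Capability = "auth"
	V3rpcCapability Capability = "v3rpc"
)

var capabilityMaps = map[string]map[Capability]bool{
	"3.4.0": {AuthCapability: true, V3rpcCapability: true},
	// Hypothetical addition so a patched 3.4 server tolerates cluster version 3.5
	// instead of answering "etcdserver: not capable":
	"3.5.0": {AuthCapability: true, V3rpcCapability: true},
}

func main() {
	for _, v := range []string{"3.4.0", "3.5.0", "3.6.0"} {
		caps, known := capabilityMaps[v]
		fmt.Printf("cluster version %s: known=%v capabilities=%v\n", v, known, caps)
	}
}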

@lavacat commented Jun 5, 2023

@ahrtr

I am not sure whether we should support downgrading 3.5 to 3.4.

We have a 3.4 build with the patch #15990 in case there is a need to roll back during an incident, but we've never had to do it.
I think this is useful operationally and makes SREs happy, but if 3.4 is declared EOL, everyone will upgrade without the patch.

In terms of the downgrade workflow, I've tested it using a 3-node cluster and there are a couple of issues:

  1. The first downgrade enable call fails, but the downgrade job is actually started. I'm using etcdctl downgrade built from main.
$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-06-05T01:20:28.807973-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000196780/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-06-05T01:20:31.260858-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: cluster has a downgrade job in progress"}
Error: etcdserver: cluster has a downgrade job in progress
  2. After replacing the 1st member's binary, the 2 other members fail with
{"level":"info","ts":"2023-06-05T01:21:14.489291-0700","caller":"membership/cluster.go:576","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"fd422379fda50e48","from":"3.5","to":"3.4"}
{"level":"fatal","ts":"2023-06-05T01:21:14.489323-0700","caller":"membership/downgrade.go:59","msg":"invalid downgrade; server version is not allowed to join when downgrade is enabled","current-server-version":"3.5.9","target-cluster-version":"3.4.0","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/api/membership.mustDetectDowngrade\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/downgrade.go:59\ngo.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/cluster.go:593\ngo.etcd.io/etcd/server/v3/etcdserver.(*applierV2store).Put\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:101\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyV2Request\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:135\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2228\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2151\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1384\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1199\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1122\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.5.9/schedule/schedule.go:157"}
  3. After starting the 2 failed members with the 3.4 binary, I still get
./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
{"level":"warn","ts":"2023-06-05T01:28:09.783384-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000240000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
Error: etcdserver: not capable

@serathius (Member) commented Jun 5, 2023

Is GKE still using 3.4? It seemed yes a couple of months back. cc @serathius to double confirm.

Yes, GKE is on v3.4. That's why Han is asking for downgrade support so they can feel safe to upgrade.

If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration.

I don't agree. Online downgrade is totally broken in v3.4 and v3.5. The whole design was broken, and fixing it would be too disruptive to backport. Making sure that v3.6 -> v3.5 downgrades work will already require a lot of qualification; we should not put more resources here.

What I'm proposing is to just add offline support, so we avoid totally abandoning users and give them a subpar but working and tested path to rollback. We don't need the experience to be great. It just needs to work for disaster recovery, to ensure the most reluctant users of v3.4 feel safe upgrading to v3.5.

We don't need anything more than for etcdutl migrate to officially support v3.4.

@serathius (Member)

@lavacat Please follow the thread in #11716 (comment) on how broken etcdctl downgrade enable is on v3.5.

@lavacat commented Jun 5, 2023

What I'm proposing is to just add offline support, so we avoid totally abandoning users and give them a subpar but working and tested path to downgrade. We don't need the experience to be great. It just needs to work for disaster recovery, to ensure the most reluctant users of v3.4 feel safe upgrading to v3.5.

I'm on board with this: migrate with PR #15994 + using a snapshot. No changes to 3.4.

@ahrtr ClusterClusterVersionKeyName in 3.4 is updated in SetVersion based on the decided cluster version; see comment.
During testing, after the snapshot is restored but before the member starts:

$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.5.0

after member starts

{"level":"info","ts":"2023-06-05T02:06:06.317983-0700","caller":"membership/cluster.go:547","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"91bc3c398fb3c146","from":"3.0","from":"3.4"}
{"level":"info","ts":"2023-06-05T02:06:06.318064-0700","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}

$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.4.0

ClusterDowngradeKeyName isn't present in 3.4. I can add it to migrate so it gets removed when going 3.5 -> 3.4.

@tjungblu (Contributor) commented Jun 5, 2023

Is OpenShift still using 3.4? cc @tjungblu to double confirm

Not with any currently supported version. Just to give you some more data points here: to stay supported, customers had to upgrade. So many thousands of clusters have already successfully upgraded from 3.4 to 3.5, plus all our e2e test pipelines that previously exercised this across many tens of thousands of runs.

I'm not aware of a single issue a customer had. The recommended downgrade procedure, IIRC, has been to restore the entire control plane with a snapshot from before the upgrade was kicked off, but I don't think this was ever necessary.

@chaochn47 (Member) commented Jun 5, 2023

EKS seems to have already upgraded to 3.5. cc @chaochn47 to double confirm

Yes. The etcd clusters for all supported k8s versions have been upgraded to 3.5.

From my understanding, to solve the issue of a failed upgrade triggering a downgrade from the k8s perspective:

  1. Decouple the etcd upgrade from the k8s upgrade, so even if the k8s upgrade fails, it won't trigger an etcd downgrade; etcd stays at 3.5.
  2. etcd supports downgrade from v3.5 to v3.4 with no downtime.

@fuweid (Member) commented Jun 7, 2023

Hi, @ahrtr. Sorry for the late reply.

Is AKS still using 3.4?

Yes. And we are also using other versions depending on the cluster.

For this issue, it seems reasonable to me if we can have a rollback solution with no downtime.

@ahrtr (Member) commented Jul 23, 2023

Thanks all for the feedback.

It seems that 3.4 is only used by a minority. A simple summary...

  • Private cloud providers: neither TKG nor OpenShift is using 3.4 anymore. etcd 3.5.6+ has already been verified on (roughly) thousands of clusters in TKG. It has also been verified in OpenShift on lots of clusters, as @tjungblu mentioned.
  • Public cloud providers:
    • EKS: etcd has been upgraded to 3.5 for all supported k8s versions.
    • GKE: Indeed, K8s 1.21 (etcd 3.4.x) is still available, but the default version in the Stable channel was already upgraded to K8s 1.22.12 (should be etcd 3.5.4?) on September 02, 2022. [In theory, K8s 1.22.x still works on top of etcd 3.4.x.]
    • AKS: It's still using etcd 3.4, based on feedback from @fuweid. But according to the aks-kubernetes-release-calendar, AKS provides 12 months of support for a generally available (GA) Kubernetes version, so K8s 1.21 (etcd 3.4) should already be out of support.

Also, backporting online downgrading from 3.5 to 3.4 would require a huge effort, and it might introduce additional risk of regression in 3.5. We should try to avoid adding any new features to 3.5.

In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at a minimum, it's acceptable to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.

@logicalhan (Author)

In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at a minimum, it's acceptable to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.

I disagree. GKE does not and has not used 3.5, and they are a major cloud provider. Google's position is that the number of regressions in 3.5 has made upgrading to 3.5 unviable without a safe downgrade path. Therefore, my position is that it should indeed be prioritized.

@serathius (Member) commented Jul 24, 2023

I'm on the side that this is just too much work and too risky. See the amount of work: all the tasks listed in #13168. Online is just much more complicated than offline support, as offline can be done by an external binary like etcdutl, but online needs to be built into the etcd binary.

Compare the amount of work. For offline downgrading of etcd from v3.5 to v3.4, you can just use the etcdutl from v3.6 without a problem. It's just one PR, #15994, yet we have been working on it for almost a month now. Compare that to online support, which requires backporting multiple months of work.

@jmhbnz (Member) commented Jul 26, 2023

My view is that, thanks to the uptake of etcd 3.5.6+ in platforms like EKS, OCP, TKG and elsewhere, we can draw some confidence from the hundreds of thousands of clusters that have been running these versions successfully for long periods of time without issues.

So my preference, fwiw, is to avoid any pathway involving extensive backports to 3.4 and to focus on a solid offline downgrade procedure.

@serathius (Member)

I talked with @logicalhan, and I understand his argument that offline downgrade is not viable on a large fleet of etcds; it would be a disaster-recovery-level operation. The fact is that downgrades were implemented broken in v3.5, and it took a big redesign to fix them for v3.6. This, however, means that we have left a broken API in v3.5. Online downgrades in v3.6 were implemented as a bare-bones feature; there are still a lot of places the downgrade mechanism needs to be plugged into. Having a v3.5 -> v3.4 online downgrade could help us finish the work.

I would be supportive of fixing online v3.5 -> v3.4 downgrades as:

  • Backports will be to v3.5 and not v3.4.
  • It will fix the broken downgrade API in v3.5.
  • It will allow us to properly test the downgrade mechanism before we release v3.6.
  • It should not take many resources from the etcd community, as it will be fully funded by @logicalhan.

@ahrtr (Member) commented Jul 26, 2023

large fleet of etcds

I was thinking etcd 3.4 was only used by a minority of K8s clusters for each cloud vendor, including private and public vendors, based on the feedback and my investigation. But that isn't the case for GKE: based on the feedback from @logicalhan a couple of days back, the fact is that ALL existing K8s versions in GKE are using etcd 3.4.x. I was shocked. It's already been 2+ years since the release of 3.5.0, and 1+ years since the community fixed all known data inconsistency issues.

it will be fully funded by @logicalhan.

I am curious how?

@logicalhan (Author)

it will be fully funded by @logicalhan.

I am curious how?

We're hiring a person who will work on etcd (at least partially).

@lavacat commented Jul 26, 2023

The current version of the PR works fine, with the limitation that one has to use a snapshot to downgrade, or remove the WAL files. See #15994 (comment).
This means that a downgrade will require cluster downtime and potential data loss of entries in the WAL that aren't in the snapshot yet.

The problem is that the version is recorded in the WAL and has to be removed from it. We don't have a mechanism to do that. Adding one is possible, but it increases the complexity of this change.

@serathius @ahrtr
Do you both support adding WAL manipulation as part of the migrate command?
Is the PR still relevant without online downgrade?

For GKE, @logicalhan @serathius I'm going to call out #15990 again. You can have a 3.4 internal build that you can roll back to as long as the WAL doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, or AuthStatusRequest. I don't think this should be merged, but it can be a tradeoff if you want to do your 3.4 -> 3.5 upgrade sooner.

@ahrtr (Member) commented Jul 27, 2023

For GKE, @logicalhan @serathius I'm going to call out #15990 again. You can have a 3.4 internal build that you can roll back to as long as the WAL doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, or AuthStatusRequest.

This seems to be the cheapest direction.

Downgrading 3.5 to 3.4 is a special case; we don't have to backport the complete downgrade feature to 3.5. Doing that would be risky, and it would also complicate the 3.5 code base.

Proposed change for 3.4 (on top of @lavacat 's #15990)

  • Clean up the new fields added in 3.5 (excluding clusterVersion) on startup and on snapshot recovery, just as [WIP] *: support online downgrade from 3.5 to 3.4 #15990 does.
  • Add dummy support for the new protocol messages added in 3.5 (e.g. ClusterVersionSetRequest, ClusterMemberAttrSetRequest, DowngradeInfoSetRequest, AuthStatusRequest): recognise them but ignore them. Note: NO change to or manipulation of the WAL files.

EDIT: We don't need to worry about ClusterVersionSetRequest, ClusterMemberAttrSetRequest, and DowngradeInfoSetRequest at all.

  • ClusterVersionSetRequest is only used by updateClusterVersionV3 (in 3.5), which isn't called at all in 3.5.
  • ClusterMemberAttrSetRequest is only used by publishV3, which again isn't called at all in 3.5.
  • DowngradeInfoSetRequest is supported by etcd 3.5, but there is no client-side command. Downgrade isn't a complete feature in 3.5, so we don't need to worry about it for 3.5.

So we only need to take care of AuthStatusRequest in 3.4.

More references:

Impact on users (e.g. GKE)

If they want to benefit from this solution, they can't upgrade from an old 3.4 to 3.5 directly. Instead, they must first upgrade their clusters to a new 3.4.x version (which includes the change proposed above), and then upgrade to 3.5.x in a second step.

Do we still need #15994?

No, as long as the clusters were previously on a 3.4.x version with the change proposed above.

@lavacat commented Jul 28, 2023

@ahrtr in principle I agree with your approach. Making changes to 3.4 to support online downgrade seems more practical.

I don't mind throwing away #15994, but it might be cleaner to perform a backend migrate instead of dealing with term and confState in 3.4. That way we also use the new migrate framework.

Then in 3.4 we can have a flag --experimental-downgrade-3-5 that allows 3.4 to start within a 3.5 cluster.

Let's discuss during the next community meeting, so everyone is in agreement on next steps. If more information or a POC is needed, let me know; I'll try to put everything together before the meeting.

@ahrtr (Member) commented Aug 15, 2023

As discussed in the previous community meeting, the offline downgrade tool isn't the point. The point is whether, and how, to support online downgrade from 3.5 to 3.4.

Usually it's common to make the new version (e.g. 3.6) backward compatible with the old version (e.g. 3.5), and that's exactly the principle the existing downgrade feature follows. For example, when downgrading from 3.6 to 3.5, the etcd 3.6 instance should migrate the data to be 3.5 compatible.

But online downgrade is a big and complicated feature; it isn't feasible or safe to backport the complete feature from 3.6 to 3.5.

Instead, we can treat the online downgrade from 3.5 to 3.4 as a special case. I think we can spend only minor to moderate effort to make the old version (3.4) forward compatible with the new version (3.5). Specifically, we just need to ensure the 3.4 binary can run on data generated by the 3.5 binary, roughly as I mentioned above in #15878 (comment).

@siyuanfoundation (Contributor)

I have written a design doc regarding the path forward. Please take a look and provide feedback, thanks!

cc @ahrtr @lavacat @serathius @logicalhan @fuweid

@siyuanfoundation (Contributor) commented Jan 25, 2024

Tracking work
