etcd operator #222

hexfusion · 2020-02-20T17:14:09Z

https://github.com/openshift/enhancements/blob/master/enhancements/etcd/cluster-etcd-operator.md

Perf

problems bootstrapping cluster @alaypatel07

Metal

IPv4 works on 3/4 (asked in the Slack channel)
IPv6 works on 3/4 (asked in the Slack channel)

IPv6

Azure is broken on pivot @deads2k
1. bug 1804913: make ipv6 support a little better cluster-etcd-operator#173
Azure bootstrap is broken @hexfusion
- Bug 1804960: Revert: render: fix listening addresses in IPv6 case cluster-etcd-operator#154
- pkg/cmd/render: populate bootstrapIP from local interfaces cluster-etcd-operator#175
- stop using the invalid bootstrapIP from the installer - bug 1807169: use localhost for bootstrap IP until bootkube is fixed installer#3175 @deads2k
- use the correct bootstrapIP in static pods - bug 1806723: use correct IP for bootstrap host env vars cluster-etcd-operator#200 @deads2k

bootstrapping

keep bootstrap kube-apiserver up @deads2k
1. bug 1805421: allow bootstrap server to terminate before removing etcd-bootstrap fr… cluster-etcd-operator#179
real kube-apiserver to only use real etcd @deads2k
1. bug 1805421: skip using bootstrap etcd cluster-kube-apiserver-operator#772

shutdown clusters don't start back up

need to bypass the member lookup when there is no cluster
- create a golang command so we can follow the logic @deads2k - bug 1805807: create logic for golang ETCD_INITIAL_CLUSTER etcd#28
- use the golang command in our static pods @deads2k - bug 1805807: call the discovery-etcd-initial-cluster cluster-etcd-operator#185
- update the golang command to have a memory: not needed, because we can use the presence of the data-dir to skip the lookup. Thanks @alaypatel07 and @retroflexer
- Bug 1807054: fix recovery of master nodes restarting at the same time cluster-etcd-operator#203 @alaypatel07

blocker bugs

port conflict @p0lyn0mial https://bugzilla.redhat.com/show_bug.cgi?id=1806579 - Bug 1806579: waits for ports before starting etcd member cluster-etcd-operator#226
cert problem found by wking: this is 4.2 to 4.3, not part of this feature. @sttts is finding someone.
oc rsh doesn't start with a working etcdctl; this is needed for maintenance. @sanchezl https://bugzilla.redhat.com/show_bug.cgi?id=1805981 Bug 1805981: oc rsh to an etcd pod needs to have etcdctl "just work" cluster-etcd-operator#207
upgrade failures where etcd appears to be unavailable @sanchezl https://bugzilla.redhat.com/show_bug.cgi?id=1807194

Tests we think should work

All nodes being shut off at the same time and restarted. - tested by QE and appears to be working.
IP address change of a single member @hexfusion
debugging and detection when DNS information for one member is lost @sanchezl Bug 1809732: cluster-etcd-operator: upgrades show "etcd member is unavailable" cluster-etcd-operator#225. It degrades nicely, but we don't like degrading during upgrades. May end up switching to IPs.
Addition of a new member when there is significant etcd data. - verified by @alaypatel07 by adding a fourth member.
Upgrade, downgrade, re-upgrade @alaypatel07

Tests we think should fail


retroflexer · 2020-02-20T17:38:45Z

I am working on:

Restoring a cluster to previous state
etcd-quorum recovery steps
Removal of a member from etcd cluster
Recovery of a member with a bad data-dir

retroflexer · 2020-02-21T15:37:21Z

@deads2k I am not suresh gaikwad. My handle here is @retroflexer.

openshift-bot · 2020-10-04T05:58:06Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2020-11-03T07:45:49Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2020-12-03T09:40:36Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot · 2020-12-03T09:40:57Z

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

deads2k changed the title from "cluster-etcd-operator: 4.4 outstanding tasks" to "etcd operator" Feb 20, 2020

deads2k added the stage/stable label Feb 20, 2020

deads2k added this to the v4.4 milestone Feb 20, 2020

openshift-ci-robot added the lifecycle/stale label Oct 4, 2020

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Nov 3, 2020

openshift-ci-robot closed this as completed Dec 3, 2020
