Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky robustness test #17052

Closed
ahrtr opened this issue Dec 1, 2023 · 7 comments · Fixed by #17056
Closed

Flaky robustness test #17052

ahrtr opened this issue Dec 1, 2023 · 7 comments · Fixed by #17056

Comments

@ahrtr
Copy link
Member

ahrtr commented Dec 1, 2023

Which github workflows are flaking?

https://github.com/etcd-io/etcd/actions/runs/7061287598/job/19229219034?pr=17050

etcd somehow failed during boostraping, and the WAL directory wasn't created at all, so the verification failed with reason "no such file or directory"

2023-12-01T17:58:04.8181296Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.802824Z","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["/home/runner/work/etcd/etcd/bin/etcd","--name=TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=https://localhost:20001","--initial-advertise-peer-urls=https://localhost:20003","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","--snapshot-count=1000","--peer-cert-file","/home/runner/work/etcd/etcd/tests/fixtures/server.crt","--peer-key-file","/home/runner/work/etcd/etcd/tests/fixtures/server.key.insecure","--peer-trusted-ca-file","/home/runner/work/etcd/etcd/tests/fixtures/ca.crt","--experimental-snapshot-catchup-entries=100","--initial-cluster-token=new","--heartbeat-interval=101","--election-timeout=521","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--experimental-watch-progress-notify-interval=100ms","--initial-cluster=TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-2=https://localhost:20013","--initial-cluster-state=existing"]}
2023-12-01T17:58:04.8190916Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.802871Z","caller":"etcdmain/etcd.go:123","msg":"Initialize and start etcd server","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","dir-type":"empty"}
2023-12-01T17:58:04.8193747Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"warn","ts":"2023-12-01T17:58:04.8029Z","caller":"embed/config.go:893","msg":"Running http and grpc server on single port. This is not recommended for production."}
2023-12-01T17:58:04.8196468Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.802917Z","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["https://localhost:20001"]}
2023-12-01T17:58:04.8200647Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.802948Z","caller":"embed/etcd.go:510","msg":"starting with peer TLS","tls-info":"cert = /home/runner/work/etcd/etcd/tests/fixtures/server.crt, key = /home/runner/work/etcd/etcd/tests/fixtures/server.key.insecure, client-cert=, client-key=, trusted-ca = /home/runner/work/etcd/etcd/tests/fixtures/ca.crt, client-cert-auth = false, crl-file = ","cipher-suites":[]}
2023-12-01T17:58:04.8206158Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.804256Z","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
2023-12-01T17:58:04.8242131Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.804459Z","caller":"embed/etcd.go:311","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"f8d24d7","go-version":"go1.21.4","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":false,"name":"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["https://localhost:20003"],"listen-peer-urls":["https://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-2=https://localhost:20013","initial-cluster-state":"existing","initial-cluster-token":"new","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
2023-12-01T17:58:04.8255169Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.806031Z","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member/snap/db","took":"695.92µs"}
2023-12-01T17:58:04.8259575Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.813347Z","caller":"embed/etcd.go:393","msg":"closing etcd server","name":"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","advertise-peer-urls":["https://localhost:20003"],"advertise-client-urls":["http://localhost:20000"]}
2023-12-01T17:58:04.8266129Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.813421Z","caller":"embed/etcd.go:395","msg":"closed etcd server","name":"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","advertise-peer-urls":["https://localhost:20003"],"advertise-client-urls":["http://localhost:20000"]}
2023-12-01T17:58:04.8271522Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.813456Z","caller":"verify/verify.go:63","msg":"verification of persisted state","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001"}
2023-12-01T17:58:04.8302165Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"error","ts":"2023-12-01T17:58:04.822619Z","caller":"verify/verify.go:66","msg":"verification of persisted state failed","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","error":"[readWALNames] fileutil.ReadDir failed: open /tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member/wal: no such file or directory","stacktrace":"go.etcd.io/etcd/server/v3/verify.Verify.func1\n\tgo.etcd.io/etcd/server/v3/verify/verify.go:66\ngo.etcd.io/etcd/server/v3/verify.Verify\n\tgo.etcd.io/etcd/server/v3/verify/verify.go:83\ngo.etcd.io/etcd/server/v3/verify.VerifyIfEnabled\n\tgo.etcd.io/etcd/server/v3/verify/verify.go:96\ngo.etcd.io/etcd/server/v3/verify.MustVerifyIfEnabled\n\tgo.etcd.io/etcd/server/v3/verify/verify.go:105\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:396\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:456\ngo.etcd.io/etcd/server/v3/embed.StartEtcd.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:114\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:251\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:205\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:128\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:267"}
2023-12-01T17:58:04.8321109Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"fatal","ts":"2023-12-01T17:58:04.822708Z","caller":"verify/verify.go:106","msg":"Verification failed","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","error":"[readWALNames] fileutil.ReadDir failed: open /tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member/wal: no such file or directory","stacktrace":"go.etcd.io/etcd/server/v3/verify.MustVerifyIfEnabled\n\tgo.etcd.io/etcd/server/v3/verify/verify.go:106\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:396\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:456\ngo.etcd.io/etcd/server/v3/embed.StartEtcd.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:114\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:251\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:205\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:128\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:267"}
2023-12-01T17:58:05.0024231Z     logger.go:130: 2023-12-01T17:58:04.828Z        INFO    Failed to trigger failpoint     {"failpoint": "MemberReplace", "error": "match not found.  Set EXPECT_DEBUG for more info Errs: [unexpected exit code [1] after running [/home/runner/work/etcd/etcd/bin/etcd --name=TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0 --listen-client-urls=http://localhost:20000 --advertise-client-urls=http://localhost:20000 --listen-peer-urls=https://localhost:20001 --initial-advertise-peer-urls=https://localhost:20003 --initial-cluster-token=new --data-dir /tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001 --snapshot-count=1000 --peer-cert-file /home/runner/work/etcd/etcd/tests/fixtures/server.crt --peer-key-file /home/runner/work/etcd/etcd/tests/fixtures/server.key.insecure --peer-trusted-ca-file /home/runner/work/etcd/etcd/tests/fixtures/ca.crt --experimental-snapshot-catchup-entries=100 --initial-cluster-token=new --heartbeat-interval=101 --election-timeout=521 --snapshot-count=1000 --experimental-compaction-batch-limit=100 --experimental-watch-progress-notify-interval=100ms --initial-cluster=TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-2=https://localhost:20013 --initial-cluster-state=existing]], last lines:\n{\"level\":\"warn\",\"ts\":\"2023-12-01T17:58:04.802617Z\",\"caller\":\"flags/flag.go:93\",\"msg\":\"unrecognized environment variable\",\"environment-variable\":\"ETCD_VERIFY=all\"}\r\n{\"level\":\"warn\",\"ts\":\"2023-12-01T17:58:04.802729Z\",\"caller\":\"flags/flag.go:93\",\"msg\":\"unrecognized environment variable\",\"environment-variable\":\"ETCD_BRANCH=main\"}\r\n{\"level\":\"warn\",\"ts\":\"2023-12-01T17:58:04.802745Z\",\"caller\":\"flags/flag.go:93\",\"msg\":\"unrecognized environment variable\",\"environment-variable\":\"ETCD_UNSUPPORTED_ARCH=amd64\"}\r\n{\"level\":\"warn\",\"ts\":\"2023-12-01T17:58:04.802805Z\",\"caller\":\"embed/config.go:893\",\"msg\":\"Running http and grpc server on single port. This is not recommended for production.\"}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.802824Z\",\"caller\":\"etcdmain/etcd.go:63\",\"msg\":\"Running: \",\"args\":[\"/home/runner/work/etcd/etcd/bin/etcd\",\"--name=TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0\",\"--listen-client-urls=http://localhost:20000\",\"--advertise-client-urls=http://localhost:20000\",\"--listen-peer-urls=https://localhost:20001\",\"--initial-advertise-peer-urls=https://localhost:20003\",\"--initial-cluster-token=new\",\"--data-dir\",\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\",\"--snapshot-count=1000\",\"--peer-cert-file\",\"/home/runner/work/etcd/etcd/tests/fixtures/server.crt\",\"--peer-key-file\",\"/home/runner/work/etcd/etcd/tests/fixtures/server.key.insecure\",\"--peer-trusted-ca-file\",\"/home/runner/work/etcd/etcd/tests/fixtures/ca.crt\",\"--experimental-snapshot-catchup-entries=100\",\"--initial-cluster-token=new\",\"--heartbeat-interval=101\",\"--election-timeout=521\",\"--snapshot-count=1000\",\"--experimental-compaction-batch-limit=100\",\"--experimental-watch-progress-notify-interval=100ms\",\"--initial-cluster=TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-2=https://localhost:20013\",\"--initial-cluster-state=existing\"]}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.802871Z\",\"caller\":\"etcdmain/etcd.go:123\",\"msg\":\"Initialize and start etcd server\",\"data-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\",\"dir-type\":\"empty\"}\r\n{\"level\":\"warn\",\"ts\":\"2023-12-01T17:58:04.8029Z\",\"caller\":\"embed/config.go:893\",\"msg\":\"Running http and grpc server on single port. This is not recommended for production.\"}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.802917Z\",\"caller\":\"embed/etcd.go:125\",\"msg\":\"configuring peer listeners\",\"listen-peer-urls\":[\"https://localhost:20001\"]}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.802948Z\",\"caller\":\"embed/etcd.go:510\",\"msg\":\"starting with peer TLS\",\"tls-info\":\"cert = /home/runner/work/etcd/etcd/tests/fixtures/server.crt, key = /home/runner/work/etcd/etcd/tests/fixtures/server.key.insecure, client-cert=, client-key=, trusted-ca = /home/runner/work/etcd/etcd/tests/fixtures/ca.crt, client-cert-auth = false, crl-file = \",\"cipher-suites\":[]}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.804256Z\",\"caller\":\"embed/etcd.go:133\",\"msg\":\"configuring client listeners\",\"listen-client-urls\":[\"http://localhost:20000\"]}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.804459Z\",\"caller\":\"embed/etcd.go:311\",\"msg\":\"starting an etcd server\",\"etcd-version\":\"3.6.0-alpha.0\",\"git-sha\":\"f8d24d7\",\"go-version\":\"go1.21.4\",\"go-os\":\"linux\",\"go-arch\":\"amd64\",\"max-cpu-set\":8,\"max-cpu-available\":8,\"member-initialized\":false,\"name\":\"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0\",\"data-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\",\"wal-dir\":\"\",\"wal-dir-dedicated\":\"\",\"member-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member\",\"force-new-cluster\":false,\"heartbeat-interval\":\"101ms\",\"election-timeout\":\"505ms\",\"initial-election-tick-advance\":true,\"snapshot-count\":1000,\"max-wals\":5,\"max-snapshots\":5,\"snapshot-catchup-entries\":100,\"initial-advertise-peer-urls\":[\"https://localhost:20003\"],\"listen-peer-urls\":[\"https://localhost:20001\"],\"advertise-client-urls\":[\"http://localhost:20000\"],\"listen-client-urls\":[\"http://localhost:20000\"],\"listen-metrics-urls\":[],\"cors\":[\"*\"],\"host-whitelist\":[\"*\"],\"initial-cluster\":\"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0=https://localhost:20003,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-1=https://localhost:20008,TestRobustnessKubernetesHighTrafficClusterOfSize3-test-2=https://localhost:20013\",\"initial-cluster-state\":\"existing\",\"initial-cluster-token\":\"new\",\"quota-backend-bytes\":2147483648,\"max-request-bytes\":1572864,\"max-concurrent-streams\":4294967295,\"pre-vote\":true,\"initial-corrupt-check\":false,\"corrupt-check-time-interval\":\"0s\",\"compact-check-time-enabled\":false,\"compact-check-time-interval\":\"1m0s\",\"auto-compaction-mode\":\"periodic\",\"auto-compaction-retention\":\"0s\",\"auto-compaction-interval\":\"0s\",\"discovery-url\":\"\",\"discovery-proxy\":\"\",\"discovery-token\":\"\",\"discovery-endpoints\":\"\",\"discovery-dial-timeout\":\"2s\",\"discovery-request-timeout\":\"5s\",\"discovery-keepalive-time\":\"2s\",\"discovery-keepalive-timeout\":\"6s\",\"discovery-insecure-transport\":true,\"discovery-insecure-skip-tls-verify\":false,\"discovery-cert\":\"\",\"discovery-key\":\"\",\"discovery-cacert\":\"\",\"discovery-user\":\"\",\"downgrade-check-interval\":\"5s\",\"max-learners\":1}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.806031Z\",\"caller\":\"storage/backend.go:80\",\"msg\":\"opened backend db\",\"path\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member/snap/db\",\"took\":\"695.92µs\"}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.813347Z\",\"caller\":\"embed/etcd.go:393\",\"msg\":\"closing etcd server\",\"name\":\"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0\",\"data-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\",\"advertise-peer-urls\":[\"https://localhost:20003\"],\"advertise-client-urls\":[\"http://localhost:20000\"]}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.813421Z\",\"caller\":\"embed/etcd.go:395\",\"msg\":\"closed etcd server\",\"name\":\"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0\",\"data-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\",\"advertise-peer-urls\":[\"https://localhost:20003\"],\"advertise-client-urls\":[\"http://localhost:20000\"]}\r\n{\"level\":\"info\",\"ts\":\"2023-12-01T17:58:04.813456Z\",\"caller\":\"verify/verify.go:63\",\"msg\":\"verification of persisted state\",\"data-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\"}\r\n{\"level\":\"error\",\"ts\":\"2023-12-01T17:58:04.822619Z\",\"caller\":\"verify/verify.go:66\",\"msg\":\"verification of persisted state failed\",\"data-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\",\"error\":\"[readWALNames] fileutil.ReadDir failed: open /tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member/wal: no such file or directory\",\"stacktrace\":\"go.etcd.io/etcd/server/v3/verify.Verify.func1\\n\\tgo.etcd.io/etcd/server/v3/verify/verify.go:66\\ngo.etcd.io/etcd/server/v3/verify.Verify\\n\\tgo.etcd.io/etcd/server/v3/verify/verify.go:83\\ngo.etcd.io/etcd/server/v3/verify.VerifyIfEnabled\\n\\tgo.etcd.io/etcd/server/v3/verify/verify.go:96\\ngo.etcd.io/etcd/server/v3/verify.MustVerifyIfEnabled\\n\\tgo.etcd.io/etcd/server/v3/verify/verify.go:105\\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close.func1\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:396\\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:456\\ngo.etcd.io/etcd/server/v3/embed.StartEtcd.func1\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:114\\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:251\\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\\n\\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:205\\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\\n\\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:128\\ngo.etcd.io/etcd/server/v3/etcdmain.Main\\n\\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\\nmain.main\\n\\tgo.etcd.io/etcd/server/v3/main.go:31\\nruntime.main\\n\\truntime/proc.go:267\"}\r\n{\"level\":\"fatal\",\"ts\":\"2023-12-01T17:58:04.822708Z\",\"caller\":\"verify/verify.go:106\",\"msg\":\"Verification failed\",\"data-dir\":\"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001\",\"error\":\"[readWALNames] fileutil.ReadDir failed: open /tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member/wal: no such file or directory\",\"stacktrace\":\"go.etcd.io/etcd/server/v3/verify.MustVerifyIfEnabled\\n\\tgo.etcd.io/etcd/server/v3/verify/verify.go:106\\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close.func1\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:396\\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).Close\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:456\\ngo.etcd.io/etcd/server/v3/embed.StartEtcd.func1\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:114\\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\\n\\tgo.etcd.io/etcd/server/v3/embed/etcd.go:251\\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\\n\\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:205\\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\\n\\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:128\\ngo.etcd.io/etcd/server/v3/etcdmain.Main\\n\\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\\nmain.main\\n\\tgo.etcd.io/etcd/server/v3/main.go:31\\nruntime.main\\n\\truntime/proc.go:267\"}\r\n"}
2023-12-01T17:58:05.0086281Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-2) (18290): {"level":"warn","ts":"2023-12-01T17:58:04.981824Z","caller":"rafthttp/peer.go:254","msg":"dropped internal Raft message since sending buffer is full","message-type":"MsgHeartbeat","local-member-id":"bf19ae4419db00dc","from":"bf19ae4419db00dc","remote-peer-id":"7002f235aa1e4b62","remote-peer-name":"pipeline","remote-peer-active":false}
2023-12-01T17:58:05.0091516Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-2) (18290): {"level":"warn","ts":"2023-12-01T17:58:04.986425Z","caller":"rafthttp/peer.go:254","msg":"dropped internal Raft message since sending buffer is full","message-type":"MsgHeartbeat","local-member-id":"bf19ae4419db00dc","from":"bf19ae4419db00dc","remote-peer-id":"7002f235aa1e4b62","remote-peer-name":"pipeline","remote-peer-active":false}

Before the etcd instance closed, the last log seen was "opened backend db".

2023-12-01T17:58:04.8255169Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.806031Z","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001/member/snap/db","took":"695.92µs"}
2023-12-01T17:58:04.8259575Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0) (18330): {"level":"info","ts":"2023-12-01T17:58:04.813347Z","caller":"embed/etcd.go:393","msg":"closing etcd server","name":"TestRobustnessKubernetesHighTrafficClusterOfSize3-test-0","data-dir":"/tmp/TestRobustnessKubernetesHighTrafficClusterOfSize31653420488/001","advertise-peer-urls":["https://localhost:20003"],"advertise-client-urls":["http://localhost:20000"]}

Which tests are flaking?

Robustness test

Github Action link

No response

Reason for failure (if possible)

No response

Anything else we need to know?

No response

@serathius
Copy link
Member

Looks like some recent issues with MemberReplace failpoint. Will look into bisecting recent changes.

@serathius
Copy link
Member

Looking on robustness nightly failures for main branch the issue happened between 054b677...bf34724 commits

@serathius
Copy link
Member

Local failure:

{"level":"panic","ts":"2023-12-03T16:33:31.459054+0100","caller":"membership/cluster.go:488","msg":"failed to update; member unknown","cluster-id":"b3bc0c1919fe5d7e","local-member-id":"bf19ae4419db00dc","unknown-remote-peer-id":"fab72fa6664a8ff9","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).UpdateAttributes\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/cluster.go:488\ngo.etcd.io/etcd/server/v3/etcdserver/apply.(*applierMembership).ClusterMemberAttrSet\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply/apply.go:430\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyInternalRaftRequest\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1934\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1878\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1793\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1106\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:894\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:821\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:187"}

@serathius
Copy link
Member

serathius commented Dec 3, 2023

4fe46f9 is the culprit at least based on bisection.

EDIT: Fixed commit id

@ahrtr
Copy link
Member Author

ahrtr commented Dec 3, 2023

bab9bfd6d296e2ad3dce778ff7f57e45b46a208c is the culprit at least based on bisection.

Did not find the commit ID.

@serathius
Copy link
Member

Sent fix #17056

@serathius
Copy link
Member

serathius commented Dec 4, 2023

We got a green run
image
but preceded by one more flake.

Failure happens in MemberReplace failpoint. New member fails to connect to other ones.

2023-12-04T10:28:17.1619188Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessEtcdHighTrafficClusterOfSize3-test-2) (52506): {"level":"warn","ts":"2023-12-04T10:28:17.064014Z","caller":"etcdserver/server.go:1713","msg":"failed to publish local member to cluster through raft","local-member-id":"5756ce507b9fcbe","local-member-attributes":"{Name:TestRobustnessEtcdHighTrafficClusterOfSize3-test-2 ClientURLs:[http://localhost:20010]}","publish-timeout":"9s","error":"etcdserver: request timed out, possibly due to connection lost"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants