
[release-3.5] fix the potential data loss for clusters with only one member #14424

Merged: 1 commit into etcd-io:release-3.5 on Sep 5, 2022

Conversation

serathius (Member)

Backport #14400 to v3.5, fixing #14370.

Benchmarking showed a 4.5% performance degradation.

Command

./bin/etcd --quota-backend-bytes=4300000000
bin/tools/benchmark txn-put --endpoints="http://127.0.0.1:2379" --clients=200 --conns=200 --key-space-size=4000000000 --key-size=128 --val-size=10240  --total=200000 --rate=40000

Before

Summary:
  Total:	23.8374 secs.
  Slowest:	0.1226 secs.
  Fastest:	0.0010 secs.
  Average:	0.0235 secs.
  Stddev:	0.0175 secs.
  Requests/sec:	8390.1908

Response time histogram:
  0.0010 [1]	|
  0.0132 [43566]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0253 [115481]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0375 [8083]	|∎∎
  0.0497 [2461]	|
  0.0618 [17602]	|∎∎∎∎∎∎
  0.0740 [10646]	|∎∎∎
  0.0861 [1887]	|
  0.0983 [73]	|
  0.1104 [0]	|
  0.1226 [200]	|

Latency distribution:
  10% in 0.0109 secs.
  25% in 0.0136 secs.
  50% in 0.0167 secs.
  75% in 0.0222 secs.
  90% in 0.0579 secs.
  95% in 0.0635 secs.
  99% in 0.0754 secs.
  99.9% in 0.1119 secs.

After

Summary:
  Total:	24.9543 secs.
  Slowest:	0.2118 secs.
  Fastest:	0.0054 secs.
  Average:	0.0246 secs.
  Stddev:	0.0174 secs.
  Requests/sec:	8014.6536

Response time histogram:
  0.0054 [1]	|
  0.0260 [155504]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0467 [11457]	|∎∎
  0.0673 [30294]	|∎∎∎∎∎∎∎
  0.0880 [2344]	|
  0.1086 [0]	|
  0.1292 [0]	|
  0.1499 [0]	|
  0.1705 [58]	|
  0.1912 [142]	|
  0.2118 [200]	|

Latency distribution:
  10% in 0.0129 secs.
  25% in 0.0146 secs.
  50% in 0.0176 secs.
  75% in 0.0241 secs.
  90% in 0.0561 secs.
  95% in 0.0605 secs.
  99% in 0.0691 secs.
  99.9% in 0.1951 secs.

For a cluster with only one member, raft always sends identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually only partially)
the apply workflow.

When the client receives the response, it does not mean that etcd has
already durably saved the data to both BoltDB and the WAL, because:
   1. etcd commits the BoltDB transaction periodically instead of on each request;
   2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, data loss may occur if etcd crashes immediately after
responding to the client but before BoltDB and the WAL have persisted
the data to disk.
Note that this issue can only happen for clusters with only one member.
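
The race can be shown with a minimal Go sketch. All names here (entry, saveToWAL, applyEntries) are hypothetical stand-ins for etcd's real persistence and apply paths, not the actual API; the sketch only captures the ordering that makes the loss possible.

package main

import "fmt"

type entry struct{ data string }

// saveToWAL is a hypothetical stand-in for the WAL write + fsync path.
func saveToWAL(entries []entry) {
	// fsync to disk would happen here
	_ = entries
}

// applyEntries is a hypothetical stand-in for the apply workflow; when
// it finishes, the waiting client is answered.
func applyEntries(entries []entry, done chan<- struct{}) {
	_ = entries
	done <- struct{}{} // the client sees "success" at this point
}

func main() {
	entries := []entry{{data: "put k v"}}
	done := make(chan struct{})

	// Single-member cluster: the same entries arrive as both unstable
	// (to be persisted) and committed (to be applied), and the two
	// paths run concurrently.
	go saveToWAL(entries)          // the fsync may still be pending...
	go applyEntries(entries, done) // ...when the client is answered

	<-done
	fmt.Println("client got success; a crash here can still lose the entry")
}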

For clusters with multiple members, this isn't an issue, because etcd
does not commit and apply the data until it has been replicated to a
majority of members. By the time the client receives the response, the
data must already have been applied, which in turn means it must have
been committed.
Note: for clusters with multiple members, raft will never send identical
unstable entries and committed entries to etcdserver.
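
The ordering constraint behind the fix can be sketched with the same hypothetical helpers: the apply path (and therefore the client response) must wait until the WAL write has been fsynced. This illustrates the principle only; etcd's actual change in #14400 is more targeted.

// handleReadySafely is a sketch, not etcd's code: it blocks the apply
// path on WAL durability so the client response can never precede the
// fsync. In a multi-member cluster the unstable and committed entries
// never overlap, so this extra wait matters only in the single-member
// case described above.
func handleReadySafely(unstable, committed []entry) {
	walDone := make(chan struct{})
	go func() {
		saveToWAL(unstable) // fsync completes before walDone is closed
		close(walDone)
	}()

	<-walDone // durability first...

	done := make(chan struct{}, 1)
	applyEntries(committed, done) // ...then apply and answer the client
	<-done
}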

Signed-off-by: Benjamin Wang <wachao@vmware.com>
serathius (Member, Author)

cc @ptabor @ahrtr @tbg

spzala (Member) left a comment.

ahrtr (Member) left a comment:

LGTM

Thanks @serathius

One comment on the performance test: it would be better to provide information about the test environment (i.e., CPU, memory, and disk), because different environments will likely produce different results.

ahrtr merged commit 747bf5c into etcd-io:release-3.5 on Sep 5, 2022
serathius (Member, Author)

The motivation for the benchmark was to verify that the performance regression is within the expected range (~5% based on your tests). Even if I provided the environment details, I don't expect they could be used to reproduce the results, since most people won't have the same CPU/memory setup.

However, you are right that different environments produce different results. For public announcements we should diversify our testing environments to get more accurate results: instead of a fixed number like 4.5%, we could report a range (for example, 3-6%) reflecting the best and worst cases seen across environments.
