
[release-3.5] fix the potential data loss for clusters with only one member #14424

Merged: 1 commit into etcd-io:release-3.5 on Sep 5, 2022

Conversation

serathius (Member)

Backport #14400 to v3.5, fixing #14370.

Benchmarking showed a 4.5% performance degradation.

Command

./bin/etcd --quota-backend-bytes=4300000000
bin/tools/benchmark txn-put --endpoints="http://127.0.0.1:2379" --clients=200 --conns=200 --key-space-size=4000000000 --key-size=128 --val-size=10240  --total=200000 --rate=40000

Before

Summary:
  Total:	23.8374 secs.
  Slowest:	0.1226 secs.
  Fastest:	0.0010 secs.
  Average:	0.0235 secs.
  Stddev:	0.0175 secs.
  Requests/sec:	8390.1908

Response time histogram:
  0.0010 [1]	|
  0.0132 [43566]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0253 [115481]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0375 [8083]	|∎∎
  0.0497 [2461]	|
  0.0618 [17602]	|∎∎∎∎∎∎
  0.0740 [10646]	|∎∎∎
  0.0861 [1887]	|
  0.0983 [73]	|
  0.1104 [0]	|
  0.1226 [200]	|

Latency distribution:
  10% in 0.0109 secs.
  25% in 0.0136 secs.
  50% in 0.0167 secs.
  75% in 0.0222 secs.
  90% in 0.0579 secs.
  95% in 0.0635 secs.
  99% in 0.0754 secs.
  99.9% in 0.1119 secs.

After

Summary:
  Total:	24.9543 secs.
  Slowest:	0.2118 secs.
  Fastest:	0.0054 secs.
  Average:	0.0246 secs.
  Stddev:	0.0174 secs.
  Requests/sec:	8014.6536

Response time histogram:
  0.0054 [1]	|
  0.0260 [155504]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0467 [11457]	|∎∎
  0.0673 [30294]	|∎∎∎∎∎∎∎
  0.0880 [2344]	|
  0.1086 [0]	|
  0.1292 [0]	|
  0.1499 [0]	|
  0.1705 [58]	|
  0.1912 [142]	|
  0.2118 [200]	|

Latency distribution:
  10% in 0.0129 secs.
  25% in 0.0146 secs.
  50% in 0.0176 secs.
  75% in 0.0241 secs.
  90% in 0.0561 secs.
  95% in 0.0605 secs.
  99% in 0.0691 secs.
  99.9% in 0.1951 secs.

For a cluster with only one member, raft always sends identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually only partially)
the apply workflow.

When the client receives the response, it does not mean that etcd has
already durably saved the data to both BoltDB and the WAL, because:
   1. etcd commits the BoltDB transaction periodically instead of on each request;
   2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, data loss may occur if etcd crashes immediately after
responding to the client but before BoltDB and the WAL have persisted
the data to disk.
Note that this issue can only happen for clusters with only one member.
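
The race can be shown with a minimal Go sketch. All names here (entry, saveToWAL, applyEntries) are hypothetical stand-ins for etcd's real persistence and apply paths, not the actual API; the sketch only captures the ordering that makes the loss possible.

package main

import "fmt"

type entry struct{ data string }

// saveToWAL is a hypothetical stand-in for the WAL write + fsync path.
func saveToWAL(entries []entry) {
	// fsync to disk would happen here
	_ = entries
}

// applyEntries is a hypothetical stand-in for the apply workflow; when
// it finishes, the waiting client is answered.
func applyEntries(entries []entry, done chan<- struct{}) {
	_ = entries
	done <- struct{}{} // the client sees "success" at this point
}

func main() {
	entries := []entry{{data: "put k v"}}
	done := make(chan struct{})

	// Single-member cluster: the same entries arrive as both unstable
	// (to be persisted) and committed (to be applied), and the two
	// paths run concurrently.
	go saveToWAL(entries)          // the fsync may still be pending...
	go applyEntries(entries, done) // ...when the client is answered

	<-done
	fmt.Println("client got success; a crash here can still lose the entry")
}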

For clusters with multiple members, this isn't an issue, because etcd
does not commit and apply the data until it has been replicated to a
majority of members. By the time the client receives the response, the
data must already have been applied, which in turn means it must have
been committed.
Note: for clusters with multiple members, raft will never send identical
unstable entries and committed entries to etcdserver.
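
The ordering constraint behind the fix can be sketched with the same hypothetical helpers: the apply path (and therefore the client response) must wait until the WAL write has been fsynced. This illustrates the principle only; etcd's actual change in #14400 is more targeted.

// handleReadySafely is a sketch, not etcd's code: it blocks the apply
// path on WAL durability so the client response can never precede the
// fsync. In a multi-member cluster the unstable and committed entries
// never overlap, so this extra wait matters only in the single-member
// case described above.
func handleReadySafely(unstable, committed []entry) {
	walDone := make(chan struct{})
	go func() {
		saveToWAL(unstable) // fsync completes before walDone is closed
		close(walDone)
	}()

	<-walDone // durability first...

	done := make(chan struct{}, 1)
	applyEntries(committed, done) // ...then apply and answer the client
	<-done
}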

Signed-off-by: Benjamin Wang <wachao@vmware.com>
serathius (Member, Author)

cc @ptabor @ahrtr @tbg

spzala (Member) left a comment.

ahrtr (Member) left a comment:

LGTM

Thanks @serathius

One comment on the performance test: it would be better to provide information about the test environment (i.e., CPU, memory, and disk), because different environments will likely produce different results.

ahrtr merged commit 747bf5c into etcd-io:release-3.5 on Sep 5, 2022
serathius (Member, Author)

The motivation for the benchmark was to verify that the performance regression is within the expected range (~5% based on your tests). Even if I provided the environment details, I don't expect they could be used to reproduce the results, since most people won't have the same CPU/memory setup.

However, you are right that different environments produce different results. For public announcements we should diversify our testing environments to get more accurate results: instead of a fixed number like 4.5%, we could report a range (for example, 3-6%) reflecting the best and worst cases seen across environments.
