Do election in order based on failed primary rank to avoid voting conflicts #1018

enjoy-binbin · 2024-09-11T09:37:35Z

When multiple primary nodes fail simultaneously, the cluster cannot recover
within the default effective time (data_age limit). The main reason is that
the replicas of the different failed primaries request votes without any
ordering among them, which causes too many epoch conflicts.

Therefore, we introduce a ranking based on the shard-id of the failed primary.
A new failed_primary_rank variable holds the rank of this instance (myself)
within the list of all failed primaries. It is used during failover: replicas
send their failover election packets in order based on the rank, which
effectively avoids the voting conflicts.

If a single primary is down, the behavior is the same as before. If multiple
primaries are down, their replicas' election initiation times are delayed
by 500ms according to the ranking.
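
The ranking idea can be illustrated with a minimal, self-contained sketch. This is not the code from src/cluster_legacy.c: the `failed_primary` struct, the helper names, the 40-byte shard-id length, and the assumption that the delay is the rank multiplied by 500ms are all hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHARD_ID_LEN 40   /* assumed shard-id length in bytes (illustrative) */
#define RANK_DELAY_MS 500 /* per-rank delay, taken from the description above */

/* Hypothetical, simplified view of a failed primary: only its shard-id. */
typedef struct {
    char shard_id[SHARD_ID_LEN];
} failed_primary;

/* Rank of our shard among all failed primaries: the number of failed
 * primaries whose shard-id sorts strictly before ours (byte-wise compare).
 * Our own failed primary shares our shard-id, so it never counts. */
static int failed_primary_rank(const char *my_shard_id,
                               const failed_primary *failed, size_t n) {
    assert(my_shard_id != NULL);
    int rank = 0;
    for (size_t i = 0; i < n; i++) {
        if (memcmp(failed[i].shard_id, my_shard_id, SHARD_ID_LEN) < 0) rank++;
    }
    return rank;
}

/* Extra delay added to the election start time (assumed: rank * 500ms).
 * Rank 0 adds no delay, matching the single-failed-primary case. */
static long long election_delay_ms(int rank) {
    return (long long)rank * RANK_DELAY_MS;
}
```

With a single failed primary the rank is 0 and no delay is added, so the behavior is unchanged. With several failed primaries, every replica derives the same ordering from the shard-ids alone, so the elections are staggered without any extra coordination between nodes.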

…flicts

When multiple primary nodes fail simultaneously, the cluster cannot recover within the default effective time (data_age limit). The main reason is that the replicas request votes without any ordering among them, which causes too many epoch conflicts.

Therefore, we introduce a ranking based on the failed primary node name. A new failed_primary_rank variable holds the rank of this instance (myself) within the list of all failed primaries. It is used during failover: replicas send their failover election packets in order based on the rank, which effectively avoids the voting conflicts.

Signed-off-by: Binbin <binloveplay1314@qq.com>

tests/unit/cluster/failover2.tcl

Signed-off-by: Binbin <binloveplay1314@qq.com>

codecov · 2024-09-14T03:42:06Z

Codecov Report

Attention: Patch coverage is 96.15385% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.85%. Comparing base (b3b4bdc) to head (6abc3c1).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cluster_legacy.c | 96.15% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1018      +/-   ##
============================================
+ Coverage     70.83%   70.85%   +0.01%     
============================================
  Files           120      120              
  Lines         64911    64937      +26     
============================================
+ Hits          45982    46012      +30     
+ Misses        18929    18925       -4

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/cluster_legacy.c | `86.77% <96.15%> (-0.02%)` | ⬇️ |

... and 12 files with indirect coverage changes

PingXie

LGTM overall. I like this idea. Thanks @enjoy-binbin!

src/cluster_legacy.c

Signed-off-by: Binbin <binloveplay1314@qq.com>

madolson · 2025-01-03T05:29:11Z

Seems like a good idea to me as well.

Signed-off-by: Binbin <binloveplay1314@qq.com>

tests/unit/cluster/failover2.tcl

madolson

Directionally good with this change, the remaining tests still look good.

Signed-off-by: Binbin <binloveplay1314@qq.com>

…flicts (valkey-io#1018)

When multiple primary nodes fail simultaneously, the cluster cannot recover within the default effective time (data_age limit). The main reason is that the replicas request votes without any ordering among them, which causes too many epoch conflicts.

Therefore, we introduce a ranking based on the shard-id of the failed primary. A new failed_primary_rank variable holds the rank of this instance (myself) within the list of all failed primaries. It is used during failover: replicas send their failover election packets in order based on the rank, which effectively avoids the voting conflicts.

If a single primary is down, the behavior is the same as before. If multiple primaries are down, their replicas' election initiation times are delayed by 500ms according to the ranking.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: proost <jwalag87@gmail.com>

enjoy-binbin requested a review from PingXie September 11, 2024 09:37

enjoy-binbin commented Sep 11, 2024

tests/unit/cluster/failover2.tcl (resolved)

enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Sep 11, 2024

fix format

2a5dd80

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin mentioned this pull request Sep 13, 2024

Fix replica not able to initate election in time when epoch fails #1009

Merged

enjoy-binbin added 2 commits September 14, 2024 11:26

Merge remote-tracking branch 'upstream/unstable' into primary_fail_rank

65d05dd

Signed-off-by: Binbin <binloveplay1314@qq.com>

merge 1018

c6a71b5

Signed-off-by: Binbin <binloveplay1314@qq.com>

PingXie reviewed Sep 23, 2024

src/cluster_legacy.c (outdated, resolved)

src/cluster_legacy.c (resolved)

enjoy-binbin added 2 commits December 27, 2024 12:27

Merge remote-tracking branch 'upstream/unstable' into primary_fail_rank

c45e96a

Signed-off-by: Binbin <binloveplay1314@qq.com>

Change to use shard-id

e084dc4

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin requested review from zuiderkwast, hpatro and madolson December 27, 2024 04:31

enjoy-binbin added 2 commits January 3, 2025 14:30

Merge remote-tracking branch 'upstream/unstable' into primary_fail_rank

f001425

Signed-off-by: Binbin <binloveplay1314@qq.com>

wrap test with solo, use shard_id

6abc3c1

Signed-off-by: Binbin <binloveplay1314@qq.com>

madolson reviewed Jan 10, 2025

tests/unit/cluster/failover2.tcl (outdated, resolved)

madolson approved these changes Jan 10, 2025

indent

e82a7a0

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin added the release-notes This issue should get a line item in the release notes label Jan 11, 2025

enjoy-binbin merged commit 211b250 into valkey-io:unstable Jan 11, 2025
1 check passed

enjoy-binbin deleted the primary_fail_rank branch January 11, 2025 02:43

enjoy-binbin mentioned this pull request Feb 1, 2025

[test-failure] flaky test failure in failover2.tcl #1640

Open
