[docdb] Leader load balancing can cause CHECK failures if stepdown task is pending on next run #5181
hulien22 added a commit that referenced this issue on Jul 27, 2020
…ss to be made across tables (#5021) (#5181)

Summary: Adding a new flag, `load_balancer_max_concurrent_moves_per_table`, to limit the number of leader moves per table. This flag is meant to be used together with `load_balancer_max_concurrent_moves` to improve performance for these moves.

Also fixing issue #5181, where pending leader moves on subsequent LB runs could lead to the same tablet being told to move twice, causing a CHECK failure. This was caused by `AnalyzeTabletsUnlocked` not properly updating the state for leader stepdowns, and has been fixed by storing `new_leader_uuid` instead of `change_config_ts_uuid` for pending leader stepdown tasks.

Test Plan:
`ybd --cxx-test load_balancer_multi_table-test` -- test for `load_balancer_max_concurrent_moves_per_table`
`ybd --cxx-test load_balancer-test --gtest_filter LoadBalancerTest.PendingLeaderStepdownRegressTest` -- regression test for issue #5181

Reviewers: hector, bogdan, rahuldesirazu
Reviewed By: rahuldesirazu
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D8903
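The commit title suggests the per-table flag exists so that leader moves can make progress across tables instead of one table using up the whole budget. Below is a minimal Python sketch of how a global cap and a per-table cap could combine within one balancer run, assuming both are simple counters reset at the start of each run. Only the two flag names come from the commit; `LeaderMoveLimiter` and `plan_leader_moves` are hypothetical names, not YugabyteDB code.

```python
from collections import Counter


class LeaderMoveLimiter:
    """Hypothetical per-run budget for leader moves.

    max_total plays the role of load_balancer_max_concurrent_moves and
    max_per_table the role of load_balancer_max_concurrent_moves_per_table.
    """

    def __init__(self, max_total: int, max_per_table: int) -> None:
        self.max_total = max_total
        self.max_per_table = max_per_table
        self.total = 0
        self.per_table = Counter()

    def try_reserve(self, table_id: str) -> bool:
        # Refuse once the global budget for this run is exhausted.
        if self.total >= self.max_total:
            return False
        # Refuse if this table already used its share, leaving room for other tables.
        if self.per_table[table_id] >= self.max_per_table:
            return False
        self.per_table[table_id] += 1
        self.total += 1
        return True


def plan_leader_moves(candidates, limiter):
    """Return the (table_id, tablet_id) candidates the limiter admits, in order."""
    return [(table, tablet) for table, tablet in candidates if limiter.try_reserve(table)]


if __name__ == "__main__":
    limiter = LeaderMoveLimiter(max_total=3, max_per_table=1)
    candidates = [("A", "t1"), ("A", "t2"), ("B", "t3"), ("C", "t4"), ("D", "t5")]
    print(plan_leader_moves(candidates, limiter))  # [('A', 't1'), ('B', 't3'), ('C', 't4')]
```

With a global cap of 3 and a per-table cap of 1, the second candidate from table A is refused, so the remaining budget goes to tables B and C before the global cap stops D.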
Closed in 6382fde.
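The failure mode and the fix can be illustrated with a toy model of one balancer run. This is a hypothetical Python sketch, not YugabyteDB's `AnalyzeTabletsUnlocked`: `LeaderState`, `add_pending_stepdown`, `pick_leader_move`, and the one-move-per-run rule are all assumptions. A stepdown still pending from an earlier run is folded into the state before planning. Recording the new leader keeps the tablet from being picked again. Recording the tablet's current leader instead (standing in for the stale UUID, which is itself an assumption) makes the tablet look movable and trips a CHECK-like guard.

```python
from collections import Counter


class LeaderState:
    """Hypothetical per-run view of tablet leadership (not YugabyteDB code)."""

    def __init__(self, leaders):
        # tablet_id -> uuid of the tserver projected to lead the tablet
        self.leaders = dict(leaders)
        # tablet_ids that already have a leader move in flight
        self.moving = set()

    def add_pending_stepdown(self, tablet_id, projected_leader_uuid):
        # Fold a stepdown task left over from an earlier run into the state.
        # The fix described in the commit corresponds to passing the *new*
        # leader here, so the tablet is no longer counted on its old leader.
        self.leaders[tablet_id] = projected_leader_uuid
        self.moving.add(tablet_id)

    def move_leader(self, tablet_id, new_leader_uuid):
        # Stand-in for the CHECK from the issue: never move a tablet twice.
        if tablet_id in self.moving:
            raise RuntimeError(f"tablet {tablet_id} already has a pending leader move")
        self.leaders[tablet_id] = new_leader_uuid
        self.moving.add(tablet_id)


def pick_leader_move(state, servers):
    """Move one leader from the most- to the least-loaded server, if unbalanced."""
    counts = Counter({s: 0 for s in servers})
    counts.update(state.leaders.values())
    most = max(servers, key=lambda s: counts[s])
    least = min(servers, key=lambda s: counts[s])
    if counts[most] - counts[least] <= 1:
        return None
    # Candidates are chosen by projected leader only, as a simplification.
    for tablet_id in sorted(state.leaders):
        if state.leaders[tablet_id] == most:
            state.move_leader(tablet_id, least)
            return tablet_id
    return None


if __name__ == "__main__":
    servers = ["ts1", "ts2"]
    actual = {"t1": "ts1", "t2": "ts1", "t3": "ts1", "t4": "ts1"}

    # Run 1: nothing pending, so the balancer issues a stepdown for t1.
    print(pick_leader_move(LeaderState(actual), servers))  # t1

    # Run 2: t1's stepdown is still pending, so actual leadership is unchanged.
    run2 = LeaderState(actual)
    run2.add_pending_stepdown("t1", "ts2")  # record the new leader
    print(pick_leader_move(run2, servers))  # t2

    # Recording the current leader instead leaves t1 looking movable again.
    buggy = LeaderState(actual)
    buggy.add_pending_stepdown("t1", "ts1")
    try:
        pick_leader_move(buggy, servers)
    except RuntimeError as err:
        print(err)  # tablet t1 already has a pending leader move
```

In the second run the balancer picks `t2` because `t1` is already counted on `ts2`; in the buggy variant it selects `t1` a second time and the guard raises.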
hulien22 added a commit that referenced this issue on Aug 4, 2020
…e allowing progress to be made across tables (#5021) (#5181)

Summary: Adding a new flag, `load_balancer_max_concurrent_moves_per_table`, to limit the number of leader moves per table. This flag is meant to be used together with `load_balancer_max_concurrent_moves` to improve performance for these moves.

Also fixing issue #5181, where pending leader moves on subsequent LB runs could lead to the same tablet being told to move twice, causing a CHECK failure. This was caused by `AnalyzeTabletsUnlocked` not properly updating the state for leader stepdowns, and has been fixed by storing `new_leader_uuid` instead of `change_config_ts_uuid` for pending leader stepdown tasks.

Test Plan: Jenkins: rebase: 2.1

Reviewers: bogdan, rahuldesirazu
Reviewed By: rahuldesirazu
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D9082
hulien22 added a commit that referenced this issue on Aug 4, 2020
…e allowing progress to be made across tables (#5021) (#5181)

Summary: Adding a new flag, `load_balancer_max_concurrent_moves_per_table`, to limit the number of leader moves per table. This flag is meant to be used together with `load_balancer_max_concurrent_moves` to improve performance for these moves.

Also fixing issue #5181, where pending leader moves on subsequent LB runs could lead to the same tablet being told to move twice, causing a CHECK failure. This was caused by `AnalyzeTabletsUnlocked` not properly updating the state for leader stepdowns, and has been fixed by storing `new_leader_uuid` instead of `change_config_ts_uuid` for pending leader stepdown tasks.

Test Plan: Jenkins: rebase: 2.2

Reviewers: bogdan, rahuldesirazu
Reviewed By: rahuldesirazu
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D9083
It seems that if the LB issues a leader stepdown task in one run and that task is still pending in the next run, the LB could issue the same task again, leading to a CHECK failure.
Relevant stack