-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lb-edf: fix lb initialization to choose from the correct set of weighted hosts #29953
Conversation
…ted hosts Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
const uint32_t host0_weight = static_cast<uint32_t>(std::ceil(hostWeight(*hosts[0]))); | ||
const uint32_t host1_weight = static_cast<uint32_t>(std::ceil(hostWeight(*hosts[1]))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is what will happen if the hostWeight
return zero? Or will this happen?
Others LGTM, :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The host weight should be at least 1.
There's some normalization that is done on the host weight, but I don't think it can become 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc, some lb will add a factor to the original weight. (Like lbs that enabled slow start)
but finally it should large than 0. Let's do a check. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a lower bound to the normalizer.
@nezdolik could slow start set host weights to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slow start code uses max(calculated_weight, min_weight_percentage) for weight, min_weight_percentage defaults to 0.1. Am not by computer now, but could paste relevant links here later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code applies slow start factor in most complex logical if branch (in other logical branches original host weight is returned):
return host_weight * std::max(applyAggressionFactor(time_factor, aggression),
slow_start_min_weight_percent_);
- If original
host_weight
is 0 computation result will be 0 applyAggressionFactor(time_factor, aggression)
cannot be zero, time_factor is calculated as:
auto time_factor = static_cast<double>(std::max(std::chrono::milliseconds(1).count(),
in_healthy_state_duration.count())) /
slow_start_window_.count();
to get the closest value to 0, slow_start_window_ divisor should be max value of std::chrono::milliseconds and dividend 1 millisecond, that value is still not 0 and fits into 4 bytes (double type size is 8 bytes, so should be no overflow).
3. applyAggressionFactor is calculated as:
if (aggression == 1.0 || time_factor == 1.0) {
return time_factor;
} else {
return std::pow(time_factor, 1.0 / aggression);
}
cannot evaluate to 0 as time_factor cannot be 0 (proven in step2). Min possible value for time_factor in power of minimal possible value for 1.0/aggression (which is 1/double_max_value) still did not yield overflow/zero.
4. slow_start_min_weight_percent_ can be set to 0 according to api.proto, but we have a max guard and in that case applyAggressionFactor(time_factor, aggression)
will be selected as max, and is proven it cannot be zero in step 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I relied on various online tools for floating point precision arithmetics and overflow testing like https://www.eeweb.com/tools/online-scientific-calculator/ or https://www.h-schmidt.net/FloatConverter/IEEE754.html. With 99% confidence slow start factor function cannot return 0 :) (unless original host weight is zero)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @nezdolik for the thorough verification.
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
// Compute the greatest common divisor of all the weights, to normalize | ||
// the sum of weights to a smaller number. This can be avoided, but will | ||
// result in higher numbers and more calls to `pickAndAdd()`. | ||
const uint32_t host0_weight = static_cast<uint32_t>(std::ceil(hostWeight(*hosts[0]))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't using ceil
skew the proportion of host weights in some cases? E.g. we have host1 with weight 0.1 and host2 with weight 1 (proportion 1:10), by using ceil that proportion would turn into 1:1 and make host1 as equally likely to be picked as host2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, considering the additional factor like the slow start, the hostWeight
may return a very minor value like 0.1.
And should we limit the max number of this initial randomize (or add a switch to it)? If weight difference of different endpoints is very very large, like 1 and 9999, then the initial randomize number will very large. cc @adisuissa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took that into consideration when looking for how to fix the bug.
I'm not aware of a way to compute a gcd of non-integers. The alternative is to keep the bug as is (which also impacts a case of 2 hosts one with weight=1, and the other with weight=0.1). If you know of an efficient way to compute the length of the cycle with non-integer weights, please let me know.
The right way to fix this is probably convert this to static stride-scheduling (grpc implementations). However that will be a substantial change, and if added as an alternative, it will still not solve the current bug.
RE limiting the max number of randomized, I can probably add that somewhere (bootstrap?), but that seems more like a feature that will never go away to solve a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also tagging @tonya11en in case he has great ideas (he usually does :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @adisuissa is there any input about the problem that @nezdolik raised. Basically, I think this is only point that may block this PR. (Ignore my suggestion about the max number of randomize, we can try to resolve it until it actually become a problem)
// random value, 6 times the first host will be chosen, 3 times the second | ||
// host will be chosen, and 1 time the third host will be chosen. | ||
hostSet().healthy_hosts_ = { | ||
makeTestHost(info_, "tcp://127.0.0.1:80", simTime(), 6), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also test with non integer weights?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see that original weights are integer only, but perhaps testing with slow start config is worth to cover as well.
Looks like this needs a main merge but @wbpcode is this waiting on you otherwise? |
/wait-any |
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
// Compute the greatest common divisor of all the weights, to normalize | ||
// the sum of weights to a smaller number. This can be avoided, but will | ||
// result in higher numbers and more calls to `pickAndAdd()`. | ||
const uint32_t host0_weight = static_cast<uint32_t>(std::ceil(hostWeight(*hosts[0]))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also tagging @tonya11en in case he has great ideas (he usually does :) )
for (uint32_t i = 0; i < seed_ % hosts.size(); ++i) { | ||
auto host = | ||
scheduler.edf_->pickAndAdd([this](const Host& host) { return hostWeight(host); }); | ||
if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.edf_lb_scheduler_init_fix") && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more aspect to consider, is that LeastRequest loadbalancer can also scale host weight based on number of active requests: https://github.com/envoyproxy/envoy/blob/main/source/common/upstream/load_balancer_impl.cc#L1276
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Let me address that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, sorry I missed this last month (was on vacation). This problem looks eerily similar to the "first pick problem" that was discussed in #14597 and summarized in #14681. Hopefully I'm understanding this problem correctly. In what I linked, the gist of the problem was that in cases where you have 2 hosts (weight 99 and weight 1) that are picked after initializing the EDF scheduler, it would always pick the host with weight 99. I did a lot of investigation into that and landed on an alternative scheduling discipline that was merged in #14681 called WRSQ.
There were some snags that it appears you ran into as well with scenarios where the effective host weights are changing after each selection (like in slow-start and least request). We did conclude that there's no good way to maintain correctness in the selection probabilities without a linear sweep of the host set to perform a weight random selection.
I never got around to plugging in the WRSQ scheduler after implementing it and merging, so if it solves your problem you should consider using it here for the WRR cases. LMK if it makes sense or if I should write up something a bit more thorough.
Thanks @tonya11en for this, I wasn't aware of WRSQ. I'm not sure I understand how it solves the problem: IIUC the call to Internally we'd like to end up with an algorithm that may be a bit lighter (static-stride-scheduler, see grpc implementation). However, as you pointed out, this will still not solve the underlying issue of non-integer weights. I wonder if the right approach will be to separate the cases where weights are non-integers, as these also have some CPU performance. |
My understanding of the problem here is that the selection probability of the "first pick" after the EDF scheduler is initialized is not correct, given the weights of the hosts in the set. This was measured in an earlier simulation with 2 hosts, weight 99 and weight 1. The weight 99 host was picked first every single time:
Using WRSQ made it so that the first pick honored the selection probabilities warranted by the host weights:
What does "lighter" mean in this case? The tradeoff with using the WRSQ scheduler is that it has to do a linear sweep of all the hosts every time one of their weights are updated, which is what makes it unsuitable for LRLB. Outside of that case, it's performance characteristics are at least as fast as EDF.
This would be in all the cases where the weights are dynamically updated, like in least request LB or slow-start, right? There's no other way to handle this afaik. |
I should clarify that I'm not trying to push WRSQ on you here. Just thought it might be useful context for you. |
Nice, sorry if I missed these tests in the replies to the original issues.
"Lighter" - uses a single integer to decide which host to pick next. The integer is incremented every time a host is chosen.
Yeah, not sure what's the best approach here. |
friendly ping @tonya11en , I think this is waiting on your response. |
This was just an existing test that I hacked together a long time ago. The patch is in tonya11en#1 if you wanted to take a look at what I was doing there. Regarding the weighted random selections with non-integral weights, I'm not sure what folks here want to do about that problem (block PR, etc.). There's not a sub-linear selection algorithm (that I'm aware of) which is resilient to constantly changing and fractional weights like we see in the least request load balancer AND doesn't fall victim to this "first-pick problem". I wasn't aware that I'm blocking progress here, so let me know what you need on my end. |
/wait |
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Assigning Harvey as a senior-maintainer and may have a better way to solve this issue.
|
It feels like WRSQ should be where we throw effort going forward, and if there is a lightweight way to "fix" things here for integer only weights, that seems fine as well. |
Just FYI, the WRSQ scheduler implementation was already merged a while ago. It was just never plumbed into the relevant LBs (never finished review process for #18020). |
/wait |
…31592) This is a follow up on #29953, with some insights about an additional scenario that has a similar problem. The EDF scheduler is used in 2 levels of Zone-Aware LB policies: 1. For choosing the locality to use (here for healthy hosts). Note that as opposed to the problem in (2), this also impacts equal-weighted localities. 2. For choosing the host within the chosen locality (here). Prior suggestions (WRSQ, #29953) focused on fixing level (2), and required converting the weights to integer-precision before performing the LB policy. While this may work for (2), assuming one can convert the weights to integers by multiplying by some factor and truncating, for (1) this approach is more challenging as when computing the "effective locality weight" its value is dependent on the over-provisioning factor and the ratio of available hosts to all hosts. An additional benefit is that the current approach can also work with slow-start. Thus, this PR suggests a different mechanism to "perform" some random number of picks when creating the EDF scheduler. The creation process is split into two steps: 1. Estimate a lower-bound on the number of picks each entry will be chosen, and initialize the internal EDF priority-queue accordingly. 2. Perform up to N "pickAndAdd" operations, where N is the number of localities/hosts. Note that this approach is ~equal to performing some P picks from the scheduler where P >> N (~equal up to double precision computation differences and entries order of entries with the same weight). After this PR, the next thing is to plumb the PRNG (or a random value) into the locality-scheduler creation process, and fix the call site for (1). Here's a short doc providing the rationale behind this work: https://docs.google.com/document/d/1i53RYOy8sJUwL6_PSIF4yN39Xc1LvJtZ-18aSBZq0vM/edit?usp=sharing Risk Level: low - code not used by Envoy's main codebase Testing: Added tests to validate equivalence with prior approach. Docs Changes: N/A Release Notes:N/A - will be updated when the Platform Specific Features: N/A Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Closing this in favor of #31592 |
Commit Message: lb-edf: fix lb initialization to choose from the correct set of weighted hosts
Additional Description:
Fixing the initialization of EDF based LB algorithms. Prior to this PR, the initialization would randomize a number between [0, #hosts). However, it should've randomize by the number of entries in the "pick list".
For example, if there are 2 hosts, host_A with weight 99 and host_B with weight 1, the randomization will land in either index 0 or 1, whereas it should have been between 0 and 99 (as there are 100 entries in the "pick list").
This may cause a skew in the traffic distribution, especially in a fleet of Envoy, and when Envoy receive periodic assignment updates.
Risk Level: Medium - impacts the initialization of LB algorithms
Testing: Added unit tests
Docs Changes: N/A
Release Notes: Added a release note as well as the
envoy.reloadable_features.edf_lb_scheduler_init_fix
runtime guard.Platform Specific Features: N/A