-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
edf: adding a construction option with a given number of pre-picks #31592
edf: adding a construction option with a given number of pre-picks #31592
Conversation
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
cc'ing all the interested stake-holders for the prior fix for comments/suggestions on this new approach |
Here's a short doc providing the rationale behind this work. |
Thanks for the doc. This is one of those PRs I'll need to sit down and work through an example before grokking. I've got a block reserved for this tomorrow morning, so give me at least until then. |
Alright, spent some time with it and think it's really clever. It makes sense how you optimize the rotation by skipping to some virtual time in the future and burning through whatever remainder is left. I hacked together a "first pick" test similar to the ones done for the WRSQ approach and noticed that the selection probabilities are not quite what I expected. For example, if I have 2 hosts with weights {25, 75} the selection probabilities are not quite right and it doesn't converge on the right answer as I increase the number of iterations in the test: hosts: {1337, 1338}
The unrotated scheduler exhibits the behavior that motivated this change- chooses the same element every time. What I find confusing is that the relative error doesn't change for the scheduler that does the rotations. I tried this with weights {1, 99} as well and see the same behavior:
Any ideas about what's going on here? I can look at this some more tomorrow if not. It's possible I'm doing something weird here, so if you want to take a look at the patch it's here: tonya11en@bd34ce7 |
That's a good observation, and indeed strange. |
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…ler_init Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@tonya11en thanks again for providing this test. |
/wait-any for further review |
@phlax IIUC adding "wait-any" prevents the reviewers from receiving PR updates on the slack channel. |
@adisuissa you are probably right - i tend to use it when its waiting for any comment rather than a code change (fwiw i think we sorely need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GJ on fixing the first-pick skew! That was a weird one.
LGTM, modulo the minor comments I left.
EDF_TRACE("Emulated {} picks in init step, {} picks remaining for one after the other step", | ||
picks_so_far, picks - picks_so_far); | ||
for (; picks_so_far < picks; ++picks_so_far) { | ||
scheduler.pickAndAdd(aug_calculate_weight); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you be passing in aug_calculate_weight
or calculate_weight
here? You've already perturbed the weights, so at this point you should use the original weight calculation, right?
// should be used. If the number of weights is large, the number of iterations | ||
// should be larger than 10000. | ||
constexpr uint64_t iterations = 100000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make sure this is enough to avoid flakiness by running the test multiple times (I think --runs_per_test=1000
). Or just set this to 1e6 and don't worry about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that having a single test that has many random values makes more sense in this case, because at the end of test, it compares the expected and observed values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is great! Left some nits.
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
This is ready for another round of reviews, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few mote nits and ready to ship :)
…ler_init Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really clever. I want to make sure we can preserve the full reasoning for future readers as algorithms like this need a fast ramp for the uninitiated.
})); | ||
|
||
// Nothing to do if there are no entries. | ||
if (entries.size() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you refresh my memory on whether we already have an optimization here if all weights are equal somewhere earlier in the call chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typical answer is "it depends":
- In host selection - it does handle all weights are equal differently.
- In locality selection - it doesn't handle this case.
…ler_init Signed-off-by: Adi Suissa-Peleg <adip@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the requested test.
})); | ||
|
||
// Nothing to do if there are no entries. | ||
if (entries.size() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typical answer is "it depends":
- In host selection - it does handle all weights are equal differently.
- In locality selection - it doesn't handle this case.
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
// equal to the weights. Trying the case of 2 weights between 0 to 100, in steps | ||
// of 0.001. This test takes too long, and therefore it is disabled by default. | ||
// If the EDF scheduler is enable, it can be manually executed. | ||
TEST_P(EdfSchedulerSpecialTest, DISABLED_ExhustiveValidator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Exhustive => Exhaustive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Commit Message: edf: adding a construction option with a given number of pre-picks
Additional Description:
This is a follow up on #29953, with some insights about an additional scenario that has a similar problem.
The EDF scheduler is used in 2 levels of Zone-Aware LB policies:
Prior suggestions (WRSQ, #29953) focused on fixing level (2), and required converting the weights to integer-precision before performing the LB policy.
While this may work for (2), assuming one can convert the weights to integers by multiplying by some factor and truncating, for (1) this approach is more challenging as when computing the "effective locality weight" its value is dependent on the over-provisioning factor and the ratio of available hosts to all hosts. An additional benefit is that the current approach can also work with slow-start.
Thus, this PR suggests a different mechanism to "perform" some random number of picks when creating the EDF scheduler. The creation process is split into two steps:
Note that this approach is ~equal to performing some P picks from the scheduler where P >> N (~equal up to double precision computation differences and entries order of entries with the same weight).
After this PR, the next thing is to plumb the PRNG (or a random value) into the locality-scheduler creation process, and fix the call site for (1).
Here's a short doc providing the rationale behind this work.
Risk Level: low - code not used by Envoy's main codebase
Testing: Added tests to validate equivalence with prior approach.
Docs Changes: N/A
Release Notes:N/A - will be updated when the
Platform Specific Features: N/A