Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

edf: adding a construction option with a given number of pre-picks #31592

Merged
merged 13 commits into from
Jan 25, 2024

Conversation

adisuissa
Copy link
Contributor

@adisuissa adisuissa commented Jan 2, 2024

Commit Message: edf: adding a construction option with a given number of pre-picks
Additional Description:
This is a follow up on #29953, with some insights about an additional scenario that has a similar problem.
The EDF scheduler is used in 2 levels of Zone-Aware LB policies:

  1. For choosing the locality to use (here for healthy hosts). Note that as opposed to the problem in (2), this also impacts equal-weighted localities.
  2. For choosing the host within the chosen locality (here).

Prior suggestions (WRSQ, #29953) focused on fixing level (2), and required converting the weights to integer-precision before performing the LB policy.
While this may work for (2), assuming one can convert the weights to integers by multiplying by some factor and truncating, for (1) this approach is more challenging as when computing the "effective locality weight" its value is dependent on the over-provisioning factor and the ratio of available hosts to all hosts. An additional benefit is that the current approach can also work with slow-start.

Thus, this PR suggests a different mechanism to "perform" some random number of picks when creating the EDF scheduler. The creation process is split into two steps:

  1. Estimate a lower-bound on the number of picks each entry will be chosen, and initialize the internal EDF priority-queue accordingly.
  2. Perform up to N "pickAndAdd" operations, where N is the number of localities/hosts.

Note that this approach is ~equal to performing some P picks from the scheduler where P >> N (~equal up to double precision computation differences and entries order of entries with the same weight).

After this PR, the next thing is to plumb the PRNG (or a random value) into the locality-scheduler creation process, and fix the call site for (1).

Here's a short doc providing the rationale behind this work.

Risk Level: low - code not used by Envoy's main codebase
Testing: Added tests to validate equivalence with prior approach.
Docs Changes: N/A
Release Notes:N/A - will be updated when the
Platform Specific Features: N/A

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Copy link
Contributor Author

cc'ing all the interested stake-holders for the prior fix for comments/suggestions on this new approach
@tonya11en @wbpcode @nezdolik @htuch

@adisuissa
Copy link
Contributor Author

Here's a short doc providing the rationale behind this work.

@tonya11en
Copy link
Member

Thanks for the doc. This is one of those PRs I'll need to sit down and work through an example before grokking. I've got a block reserved for this tomorrow morning, so give me at least until then.

@tonya11en
Copy link
Member

Alright, spent some time with it and think it's really clever. It makes sense how you optimize the rotation by skipping to some virtual time in the future and burning through whatever remainder is left.

I hacked together a "first pick" test similar to the ones done for the WRSQ approach and noticed that the selection probabilities are not quite what I expected. For example, if I have 2 hosts with weights {25, 75} the selection probabilities are not quite right and it doesn't converge on the right answer as I increase the number of iterations in the test:

hosts: {1337, 1338}
weights: {1, 99}

[ RUN      ] EdfSchedulerTest.EdfHax
Testing first picks. iters=1000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 21.1%
	expected: 25%
	relative_error: -15.6%
1338:
	observed: 78.9%
	expected: 75%
	relative_error: 5.2%

Testing first picks. iters=10000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 20.36%
	expected: 25%
	relative_error: -18.56%
1338:
	observed: 79.64%
	expected: 75%
	relative_error: 6.18667%

Testing first picks. iters=100000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 20.351%
	expected: 25%
	relative_error: -18.596%
1338:
	observed: 79.649%
	expected: 75%
	relative_error: 6.19867%

Testing first picks. iters=1000000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 20.2892%
	expected: 25%
	relative_error: -18.8432%
1338:
	observed: 79.7108%
	expected: 75%
	relative_error: 6.28107%

The unrotated scheduler exhibits the behavior that motivated this change- chooses the same element every time. What I find confusing is that the relative error doesn't change for the scheduler that does the rotations.

I tried this with weights {1, 99} as well and see the same behavior:

[ RUN      ] EdfSchedulerTest.EdfHax
Testing first picks. iters=1000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 0.9%
	expected: 1%
	relative_error: -10%
1338:
	observed: 99.1%
	expected: 99%
	relative_error: 0.10101%

Testing first picks. iters=10000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 0.9%
	expected: 1%
	relative_error: -10%
1338:
	observed: 99.1%
	expected: 99%
	relative_error: 0.10101%

Testing first picks. iters=100000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 0.648%
	expected: 1%
	relative_error: -35.2%
1338:
	observed: 99.352%
	expected: 99%
	relative_error: 0.355556%

Testing first picks. iters=1000000
unrotated EDF:
1338: 100%

rotated EDF:
1337:
	observed: 0.6105%
	expected: 1%
	relative_error: -38.95%
1338:
	observed: 99.3895%
	expected: 99%
	relative_error: 0.393434%

Any ideas about what's going on here? I can look at this some more tomorrow if not. It's possible I'm doing something weird here, so if you want to take a look at the patch it's here: tonya11en@bd34ce7

@adisuissa
Copy link
Contributor Author

That's a good observation, and indeed strange.
Let me take a deeper look and add some tests that validate the correct behavior.
/wait-any

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…ler_init

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Copy link
Contributor Author

Any ideas about what's going on here? I can look at this some more tomorrow if not. It's possible I'm doing something weird here, so if you want to take a look at the patch it's here: tonya11en@bd34ce7

@tonya11en thanks again for providing this test.
The underlying issue was that when there is a weight that is a multiplications of another (say 25 and 75) then the emulated picks can cause some imbalance on the first-pick.
I've solved it by augmenting the weights used, by adding an epsilon to each weight, and added a test based on the one you provided.

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@phlax
Copy link
Member

phlax commented Jan 9, 2024

/wait-any for further review

@adisuissa
Copy link
Contributor Author

@phlax IIUC adding "wait-any" prevents the reviewers from receiving PR updates on the slack channel.
I think it should only be added when waiting for the PR author to comment, no?

@phlax
Copy link
Member

phlax commented Jan 9, 2024

@adisuissa you are probably right - i tend to use it when its waiting for any comment rather than a code change

(fwiw i think we sorely need wait-review functionality)

Copy link
Member

@tonya11en tonya11en left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GJ on fixing the first-pick skew! That was a weird one.

LGTM, modulo the minor comments I left.

source/common/upstream/edf_scheduler.h Outdated Show resolved Hide resolved
source/common/upstream/edf_scheduler.h Outdated Show resolved Hide resolved
source/common/upstream/edf_scheduler.h Show resolved Hide resolved
EDF_TRACE("Emulated {} picks in init step, {} picks remaining for one after the other step",
picks_so_far, picks - picks_so_far);
for (; picks_so_far < picks; ++picks_so_far) {
scheduler.pickAndAdd(aug_calculate_weight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you be passing in aug_calculate_weight or calculate_weight here? You've already perturbed the weights, so at this point you should use the original weight calculation, right?

Comment on lines 235 to 237
// should be used. If the number of weights is large, the number of iterations
// should be larger than 10000.
constexpr uint64_t iterations = 100000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make sure this is enough to avoid flakiness by running the test multiple times (I think --runs_per_test=1000). Or just set this to 1e6 and don't worry about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having a single test that has many random values makes more sense in this case, because at the end of test, it compares the expected and observed values.

test/common/upstream/edf_scheduler_test.cc Show resolved Hide resolved
Copy link
Member

@nezdolik nezdolik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is great! Left some nits.

source/common/upstream/edf_scheduler.h Outdated Show resolved Hide resolved
source/common/upstream/edf_scheduler.h Outdated Show resolved Hide resolved
test/common/upstream/edf_scheduler_test.cc Outdated Show resolved Hide resolved
test/common/upstream/edf_scheduler_test.cc Outdated Show resolved Hide resolved
source/common/upstream/edf_scheduler.h Show resolved Hide resolved
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Copy link
Contributor Author

This is ready for another round of reviews, thanks!

Copy link
Member

@nezdolik nezdolik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few mote nits and ready to ship :)

test/common/upstream/edf_scheduler_test.cc Outdated Show resolved Hide resolved
test/common/upstream/edf_scheduler_test.cc Outdated Show resolved Hide resolved
test/common/upstream/edf_scheduler_test.cc Show resolved Hide resolved
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…ler_init

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Copy link
Contributor Author

Assigning senior maintainers to review.
This is still not plumbed in, but should be reviewed for potential impact in the future.
/assign @htuch @wbpcode

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really clever. I want to make sure we can preserve the full reasoning for future readers as algorithms like this need a fast ramp for the uninitiated.

}));

// Nothing to do if there are no entries.
if (entries.size() == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refresh my memory on whether we already have an optimization here if all weights are equal somewhere earlier in the call chain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typical answer is "it depends":

source/common/upstream/edf_scheduler.h Show resolved Hide resolved
source/common/upstream/edf_scheduler.h Outdated Show resolved Hide resolved
source/common/upstream/edf_scheduler.h Show resolved Hide resolved
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…ler_init

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Copy link
Contributor Author

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the requested test.

source/common/upstream/edf_scheduler.h Show resolved Hide resolved
}));

// Nothing to do if there are no entries.
if (entries.size() == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typical answer is "it depends":

source/common/upstream/edf_scheduler.h Show resolved Hide resolved
source/common/upstream/edf_scheduler.h Outdated Show resolved Hide resolved
test/common/upstream/edf_scheduler_test.cc Show resolved Hide resolved
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
nezdolik
nezdolik previously approved these changes Jan 18, 2024
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
// equal to the weights. Trying the case of 2 weights between 0 to 100, in steps
// of 0.001. This test takes too long, and therefore it is disabled by default.
// If the EDF scheduler is enable, it can be manually executed.
TEST_P(EdfSchedulerSpecialTest, DISABLED_ExhustiveValidator) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Exhustive => Exhaustive

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants