
Implement Weighted Random Selection Queue (WRSQ) load balancing #14597

Open
mattklein123 opened this issue Jan 7, 2021 · 17 comments

@mattklein123 (Member) commented Jan 7, 2021

See #14360 (comment) and tonya11en#1 (comment). This will require #14569 to be merged and #14360 to be resolved.

We should use WRSQ for:

  • Weighted priority picking
  • Weighted locality picking
  • WRR host picking

We will still use EDF for WLR host picking or any other future picking in which weights can and do rapidly change.
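
For anyone landing here without context, the core primitive here is weighted random selection: pick an object with probability proportional to its weight. The sketch below only illustrates that primitive (the class and method names are made up for the example); it is not the WRSQ scheduler from #14681, which builds on this idea but differs in the details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Minimal weighted random selection: sample a point in [0, total_weight) and
// binary-search the cumulative-weight table to find the owning object.
template <class T> class WeightedRandomPicker {
public:
  void add(T obj, uint64_t weight) {
    total_weight_ += weight;
    cumulative_.push_back(total_weight_);
    objects_.push_back(std::move(obj));
  }

  const T& pick(std::mt19937_64& rng) const {
    assert(total_weight_ > 0);
    std::uniform_int_distribution<uint64_t> dist(0, total_weight_ - 1);
    const uint64_t point = dist(rng);
    // The first cumulative weight strictly greater than `point` is the winner,
    // so an object with weight w is chosen with probability w / total_weight.
    const auto it = std::upper_bound(cumulative_.begin(), cumulative_.end(), point);
    return objects_[it - cumulative_.begin()];
  }

private:
  std::vector<uint64_t> cumulative_;
  std::vector<T> objects_;
  uint64_t total_weight_{0};
};
```

In this naive form, any weight change forces the cumulative table to be rebuilt, which is one way to see why EDF stays attractive for pickers whose weights change rapidly.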

cc @tonya11en @antoniovicente @htuch @snowp

@mattklein123 (Member, Author)

@tonya11en is this something that you would like to implement or should I look at implementing it?

@tonya11en (Member)

@mattklein123 I'd like to implement it, but I probably won't be able to send a patch out until the weekend. I don't feel strongly about it, so if you can knock it out sooner, go for it and I'll be happy to review.

@mattklein123 (Member, Author)

@tonya11en go for it. I'll assign it over to you and assume you're working on it for now.

@htuch (Member) commented Jan 8, 2021

Can we guard the existing behavior with a runtime flag for at least a period of time? I agree with the direction here, but I would like to make sure we move carefully when changing LB behaviors. In separate discussions with @antoniovicente, I think we want substantial changes to affinity balancing (i.e. changes that break hash consistency) to become new LB algorithms so that we can safely roll out and migrate.

@mattklein123 (Member, Author)

Yes, the plan would be to feature flag for sure.

I think we want substantial changes to affinity balancing (i.e. changes that break hash consistency) to become new LB algorithms so that we can safely roll out and migrate.

Sorry, can you clarify?

@htuch (Member) commented Jan 8, 2021

I think we want substantial changes to affinity balancing (i.e. changes that break hash consistency) to become new LB algorithms so that we can safely roll out and migrate.

Sorry, can you clarify?

Let's say we modify the ring hash LB and change the hash algorithm, so that a newly rolled out binary would not be consistent with the hashing of a previous Envoy running the ring hash LB. I think the safe thing to do here would be to retain the existing algorithm and allow the fleet to flip to the new hash function once the rollout completes. This could be done with a runtime flag, but it might be more convenient to treat it as a new LB algorithm as well.
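
To make the rollout hazard concrete, here is a toy, self-contained illustration; the hash functions and "ring" are placeholders, not Envoy's ring hash implementation. The same key can land on different hosts under the old and new hash, so a fleet running a mix of binaries disagrees until the rollout converges and the flag is flipped.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

using HashFn = uint64_t (*)(const std::string&);

// Stand-ins for an "old" and a "new" hash algorithm.
uint64_t hash_v1(const std::string& key) { return std::hash<std::string>{}(key); }
uint64_t hash_v2(const std::string& key) { return std::hash<std::string>{}(key + "|v2"); }

// Stand-in for a ring lookup: bucket the hash over the host list.
const std::string& pickHost(const std::vector<std::string>& hosts, HashFn hash,
                            const std::string& key) {
  return hosts[hash(key) % hosts.size()];
}

int main() {
  const std::vector<std::string> hosts{"host-a", "host-b", "host-c", "host-d"};
  const std::string key = "user-1234";
  // An Envoy still on the old hash and one on the new hash can route the same
  // key to different hosts; this is the rehash event being discussed.
  std::cout << pickHost(hosts, hash_v1, key) << " vs "
            << pickHost(hosts, hash_v2, key) << "\n";
}
```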

@mattklein123 (Member, Author)

This could be done with a runtime flag, but it might be more convenient to treat it as a new LB algorithm as well.

Yeah, this makes sense. We should talk more about this offline to figure out the best way of handling it. We have done some hash algorithm changes at Lyft, and they are pretty painful to roll out. It would be nice to do something better if possible.

@antoniovicente (Contributor)

I assume efficiency improvements to hash balancers will involve the introduction of new types. Changing the algorithm used via config would be a big challenge, since it would likely break a lot of things while the config in production hasn't fully converged to the new state.

@htuch (Member) commented Jan 11, 2021

@antoniovicente how would you suggest changing the algorithm (or version of the algorithm)? Is there some way to do this in production that wouldn't involve some period of inconsistency?

@yishuT commented Feb 7, 2021

Is there a wiki page or paper about WRSQ?

@tonya11en (Member)

Is there a wiki page or paper about WRSQ?

@yishuT I dreamed this one up, so unfortunately I don't have any reference material beyond what I wrote in the PR for the scheduler #14681. I'm happy to answer any questions you may have, to the best of my ability.

@yishuT commented Feb 7, 2021

Is there a wiki page or paper about WRSQ?

@yishuT I dreamed this one up, so unfortunately I don't have any reference material beyond what I wrote in the PR for the scheduler #14681. I'm happy to answer any questions you may have, to the best of my ability.

Ah wow, nice! Thanks for the pointer. I will read through the PR.

@antoniovicente (Contributor)

@antoniovicente how would you suggest changing the algorithm (or version of the algorithm)? Is there some way to do this in production that wouldn't involve some period of inconsistency?

I don't think there's a way to accomplish this without a user-visible rehash event.

@jmarantz (Contributor)

/sub

@jmarantz (Contributor)

With #14681 merged, what's the next step to resolving this bug?

@tonya11en (Member)

With #14681 merged, what's the next step to resolving this bug?

To address the issue that motivated this, I'll swap out the locality scheduler for the WRSQ scheduler. I'll guard it with a feature flag that can revert back to EDF in case it ends up causing unforeseen problems.

Assuming we have good results, I'll look into adding a config option in the common LB config.
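
As a rough sketch of what that guard might look like (the types, names, and flag plumbing below are invented for illustration; this is not Envoy's actual Scheduler interface or the real flag), one option is a small factory that falls back to EDF when the flag is off:

```cpp
#include <iostream>
#include <memory>
#include <string>

// Placeholder interface standing in for whatever the locality scheduler needs.
struct LocalitySchedulerSketch {
  virtual ~LocalitySchedulerSketch() = default;
  virtual std::string name() const = 0;
};

struct EdfSketch : LocalitySchedulerSketch {
  std::string name() const override { return "edf"; }
};

struct WrsqSketch : LocalitySchedulerSketch {
  std::string name() const override { return "wrsq"; }
};

// Feature-flag-guarded construction: flipping the flag off restores the
// current EDF behavior without a code change.
std::unique_ptr<LocalitySchedulerSketch> makeLocalityScheduler(bool wrsq_enabled) {
  if (wrsq_enabled) {
    return std::make_unique<WrsqSketch>();
  }
  return std::make_unique<EdfSketch>();
}

int main() {
  // In Envoy the boolean would come from a runtime feature flag lookup; here it
  // is a literal so the sketch stands alone.
  std::cout << makeLocalityScheduler(/*wrsq_enabled=*/true)->name() << "\n";
}
```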

@tonya11en (Member) commented Aug 26, 2021

FYI, I'm putting finishing touches on the final PR and will send it out in the next couple of days.

Update (9/2/2021):
Many tests relied on EDF's deterministic behavior, so I'm making changes to the tests. It's taking a bit longer than expected, but I'm making incremental progress. Hopefully it won't be much longer.
