Add support for N levels of failover endpoints #1929

Closed
surki opened this issue Oct 24, 2017 · 20 comments · Fixed by #2290

surki (Contributor) commented Oct 24, 2017

I would like to use Envoy to load balance long-lived TCP connections under the following setup:
Say I have 2 machines, m1 and m2. I would like all traffic to go to m1. Traffic should go to m2 only if m1 fails (per active/passive health checks) or if m1 exceeds a certain number of connections defined per host.

This is similar to the "first" load balancer in HAProxy: https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#balance

surki (Contributor, Author) commented Oct 24, 2017

So I did a quick PoC to see what it would take to add this feature; here are the changes:

Envoy changes: master...surki:load_balancer_standby
Envoy-api changes: envoyproxy/data-plane-api@master...surki:load_balancer_standby

and the corresponding config file
(it is a proxy for MySQL read-only traffic: it sends read-only queries to the "slave", and if the "slave" fails or reaches capacity, it redirects the connections to the "master". It also does an active health check by sending an auth packet (with username "proxyhealth") followed by an immediate quit command packet, using MySQL protocol 4.1):

{
  "listeners": [{
    "address": "tcp://127.0.0.1:33060",
    "filters": [{
      "type": "read",
      "name": "tcp_proxy",
      "config": {
        "stat_prefix": "mysql",
        "route_config": {
          "routes": [{
            "cluster": "mysql"
          }]
        }
      }
    }]
  }],
  "admin": {
    "access_log_path": "/tmp/envoy.log",
    "address": "tcp://127.0.0.1:9901"
  },
  "cluster_manager": {
    "clusters": [{
      "name": "mysql",
      "connect_timeout_ms": 2500,
      "type": "strict_dns",
      "lb_type": "standby",
      "health_check": {
        "type": "tcp",
        "timeout_ms": 2000,
        "interval_ms": 10000,
        "unhealthy_threshold": 2,
        "healthy_threshold": 2,
        "service_name": "mysql",
        "send": [
          {"binary": "2d0000"},
          {"binary": "01"},
          {"binary": "00820000"},
          {"binary": "00000001"},
          {"binary": "21"},
          {"binary": "0000000000000000000000000000000000000000000000"},
          {"binary": "70726f78796865616c74680000"},
          {"binary": "010000"},
          {"binary": "00"},
          {"binary": "01"}
        ],
        "receive": [
          {"binary": "07000002000000"}
        ]
      },
      "hosts": [{
        "url": "tcp://mysql_slave:3306",
        "id": 0,
        "max_connections": 2048
      }, {
        "url": "tcp://mysql_master:3306",
        "id": 1,
        "max_connections": 1024
      }]
    }]
  }
}

Not sure if the above changes are in line with general Envoy architecture/good practice; a few questions:

  • It looks like the currently supported load balancing algorithms implicitly assume HTTP requests. The above change is specific to the TCP proxy, though I don't see any reason why it wouldn't be applicable to HTTP requests. I couldn't find a way to tell the load balancing algorithm to act on "requests" or "connections", short of putting the type in the algorithm name itself.

  • I am not sure what the best way is to propagate "id" and "max_connections" all the way to the individual host entries available in "LoadBalancer::chooseHost()".

  • API v2 (i.e., protobuf3) vs JSON v1: I am not sure if adding things into JSON is allowed now. Should I just stick to the protobuf3 changes alone and stop using the current .json config format?

  • There may be multiple resolved host IPs in LoadBalancer::chooseHost() for each host specified in the config file (multiple A/AAAA records, the DNS type we use for the cluster, etc.), so it is probably better to apply max_connections to each resolved IP independently.

PS: We are also planning to add a MySQL filter for exposing interesting stats (just like other filters such as DynamoDB do). I see that there is experimental Lua support; I may try that out for MySQL protocol parsing/stats extraction.

mattklein123 added the enhancement label Oct 24, 2017
mattklein123 (Member) commented:

Some quick questions:

  1. Why is ID needed?
  2. Do we really need max connections on a per host basis? Or can it be the same for all hosts in a cluster?

surki (Contributor, Author) commented Oct 25, 2017

ID: I need a way to deterministically pick hosts from first to last. Since DNS resolution happens asynchronously for all hosts concurrently, the values populated in LoadBalancerBase::host_set_ will be in random order. Is there any other way to achieve the ordering? (I need to choose hosts in the exact order specified in the JSON config file.)

max_connections on a per-host basis: I thought it would be more flexible since different hosts can have different throughput/capacity etc. We can move max_connections up to the cluster level if you think we will not need it at the host level (personally I do not need the per-host level, though I could see a use case for others).

mattklein123 (Member) commented:

I think there is a general question here as to whether this should be a new load balancing policy, or actually built directly into the tcp_proxy code. For example, one could imagine adding the concept of a fallback cluster to tcp_proxy routing, and then having rules on when the fallback is actually used. I think @rshriram and @ggreenway had some thoughts on this. In some ways it fits a bit better that way vs. as a discrete balancing policy.

If we go with the LB built-in approach, I understand the need for ID. It seems unfortunate to have to add this to the Address structure, but I'm not sure how else to deal with this without doing a breaking change which we can't do at this point. @htuch any thoughts here?

Re: max connections per host, unless there is a specific need I would probably add this as a property of the cluster where we already have max_requests_per_connection.

I would like to get some additional thoughts on whether we should do this as a new LB type vs. building it into tcp_proxy. I think I am leaning towards doing what you have done here, but would like some other opinions.

rshriram (Member) commented:

I wish we could somehow use the max connections settings from the circuit breaker. This use case seems a bit too niche to create a special load balancer for (if there are other scenarios where this bin-packing-style LB is needed, I would love to hear them).

Even in this use case, what would you do if you have multiple slaves? It seems that the only requirement here is to use the master after the slaves are filled up, and among the slaves it doesn't matter how connections are assigned. Is this assumption correct? If so, we could imagine creating some form of fallback cluster option that contains the MySQL master, while the slaves are in the main cluster. The fallback option seems like it would get more mileage across multiple use cases. You can set the circuit breaker in the main cluster to the total number of connections acceptable across the slaves, and when that is exceeded, the fallback cluster will be picked.

For the MySQL filter, I suggest doing it in C++, for performance reasons and because Lua filters for TCP are not here yet. You could take a look at the Mongo filter for reference.

Another idea to satisfy the particular MySQL use case you have (assuming typical MySQL installations have the same needs): create the MySQL filter very much like the Redis filter. You can have first-class support for master and slave clusters in the filter, and you will have control over when to fall back to the master.

surki (Contributor, Author) commented Oct 26, 2017 via email

surki (Contributor, Author) commented Oct 26, 2017 via email

mattklein123 (Member) commented:

If you don't need max connections, let's just drop that. Also, I just looked at the code, and the hosts actually should remain ordered for static/strict_dns clusters. For strict_dns this is only true if DNS for each "host" returns a single IP, which I'm guessing is your use case: https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/upstream_impl.cc#L483

Assuming the above is OK for you, I think this is pretty simple: basically just write a new LB which assumes hosts are in order and always sends to the first; if that fails, it sends to the next, etc. Would that work?
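
For illustration, here is a minimal standalone sketch of that ordered-failover pick. The Host struct and chooseHost() signature are hypothetical simplifications for this issue, not Envoy's actual LoadBalancer/HostSet interfaces.

#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified host record; in Envoy the health flag would come
// from active/passive health checking.
struct Host {
  std::string address;
  bool healthy = true;
};

// Walk the hosts in their configured order and return the index of the first
// healthy one; nullopt means there is no usable host and the connection fails.
std::optional<size_t> chooseHost(const std::vector<Host>& hosts) {
  for (size_t i = 0; i < hosts.size(); ++i) {
    if (hosts[i].healthy) {
      return i;
    }
  }
  return std::nullopt;
}

With the slave listed first and the master second, new connections would go to the slave until it fails and then fall through to the master; the per-host max_connections cap from the PoC config could be added as an extra condition in the same loop if it were kept.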

> Agree, C++ could be the right choice. I was not sure how easy it is to maintain an out-of-tree Envoy filter.

If you are going to do a MySQL filter and are willing to go through the code review process, etc., please just contribute it! I think there are many people who would love a MySQL filter.

ggreenway (Contributor) commented:

The more general solution is priority groups (at least that's what F5 called it).

Basically, you have multiple groups/pools of machines, and the groups are priority ordered. So you try to pick a host from the first group; if that fails, try from the next group, etc.

To represent that in Envoy, I think it makes more sense to put it on the cluster/LB side than in the tcp_proxy. This could also be applicable to HTTP easily enough.

I see two ways of doing it that could make sense:

  1. A new LB type on clusters for this mode, with some way of annotating, for each host in the cluster, which priority group it is in.

  2. Create a cluster for each priority group, which could use any LB method and is one of the existing types. Then create another cluster with a new type and lb_type. The type would just point to the other clusters by name, in order of priority. The lb_type would logically try to lb-pick from each group in order until it got one that worked (a rough sketch of this follows below).
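
To make option (2) a bit more concrete, here is a rough sketch of such a wrapper; the ClusterLb interface and PriorityGroupLb class are made-up names for illustration, not an existing Envoy API.

#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for "the load balancer of one member cluster"; each
// member cluster can use round robin, least request, ring hash, etc.
struct ClusterLb {
  virtual ~ClusterLb() = default;
  // Returns an upstream address, or nullopt if the cluster currently has no
  // usable host (all unhealthy, circuit breaker open, ...).
  virtual std::optional<std::string> pick() = 0;
};

// Wrapper LB that holds the member clusters in priority order and asks each
// one in turn until a pick succeeds.
class PriorityGroupLb {
public:
  explicit PriorityGroupLb(std::vector<std::unique_ptr<ClusterLb>> groups)
      : groups_(std::move(groups)) {}

  std::optional<std::string> pick() {
    for (auto& group : groups_) {
      if (auto host = group->pick()) {
        return host;
      }
    }
    return std::nullopt;
  }

private:
  std::vector<std::unique_ptr<ClusterLb>> groups_;
};

The wrapper only encodes the priority ordering; health checking, circuit breaking, and the inner LB policy stay with each member cluster.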

mattklein123 (Member) commented:

Agreed with @ggreenway that this would be a nice general solution. I like (2). It's somewhat similar/related to what @zuercher did with the subset LB.

rshriram (Member) commented Oct 26, 2017 via email

mattklein123 (Member) commented:

Yes, I guess we could use metadata for (1), though that does not work with DNS, which I think is going to be quite common for this. Personally I kind of like (2), which is a wrapper cluster and a wrapper LB that operate on sub-clusters.

I think there are several ways to implement this that are fine with me.

mattklein123 added the help wanted label Oct 28, 2017
mattklein123 (Member) commented:

FYI, I'm having a long conversation with @alyssawilk about this offline. I think we are potentially converging on doing (1) above (adding a priority group per endpoint), but I will update this ticket after some more discussion.

alyssawilk (Contributor) commented:

Really long update over here:
#1980 (comment)

tl;dr I'll rewrite 1980 to address this use case as well :-)

surki (Contributor, Author) commented Nov 9, 2017

Sorry, I was on vacation for the last week+.

@alyssawilk @mattklein123 I see that we are going to do priority grouping. Is this being worked on currently? If it is, do let me know if I could help in any way (testing etc.). If it is not being implemented/planned yet, I could help with the implementation as well (though it looks like it is being bundled with a larger refactoring, so I'm not sure how much help I can be ...)

alyssawilk (Contributor) commented Nov 9, 2017 via email

surki (Contributor, Author) commented Nov 13, 2017

I will wait for your changes, but do let me know if you want me to test/try out your changes earlier.

alyssawilk self-assigned this Nov 20, 2017
alyssawilk changed the title from Support for new load balancer - "StandBy" to Add support for N levels of failover endpoints Nov 20, 2017
mattklein123 added this to the 1.6.0 milestone Nov 26, 2017
mattklein123 removed the help wanted label Nov 26, 2017
htuch pushed a commit that referenced this issue Nov 30, 2017
Reworking the organization of endpoints from HostSets-with-localities to a tree with PrioritySets at the top and HostSets as nodes.
This allows the failover characteristics requested in #1929 as well as satisfying the defunct #1354
This will be followed up by a PR (or set of PRs) converting the load balancers over to use the new priority logic.

API Changes:
envoyproxy/data-plane-api#259

Risk Level: High
It's a pretty big refactor and there were some subtle gotchas (like stats initially updating before the backends)

Testing:
Current tests passing.
Health checker unit tests verify all hosts are health checked across priority groups.
Outlier detection unit tests do the same for outlier detection.
upstream_impl_test.cc has unit tests for basic priority set operations.
Load stats integration test extended for priorities. Verifies stats reporting insofar as there's no way to direct load yet, but the localities show up as expected.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk (Contributor) commented:

Ok, I'm like 70% of the way done with the naive implementation, which, while sufficient to close this issue off, is not yet sufficient for my self-respect :-P

My plan A would be to mimic the load balancing code I'm most familiar with. It basically has an "overprovision factor" (default 1.4) and starts spilling traffic out of a priority level when fudge_factor * host-set-health-ratio < 1. I guess we'd have to adapt this to priority levels to spill iff the sum of the health of the higher priority levels is less than 1.

Assuming you had 2 priority levels: with 100% healthy backends, all traffic goes to P=0.
As P=0 starts going unhealthy (80% health) there's initially no spillover.
When P=0 hits ~71.5%* healthy, traffic starts trickling to P=1.
By the time P=0 is 60% healthy, roughly 16%** of traffic is hitting P=1.
People who want faster or more gentle failover can tweak the over-provision factor accordingly.

I'd have to dig up details on the ratio drop-off when P=1 starts going unhealthy as well, but rather than doing it straight in code I figured I'd poll for improvements/enhancements, with enough time to dicker while I wrap up the naive implementation for ring hash etc.

  * 1.4 * 0.7143 = 1.0002, so it flips at ~71.5% healthy
  ** 1 - 1.4 * 0.6 = 0.16
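
To make those numbers concrete, here is a tiny standalone sketch of the spillover arithmetic described above (my own illustration of the formula, not the eventual Envoy implementation):

#include <algorithm>
#include <cstdio>

constexpr double kOverprovisionFactor = 1.4;

// Fraction of traffic kept by P=0 given the fraction of its hosts that are
// healthy; whatever is not kept spills over to the next priority level.
double p0Load(double p0HealthyRatio) {
  return std::min(1.0, kOverprovisionFactor * p0HealthyRatio);
}

int main() {
  const double healths[] = {1.0, 0.8, 0.7143, 0.6};
  for (double health : healths) {
    const double keep = p0Load(health);
    std::printf("P0 health %.1f%% -> P0 gets %.1f%%, P1 gets %.1f%%\n",
                health * 100, keep * 100, (1 - keep) * 100);
  }
  return 0;
}

At 100% and 80% health everything stays on P=0; spill starts just below ~71.4% health, and at 60% health P=1 receives the 16% from the footnote. A larger factor keeps traffic on P=0 deeper into unhealthiness, while a smaller one starts spilling sooner.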

mattklein123 (Member) commented:

@alyssawilk ^ SGTM. The only complexity is that I think it would probably break the way the zone aware balancing algorithm works in terms of spillover calculations between zones (or make it substantially more complicated). I'm honestly not sure this matters. If failover is occurring we could potentially even disable zone aware balancing. I would keep it in mind though.

alyssawilk added a commit that referenced this issue Dec 13, 2017
Changing the LoadBalancerBase to use priorities, following the simplistic strategy documented in the EDS proto, where we route to the highest priority level with non-zero healthy endpoints.

This results in simple priority-based load balancing for RoundRobinLoadBalancer, LeastRequestLoadBalancer, and RandomLoadBalancer.

Risk Level: Medium
It's a big change but then again no one is using it :-)

Testing:
Existing tests parameterized to target P=0 and P=1 to verify equivalence.
Additional unit tests and one integration test explicitly testing primary/failover/... interactions

Docs Changes:
envoyproxy/data-plane-api#326

Release Notes:
Added support for priorities for several types of load balancer (RoundRobinLoadBalancer, LeastRequestLoadBalancer, RandomLoadBalancer)

Continued progress on #1929
alyssawilk (Contributor) commented:

@surki The behavior that you asked for should work now if you want to play around.

There are 2 more follow-up PRs, but they're for gentle failover in the case where there's more than one endpoint per priority set.

alyssawilk added a commit that referenced this issue Jan 4, 2018
Changes Envoy load balancing across priority levels from a hard failover to trickling data based on the health percentage of each priority level.

Risk Level: Medium

Testing:
Added thorough unit testing for the LB failover code, as well as fixing up ring hash failover testing and adding ring-hash-specific tests.

Docs Changes:
envoyproxy/data-plane-api#359

Release Notes: n/a: falls under existing note

Fixes #1929
jpsim pushed a commit that referenced this issue Nov 28, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue Nov 29, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>