Add support for N levels of failover endpoints #1929

Closed
surki opened this issue Oct 24, 2017 · 20 comments · Fixed by #2290

surki (Contributor) commented Oct 24, 2017

I would like to use Envoy to load balance long-lived TCP connections under the following setup:
Say I have 2 machines, m1 and m2. I would like all traffic to go to m1. Traffic should go to m2 only if m1 fails (per active/passive health checks) or if m1 exceeds a certain number of connections defined per host.

This is similar to the "first" load balancer in HAProxy: https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#balance

surki (Contributor, Author) commented Oct 24, 2017

So I did a quick PoC to see what it would take to add this feature; here are the changes:

Envoy changes: master...surki:load_balancer_standby
Envoy-api changes: envoyproxy/data-plane-api@master...surki:load_balancer_standby

and the corresponding config file
(it is a proxy for MySQL read-only traffic: it sends read-only queries to the "slave", and if the "slave" fails or reaches capacity, it redirects the connections to the "master". It also does an active health check by sending an auth packet (with username "proxyhealth") followed by an immediate quit command packet, using MySQL protocol 4.1):

{
  "listeners": [{
    "address": "tcp://127.0.0.1:33060",
    "filters": [{
      "type": "read",
      "name": "tcp_proxy",
      "config": {
        "stat_prefix": "mysql",
        "route_config": {
          "routes": [{
            "cluster": "mysql"
          }]
        }
      }
    }]
  }],
  "admin": {
    "access_log_path": "/tmp/envoy.log",
    "address": "tcp://127.0.0.1:9901"
  },
  "cluster_manager": {
    "clusters": [{
      "name": "mysql",
      "connect_timeout_ms": 2500,
      "type": "strict_dns",
      "lb_type": "standby",
      "health_check": {
        "type": "tcp",
        "timeout_ms": 2000,
        "interval_ms": 10000,
        "unhealthy_threshold": 2,
        "healthy_threshold": 2,
        "service_name": "mysql",
        "send": [
          {"binary": "2d0000"},
          {"binary": "01"},
          {"binary": "00820000"},
          {"binary": "00000001"},
          {"binary": "21"},
          {"binary": "0000000000000000000000000000000000000000000000"},
          {"binary": "70726f78796865616c74680000"},
          {"binary": "010000"},
          {"binary": "00"},
          {"binary": "01"}
        ],
        "receive": [
          {"binary": "07000002000000"}
        ]
      },
      "hosts": [{
        "url": "tcp://mysql_slave:3306",
        "id": 0,
        "max_connections": 2048
      }, {
        "url": "tcp://mysql_master:3306",
        "id": 1,
        "max_connections": 1024
      }]
    }]
  }
}

Not sure if the above changes are in line with general Envoy architecture/good practice; a few questions:

  • It looks like the currently supported load balancing algorithms implicitly assume HTTP requests. The above change is specific to the TCP proxy, though I don't see any reason why it wouldn't be applicable to HTTP requests. I couldn't find a way to tell the load balancing algorithm to act on "requests" or "connections", short of putting the type in the algorithm name itself.

  • I am not sure what the best way is to propagate "id" and "max_connections" all the way to the individual host entries available in "LoadBalancer::chooseHost()".

  • API v2 (i.e., protobuf3) vs JSON v1: I am not sure if adding things into JSON is allowed now. Should I just stick to the protobuf3 changes alone and stop using the current .json config format?

  • There may be multiple resolved host IPs in LoadBalancer::chooseHost() for each host specified in the config file (multiple A/AAAA records, the DNS type we use for the cluster, etc.), so it is probably better to apply max_connections to each resolved IP independently.

PS: We are also planning to add a MySQL filter for exposing interesting stats (just like other filters such as DynamoDB do). I see that there is experimental Lua support; I may try that out for MySQL protocol parsing/stats extraction.

mattklein123 added the enhancement label Oct 24, 2017
mattklein123 (Member) commented:

Some quick questions:

  1. Why is ID needed?
  2. Do we really need max connections on a per host basis? Or can it be the same for all hosts in a cluster?

surki (Contributor, Author) commented Oct 25, 2017

ID: I need a way to deterministically pick hosts from first to last. Since DNS resolution happens asynchronously for all hosts concurrently, the values populated in LoadBalancerBase::host_set_ will be in random order. Is there any other way to achieve the ordering? (I need to choose hosts in the exact order specified in the JSON config file.)

max_connections on a per-host basis: I thought it would be more flexible since different hosts can have different throughput/capacity etc. We can move max_connections up to the cluster level if you think we will not need it at the host level (personally I do not need the per-host level, though I could see a use case for others).

mattklein123 (Member) commented:

I think there is a general question here as to whether this should be a new load balancing policy, or actually built directly into the tcp_proxy code. For example, one could imagine adding the concept of a fallback cluster to tcp_proxy routing, and then having rules on when the fallback is actually used. I think @rshriram and @ggreenway had some thoughts on this. In some ways it fits a bit better that way vs. as a discrete balancing policy.

If we go with the LB built-in approach, I understand the need for ID. It seems unfortunate to have to add this to the Address structure, but I'm not sure how else to deal with this without doing a breaking change which we can't do at this point. @htuch any thoughts here?

Re: max connections per host, unless there is a specific need I would probably add this as a property of the cluster where we already have max_requests_per_connection.

I would like to get some additional thoughts on whether we should do this as a new LB type vs. building it into tcp_proxy. I think I am leaning towards doing what you have done here, but would like some other opinions.

rshriram (Member) commented:

I wish we could somehow use the max connections settings from the circuit breaker. This use case seems a bit too niche to create a special load balancer for (if there are other scenarios where this bin-packing-style LB is needed, I would love to hear them).

Even in this use case, what would you do if you have multiple slaves? It seems that the only requirement here is to use the master after the slaves are filled up, and among the slaves it doesn't matter how connections are assigned. Is this assumption correct? If so, we could imagine creating some form of fallback cluster option that contains the MySQL master, while the slaves are in the main cluster. The fallback option seems like it would get more mileage across multiple use cases. You can set the circuit breaker in the main cluster to the total number of connections acceptable across the slaves, and when that is exceeded, the fallback cluster will be picked.

For the MySQL filter, I suggest doing it in C++, for performance reasons and because Lua filters for TCP are not here yet. You could take a look at the Mongo filter for reference.

Another idea to satisfy the particular MySQL use case you have (assuming typical MySQL installations have the same needs): create the MySQL filter very much like the Redis filter. You can have first-class support for master and slave clusters in the filter, and you will have control over when to fall back to the master.

surki (Contributor, Author) commented Oct 26, 2017 via email

surki (Contributor, Author) commented Oct 26, 2017 via email

mattklein123 (Member) commented:

If you don't need max connections, let's just drop that. Also, I just looked at the code, and the hosts actually should remain ordered for static/strict_dns clusters. For strict_dns this is only true if DNS for each "host" returns a single IP, which I'm guessing is your use case: https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/upstream_impl.cc#L483

Assuming the above is OK for you, I think this is pretty simple: basically just write a new LB which assumes hosts are in order and always sends to the first; if that fails, it sends to the next, etc. Would that work?
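
For illustration, here is a minimal standalone sketch of that ordered-failover pick. The Host struct and chooseHost() signature are hypothetical simplifications for this issue, not Envoy's actual LoadBalancer/HostSet interfaces.

#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified host record; in Envoy the health flag would come
// from active/passive health checking.
struct Host {
  std::string address;
  bool healthy = true;
};

// Walk the hosts in their configured order and return the index of the first
// healthy one; nullopt means there is no usable host and the connection fails.
std::optional<size_t> chooseHost(const std::vector<Host>& hosts) {
  for (size_t i = 0; i < hosts.size(); ++i) {
    if (hosts[i].healthy) {
      return i;
    }
  }
  return std::nullopt;
}

With the slave listed first and the master second, new connections would go to the slave until it fails and then fall through to the master; the per-host max_connections cap from the PoC config could be added as an extra condition in the same loop if it were kept.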

> Agree, C++ could be the right choice. I was not sure how easy it is to maintain an out-of-tree Envoy filter.

If you are going to do a MySQL filter and are willing to go through the code review process, etc., please just contribute it! I think there are many people who would love a MySQL filter.

ggreenway (Contributor) commented:

The more general solution is priority groups (at least that's what F5 called it).

Basically, you have multiple groups/pools of machines, and the groups are priority ordered. So you try to pick a host from the first group; if that fails, try from the next group, etc.

To represent that in Envoy, I think it makes more sense to put it on the cluster/LB side than in the tcp_proxy. This could also be applicable to HTTP easily enough.

I see two ways of doing it that could make sense:

  1. A new LB type on clusters for this mode, with some way of annotating, for each host in the cluster, which priority group it is in.

  2. Create a cluster for each priority group, which could use any LB method and is one of the existing types. Then create another cluster with a new type and lb_type. The type would just point to the other clusters by name, in order of priority. The lb_type would logically try to lb-pick from each group in order until it got one that worked (a rough sketch of this follows below).
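
To make option (2) a bit more concrete, here is a rough sketch of such a wrapper; the ClusterLb interface and PriorityGroupLb class are made-up names for illustration, not an existing Envoy API.

#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for "the load balancer of one member cluster"; each
// member cluster can use round robin, least request, ring hash, etc.
struct ClusterLb {
  virtual ~ClusterLb() = default;
  // Returns an upstream address, or nullopt if the cluster currently has no
  // usable host (all unhealthy, circuit breaker open, ...).
  virtual std::optional<std::string> pick() = 0;
};

// Wrapper LB that holds the member clusters in priority order and asks each
// one in turn until a pick succeeds.
class PriorityGroupLb {
public:
  explicit PriorityGroupLb(std::vector<std::unique_ptr<ClusterLb>> groups)
      : groups_(std::move(groups)) {}

  std::optional<std::string> pick() {
    for (auto& group : groups_) {
      if (auto host = group->pick()) {
        return host;
      }
    }
    return std::nullopt;
  }

private:
  std::vector<std::unique_ptr<ClusterLb>> groups_;
};

The wrapper only encodes the priority ordering; health checking, circuit breaking, and the inner LB policy stay with each member cluster.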

mattklein123 (Member) commented:

Agreed with @ggreenway that this would be a nice general solution. I like (2). It's somewhat similar/related to what @zuercher did with the subset LB.

rshriram (Member) commented Oct 26, 2017 via email

mattklein123 (Member) commented:

Yes, I guess we could use metadata for (1), though that does not work with DNS, which I think is going to be quite common for this. Personally I kind of like (2), which is a wrapper cluster and a wrapper LB that operate on sub-clusters.

I think there are several ways to implement this that are fine with me.

mattklein123 added the help wanted label Oct 28, 2017
mattklein123 (Member) commented:

FYI, I'm having a long conversation with @alyssawilk about this offline. I think we are potentially converging on doing (1) above (adding a priority group per endpoint), but I will update this ticket after some more discussion.

alyssawilk (Contributor) commented:

Really long update over here:
#1980 (comment)

tl;dr I'll rewrite 1980 to address this use case as well :-)

surki (Contributor, Author) commented Nov 9, 2017

Sorry, I was on vacation for the last week+.

@alyssawilk @mattklein123 I see that we are going to do priority grouping. Is this being worked on currently? If it is, do let me know if I could help in any way (testing etc.). If it is not being implemented/planned yet, I could help with the implementation as well (though it looks like it is being bundled with a larger refactoring, so I'm not sure how much help I can be ...)

alyssawilk (Contributor) commented Nov 9, 2017 via email

surki (Contributor, Author) commented Nov 13, 2017

I will wait for your changes, but do let me know if you want me to test/try out your changes earlier.

alyssawilk self-assigned this Nov 20, 2017
alyssawilk changed the title from Support for new load balancer - "StandBy" to Add support for N levels of failover endpoints Nov 20, 2017
mattklein123 added this to the 1.6.0 milestone Nov 26, 2017
mattklein123 removed the help wanted label Nov 26, 2017
htuch pushed a commit that referenced this issue Nov 30, 2017
Reworking the organization of endpoints from HostSets-with-localities to a tree with PrioritySets at the top and HostSets as nodes.
This allows the failover characteristics requested in #1929 as well as satisfying the defunct #1354
This will be followed up by a PR (or set of PRs) converting the load balancers over to use the new priority logic.

API Changes:
envoyproxy/data-plane-api#259

Risk Level: High
It's a pretty big refactor and there were some subtle gotchas (like stats initially updating before the backends)

Testing:
Current tests passing.
Health checker unit tests verify all hosts are health checked across priority groups.
Outlier detection unit tests do the same for outlier detection.
upstream_impl_test.cc has unit tests for basic priority set operations.
Load stats integration test extended for priorities. Verifies stats reporting insofar as there's no way to direct load yet, but the localities show up as expected.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk (Contributor) commented:

Ok, I'm like 70% of the way done with the naive implementation, which, while sufficient to close this issue off, is not yet sufficient for my self-respect :-P

My plan A would be to mimic the load balancing code I'm most familiar with. It basically has an "overprovision factor" (default 1.4) and starts spilling traffic out of a priority level when fudge_factor * host-set-health-ratio < 1. I guess we'd have to adapt this to priority levels to spill iff the sum of the health of the higher priority levels is less than 1.

Assuming you had 2 priority levels: with 100% healthy backends, all traffic goes to P=0.
As P=0 starts going unhealthy (80% health) there's initially no spillover.
When P=0 hits ~71.5%* healthy, traffic starts trickling to P=1.
By the time P=0 is 60% healthy, roughly 16%** of traffic is hitting P=1.
People who want faster or more gentle failover can tweak the over-provision factor accordingly.

I'd have to dig up details on the ratio drop-off when P=1 starts going unhealthy as well, but rather than doing it straight in code I figured I'd poll for improvements/enhancements, with enough time to dicker while I wrap up the naive implementation for ring hash etc.

  * 1.4 * 0.7143 = 1.0002, so it flips at ~71.5% healthy
  ** 1 - 1.4 * 0.6 = 0.16
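
To make those numbers concrete, here is a tiny standalone sketch of the spillover arithmetic described above (my own illustration of the formula, not the eventual Envoy implementation):

#include <algorithm>
#include <cstdio>

constexpr double kOverprovisionFactor = 1.4;

// Fraction of traffic kept by P=0 given the fraction of its hosts that are
// healthy; whatever is not kept spills over to the next priority level.
double p0Load(double p0HealthyRatio) {
  return std::min(1.0, kOverprovisionFactor * p0HealthyRatio);
}

int main() {
  const double healths[] = {1.0, 0.8, 0.7143, 0.6};
  for (double health : healths) {
    const double keep = p0Load(health);
    std::printf("P0 health %.1f%% -> P0 gets %.1f%%, P1 gets %.1f%%\n",
                health * 100, keep * 100, (1 - keep) * 100);
  }
  return 0;
}

At 100% and 80% health everything stays on P=0; spill starts just below ~71.4% health, and at 60% health P=1 receives the 16% from the footnote. A larger factor keeps traffic on P=0 deeper into unhealthiness, while a smaller one starts spilling sooner.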

mattklein123 (Member) commented:

@alyssawilk ^ SGTM. The only complexity is that I think it would probably break the way the zone aware balancing algorithm works in terms of spillover calculations between zones (or make it substantially more complicated). I'm honestly not sure this matters. If failover is occurring we could potentially even disable zone aware balancing. I would keep it in mind though.

alyssawilk added a commit that referenced this issue Dec 13, 2017
Changing the LoadBalancerBase to use priorities, following the simplistic strategy documented in the EDS proto, where we route to the highest priority level with non-zero healthy endpoints.

This results in simple priority-based load balancing for RoundRobinLoadBalancer, LeastRequestLoadBalancer, and RandomLoadBalancer.

Risk Level: Medium
It's a big change but then again no one is using it :-)

Testing:
Existing tests parameterized to target P=0 and P=1 to verify equivalence.
Additional unit tests and one integration test explicitly testing primary/failover/... interactions

Docs Changes:
envoyproxy/data-plane-api#326

Release Notes:
Added support for priorities for several types of load balancer (RoundRobinLoadBalancer, LeastRequestLoadBalancer, RandomLoadBalancer)

Continued progress on #1929
alyssawilk (Contributor) commented:

@surki The behavior that you asked for should work now if you want to play around.

There are 2 more follow-up PRs, but they're for gentle failover in the case where there's more than one endpoint per priority set.

alyssawilk added a commit that referenced this issue Jan 4, 2018
Changes Envoy load balancing across priority levels from a hard failover to trickling data based on the health percentage of each priority level.

Risk Level: Medium

Testing:
Added thorough unit testing for the LB failover code, as well as fixing up ring hash failover testing and adding ring-hash-specific tests.

Docs Changes:
envoyproxy/data-plane-api#359

Release Notes: n/a: falls under existing note

Fixes #1929
jpsim pushed a commit that referenced this issue Nov 28, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue Nov 29, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>