
Implement failover load balancing strategy #46

Closed
donovanmuller opened this issue Feb 24, 2020 · 0 comments · Fixed by #65
Labels: enhancement (New feature or request)
Milestone: 0.6

Comments

@donovanmuller (Contributor) commented:

As per the supported load balancing strategies in the initial design, a failover strategy should be implemented to ensure the guarantee stated there:

Failover - Pinned to a specified primary cluster until that cluster has no available Pods, at which point the next available cluster's Ingress node IPs will be resolved. When Pods are again available on the primary cluster, the primary cluster will once again be the only eligible cluster for which Ingress node IPs will be resolved.
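
For illustration only, here is a minimal sketch of the selection logic this guarantee implies (hypothetical types and function names, not the actual ohmyglb implementation):

package main

import "fmt"

// clusterStatus is a hypothetical helper type used only for this illustration.
type clusterStatus struct {
    name       string
    healthy    bool     // true if the Gslb host has available Pods on this cluster
    ingressIPs []string // Ingress node IPs of this cluster
}

// resolveFailover returns the primary cluster's Ingress node IPs while the
// primary is healthy; otherwise it returns the IPs of every other healthy cluster.
func resolveFailover(primary string, clusters []clusterStatus) []string {
    var fallback []string
    for _, c := range clusters {
        if c.name == primary && c.healthy {
            return c.ingressIPs // pinned to the primary while it has available Pods
        }
        if c.name != primary && c.healthy {
            fallback = append(fallback, c.ingressIPs...)
        }
    }
    return fallback
}

func main() {
    clusters := []clusterStatus{
        {name: "cluster-x", healthy: true, ingressIPs: []string{"10.0.1.10"}},
        {name: "cluster-y", healthy: true, ingressIPs: []string{"10.1.1.11"}},
    }
    fmt.Println(resolveFailover("cluster-x", clusters)) // [10.0.1.10]

    clusters[0].healthy = false
    fmt.Println(resolveFailover("cluster-x", clusters)) // [10.1.1.11]
}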

Scenario 1:

  • Given 2 separate Kubernetes clusters, X, and Y
  • Each cluster has a healthy Deployment with a backend Service called app, and that backend Service is exposed with a Gslb resource on both clusters as follows (a rough sketch of how these fields could map onto the CRD spec appears after this list):
apiVersion: ohmyglb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: app-gslb
  namespace: test-gslb
spec:
  ingress:
    rules:
      - host: app.cloud.example.com
        http:
          paths:
            - backend:
                serviceName: app
                servicePort: http
              path: /
  strategy: failover
  primary: cluster-x
  • Each cluster has one worker node that accepts Ingress traffic. The worker node in each cluster has the following name and IP:
cluster-x-worker-1: 10.0.1.10
cluster-y-worker-1: 10.1.1.11
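
As a point of reference only, the strategy and primary fields above could map onto the Gslb CRD spec roughly as sketched below; the field name PrimaryGeoTag mirrors the wording of the commits that fixed this issue, and none of this should be read as a confirmed API:

// Hypothetical sketch of how the Gslb spec could carry the failover settings
// shown above; not the actual ohmyglb types.
package v1beta1

// GslbStrategySpec holds the load balancing settings of a Gslb resource.
type GslbStrategySpec struct {
    // Strategy selects the load balancing strategy, e.g. "roundRobin" or "failover".
    Strategy string `json:"strategy"`
    // PrimaryGeoTag names the cluster that is pinned while it has available Pods;
    // only meaningful when Strategy is "failover".
    PrimaryGeoTag string `json:"primary,omitempty"`
}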

When issuing the following command, curl -v http://app.cloud.example.com, I would expect the resolved IPs to be as follows (if this command were executed 3 times consecutively):

$ curl -v http://app.cloud.example.com # execution 1
*   Trying 10.0.1.10...
...

$ curl -v http://app.cloud.example.com # execution 2
*   Trying 10.0.1.10...
...

$ curl -v http://app.cloud.example.com # execution 3
*   Trying 10.0.1.10...
...

The resolved node IPs to which ingress traffic will be sent should be "pinned" to the primary cluster named explicitly in the Gslb resource above. Even though there is a healthy Deployment in cluster Y, the Ingress node IPs for cluster Y would not be resolved.

Scenario 2:

  • Same configuration as Scenario 1, except that the Deployment only has healthy Pods on one cluster, cluster Y; i.e. the Deployment on cluster X has no healthy Pods.

When issuing the following command, curl -v http://app.cloud.example.com, I would expect the resolved IPs to be as follows (if this command were executed 3 times consecutively):

$ curl -v http://app.cloud.example.com # execution 1
*   Trying 10.1.1.11...
...

$ curl -v http://app.cloud.example.com # execution 2
*   Trying 10.1.1.11...
...

$ curl -v http://app.cloud.example.com # execution 3
*   Trying 10.1.1.11...
...

In this scenario, only the Ingress node IPs for cluster Y are resolved, given that there is no healthy Deployment for the Gslb host on the primary cluster, cluster X. Therefore, the "failover" cluster(s) are resolved instead (cluster Y in this scenario).

Now, given that the Deployment on cluster X (the primary cluster) becomes healthy once again, I would expect the resolved IPs to be as follows (if this command were executed 2 times consecutively):

$ curl -v http://app.cloud.example.com # execution 1
*   Trying 10.0.1.10...
...

$ curl -v http://app.cloud.example.com # execution 2
*   Trying 10.0.1.10...
...

The primary cluster's Ingress node IPs are now resolved exclusively once again.

NOTE:

  • The design of the specification around how to indicate the primary cluster as described in this issue is solely for the purpose of describing the scenario. It should not be considered a design.
  • The existence of multiple "secondary" failover clusters should also be considered. For example, if there were 3 clusters (X, Y and Z) in Scenario 2 above, could the Ingress node IPs for both secondary clusters (Y and Z) be resolved, and if so, how (in terms of "load balancing") would the Ingress node IPs across those secondary/failover clusters be resolved? Would they use the default round robin strategy, or any strategy at all? One possible approach is sketched below.
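
One possible approach to that last point, for discussion only: while the primary is unhealthy, the Ingress node IPs of all healthy secondary clusters could be gathered and rotated round robin on each resolution. A hypothetical sketch follows (the address used for a cluster Z is made up for the example):

package main

import "fmt"

// rotate returns the candidate Ingress node IPs shifted round robin by the
// given resolution counter (hypothetical sketch, not a design).
func rotate(ips []string, rotation int) []string {
    if len(ips) == 0 {
        return ips
    }
    shift := rotation % len(ips)
    rotated := make([]string, 0, len(ips))
    rotated = append(rotated, ips[shift:]...)
    rotated = append(rotated, ips[:shift]...)
    return rotated
}

func main() {
    // Ingress node IPs of the healthy secondary clusters Y and Z while the
    // primary cluster X is down (the cluster Z address is illustrative).
    secondaries := []string{"10.1.1.11", "10.2.1.12"}
    for i := 0; i < 3; i++ {
        fmt.Println(rotate(secondaries, i))
    }
    // Prints [10.1.1.11 10.2.1.12], then [10.2.1.12 10.1.1.11], then the first order again.
}
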
@donovanmuller donovanmuller added the enhancement New feature or request label Feb 24, 2020
@donovanmuller donovanmuller added this to the 0.6 milestone Feb 24, 2020
ytsarev added a commit that referenced this issue Mar 17, 2020
* Extends `Strategy` CRD Spec
* Implements simple failover logic with respect to `PrimaryGeoTag`
* Associated test suite extension
* Resolves #46
ytsarev added a commit that referenced this issue Mar 18, 2020
* Extends `Strategy` CRD Spec
* Implements simple failover logic with respect to `PrimaryGeoTag`
* Associated test suite extension
* Resolves #46