Merge pull request #45678 from alexanderConstantinescu/kep-3836
KEP-3836 documentation for 1.30
k8s-ci-robot authored Mar 26, 2024
2 parents b28f917 + c94d2eb commit f5ca94b
Showing 4 changed files with 76 additions and 0 deletions.
10 changes: 10 additions & 0 deletions content/en/docs/concepts/services-networking/service.md
@@ -619,6 +619,16 @@ You can integrate with [Gateway](https://gateway-api.sigs.k8s.io/) rather than S
can define your own (provider specific) annotations on the Service that specify the equivalent detail.
{{< /note >}}

#### Node liveness impact on load balancer traffic

Load balancer health checks are critical to modern applications. They are used to
determine which server (virtual machine or IP address) the load balancer should
dispatch traffic to. The Kubernetes APIs do not define how health checks have to be
implemented for Kubernetes-managed load balancers; instead, it's the cloud providers
(and the people implementing the integration code) who decide on the behavior. Load
balancer health checks are used extensively to support the `externalTrafficPolicy`
field for Services.
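
The `externalTrafficPolicy` field is set on the Service spec. As a minimal sketch
(the name, selector, and ports below are illustrative only):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-lb                 # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # or Local
  selector:
    app: example
  ports:
    - port: 80
      targetPort: 8080
```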

#### Load balancers with mixed protocol types

{{< feature-state feature_gate_name="MixedProtocolLBService" >}}
@@ -9,6 +9,10 @@ stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.28"
    toVersion: "1.30"
  - stage: beta
    defaultValue: true
    fromVersion: "1.30"
---
Implement connection draining for
terminating nodes for `externalTrafficPolicy: Cluster` services.
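
If you need to toggle this behavior explicitly, kube-proxy feature gates can be set
through the kube-proxy configuration file. A minimal sketch, assuming the gate
described here is the KEP-3836 gate `KubeProxyDrainingTerminatingNodes` (verify the
name against your release's feature gate reference):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  # Assumed gate name for KEP-3836; defaults to true once the gate reaches beta.
  KubeProxyDrainingTerminatingNodes: true
```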
@@ -27,6 +27,7 @@ etcd cluster externally or on custom ports.
| Protocol | Direction | Port Range | Purpose | Used By |
|----------|-----------|-------------|-----------------------|-------------------------|
| TCP | Inbound | 10250 | Kubelet API | Self, Control plane |
| TCP | Inbound | 10256 | kube-proxy | Self, Load balancers |
| TCP | Inbound | 30000-32767 | NodePort Services† | All |

† Default port range for [NodePort Services](/docs/concepts/services-networking/service/).
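
The kube-proxy port (10256) in the table above is kube-proxy's default health check
port, controlled by the `healthzBindAddress` setting. A minimal sketch of the
relevant `KubeProxyConfiguration` field, assuming the default value:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Address and port of the health check server that load balancers probe.
healthzBindAddress: 0.0.0.0:10256
```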
61 changes: 61 additions & 0 deletions content/en/docs/reference/networking/virtual-ips.md
@@ -488,6 +488,67 @@ route to ready node-local endpoints. If the traffic policy is `Local` and there
are no node-local endpoints, the kube-proxy does not forward any traffic for the
relevant Service.

If `Cluster` is specified, all nodes are eligible load balancing targets _as long as_
the node is not being deleted and kube-proxy is healthy. In this mode, load balancer
health checks are configured to target the service proxy's readiness port and path.
For kube-proxy, this evaluates to `${NODE_IP}:10256/healthz`. kube-proxy returns
either HTTP status code 200 or 503. kube-proxy's load balancer health check endpoint
returns 200 if:

1. kube-proxy is healthy, meaning:

   - it's able to progress programming the network and isn't timing out while doing
     so (the timeout is defined to be **2 × `iptables.syncPeriod`**); and

2. the node is not being deleted (there is no deletion timestamp set for the Node).
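
A probe against this endpoint might look like the following sketch (the `host`
value assumes kube-proxy runs with `hostNetwork: true` and the default
`healthzBindAddress`; adapt it to your deployment):

```yaml
# Hypothetical readiness probe for a kube-proxy Pod, targeting the load balancer
# health check endpoint described above.
readinessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10256
```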

The reason kube-proxy returns 503 and marks the node as not eligible while it's
being deleted is that kube-proxy supports connection draining for terminating
nodes. A couple of important things occur from the point of view of a
Kubernetes-managed load balancer when a node _is being_ / _is_ deleted.

While deleting:

* kube-proxy will start failing its readiness probe, essentially marking the
  node as not eligible for load balancer traffic. The failing health check
  causes load balancers that support connection draining to allow existing
  connections to terminate and to block new connections from being
  established.

When deleted:

* The service controller in the Kubernetes cloud controller manager removes the
  node from the load balancer's set of eligible targets. Removing any instance
  from the load balancer's set of backend targets immediately terminates all
  connections. This is also the reason kube-proxy first fails the health check
  while the node is being deleted.

It's important for Kubernetes vendors to note that if the kube-proxy readiness
probe is configured as a liveness probe, kube-proxy will restart continuously
while a node is being deleted, until the node has been fully deleted. kube-proxy
exposes a `/livez` path which, as opposed to the `/healthz` one, does **not**
consider the Node's deletion state, only kube-proxy's progress programming the
network. `/livez` is therefore the recommended path for anyone looking to define
a livenessProbe for kube-proxy.
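
A livenessProbe following that recommendation might look like the sketch below
(assuming `/livez` is served on the same port as `/healthz` and that kube-proxy
runs with `hostNetwork: true`):

```yaml
# Sketch of a kube-proxy livenessProbe that targets /livez rather than /healthz,
# so that Node deletion does not cause kube-proxy to restart continuously.
livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /livez
    port: 10256
```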

Users deploying kube-proxy can inspect the readiness and liveness state by
evaluating the `proxy_healthz_total` and `proxy_livez_total` metrics respectively.
Each metric publishes two series, one with the 200 label and one with the 503 label.

For `Local` Services, kube-proxy returns 200 if:

1. kube-proxy is healthy/ready, and
2. it has a local endpoint on the node in question.

Node deletion does **not** have an impact on kube-proxy's return code for
load balancer health checks in this case. The reason is that draining deleting
nodes could cause an ingress outage if all of a Service's endpoints happen to be
running on those nodes.

The Kubernetes project recommends that cloud provider integration code
configure load balancer health checks that target the service proxy's healthz
port. If you are using or implementing your own virtual IP implementation
that people can use instead of kube-proxy, you should set up a similar
health-checking port with logic that matches the kube-proxy implementation.

### Traffic to terminating endpoints

{{< feature-state for_k8s_version="v1.28" state="stable" >}}
