React on pod events for readiness gates #1214
Conversation
Hi @bpineau. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
minor improvement
```go
	continue
}

if h.isPodInEndpoint(pod, endpoint) {
```
Would it make sense to only reconcile the ingress if the `ContainersReady` condition is true, to avoid unnecessary reconciliation iterations (and thus many AWS API calls) on frequent pod status updates during pod startup?
@devkid the point of reconciliations, IMO, is that you don't have to rely on state comparison to make a decision.
In our tests we haven't had any issues with AWS API limits yet.
Also, we shouldn't rely on the fact that right now ContainersReady is the only driving factor for registering pods in the endpoints table. Upstream k8s could change or extend that logic at any time, and that would break the implementation.
I agree with @nirnanaaa; that would specialise the watch by embedding expectations about implementation details of an unrelated part of the code base (which might change), e.g. setting a trap for future maintainers.
@nirnanaaa
I think we need to have the ContainersReady check: we have to decide whether to register the pod to the ALB anyway. Under the current architecture, a reconcile calls AWS APIs to check every ALB setting before reconciling targetGroups, which is inefficient and causes trouble when there are many ingresses.
I think it's better to make the changes below (see the sketch after this list):
- add a common function in the backend package, e.g. "IsPodSuitableAsIPTarget", which checks ContainersReady
- change https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/master/internal/ingress/backend/endpoint.go#L186 to call that function when deciding whether a pod in NotReadyAddresses should be registered
- only enqueue the Ingress for a pod Update event when
  (IsPodSuitableAsIPTarget(podNew) != IsPodSuitableAsIPTarget(podOld)) && PodInEndpoint(podNew)
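A minimal sketch of that update filter, assuming hypothetical helper names (`shouldEnqueueForPodUpdate`, `podInEndpoint`) and the `IsPodSuitableAsIPTarget` function discussed later in this thread:

```go
package backend

import (
	corev1 "k8s.io/api/core/v1"
)

// shouldEnqueueForPodUpdate (hypothetical name) applies the filter
// proposed above: react to a pod Update event only when the pod's
// suitability as an IP target flipped and the new pod is referenced
// by the endpoint.
func shouldEnqueueForPodUpdate(podOld, podNew *corev1.Pod, ep *corev1.Endpoints) bool {
	if IsPodSuitableAsIPTarget(podNew) == IsPodSuitableAsIPTarget(podOld) {
		return false // no relevant readiness transition
	}
	return podInEndpoint(podNew, ep)
}

// podInEndpoint (hypothetical name) checks whether the pod backs any
// address, ready or not, of the given Endpoints object.
func podInEndpoint(pod *corev1.Pod, ep *corev1.Endpoints) bool {
	for _, subset := range ep.Subsets {
		for _, addrs := range [][]corev1.EndpointAddress{subset.Addresses, subset.NotReadyAddresses} {
			for _, addr := range addrs {
				if addr.TargetRef != nil && addr.TargetRef.Kind == "Pod" &&
					addr.TargetRef.Namespace == pod.Namespace && addr.TargetRef.Name == pod.Name {
					return true
				}
			}
		}
	}
	return false
}
```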
@M00nF1sh does that really justify adding all the complex logic to store oldRef, the handling when no old ref is found in the store, etc.? I don't think the comparison would be a huge deal other than duplicating logic.
@M00nF1sh pushed a commit doing exactly that
thanks, ready to go :D. Just one nit: we need a `break` after `queue.Add` :D
Thanks, added the break (it optimizes the loop but precludes pods being part of several readiness-gate-backed ingresses, which I guess was a shady corner case w.r.t. readiness gates semantics anyway?).
@bpineau The break would only quit the loop that iterates over the backends, i.e. all other ingresses would still be processed. Enqueuing the ingress for reconciliation will reconcile all target groups anyhow.
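For context, a rough sketch of the loop shape under discussion (hypothetical names throughout; this is not the PR's actual code): the `break` only exits the inner backends loop, so the outer per-ingress iteration continues.

```go
package backend

import (
	corev1 "k8s.io/api/core/v1"
	extensions "k8s.io/api/extensions/v1beta1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/workqueue"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// enqueueIngressesForPod (hypothetical name): once the pod matches one
// of an ingress's backends, that ingress is enqueued and the inner loop
// breaks; the outer loop still examines the remaining ingresses.
func enqueueIngressesForPod(
	pod *corev1.Pod,
	ingresses []extensions.Ingress,
	endpointsFor func(extensions.Ingress) []*corev1.Endpoints, // hypothetical helper
	queue workqueue.RateLimitingInterface,
) {
	for _, ing := range ingresses {
		for _, ep := range endpointsFor(ing) {
			if podInEndpoint(pod, ep) { // see the earlier sketch
				queue.Add(reconcile.Request{NamespacedName: types.NamespacedName{
					Namespace: ing.Namespace,
					Name:      ing.Name,
				}})
				break // this ingress is enqueued; move on to the next one
			}
		}
	}
}
```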
@devkid you're right
/ok-to-test
internal/ingress/backend/endpoint.go (outdated)
```go
func IsPodSuitableAsIPTarget(pod *corev1.Pod) bool {
	for _, condition := range pod.Status.Conditions {
		if condition.Type == api.ContainersReady {
			if condition.Status == api.ConditionTrue {
```
This should be `return condition.Status == api.ConditionTrue`, to quit as soon as the `ContainersReady` condition is found.
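Applied, the function would read as follows (a sketch; the fall-through `return false` for pods that don't carry the condition is an assumption about the surrounding code):

```go
func IsPodSuitableAsIPTarget(pod *corev1.Pod) bool {
	for _, condition := range pod.Status.Conditions {
		// Quit as soon as the ContainersReady condition is found,
		// whatever its status.
		if condition.Type == api.ContainersReady {
			return condition.Status == api.ConditionTrue
		}
	}
	// Assumed fall-through: no ContainersReady condition means not ready.
	return false
}
```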
Addressing PR #1214 comments: bubbling up all (ingress-backing) pod events to the main reconcile loop might trigger an excessive volume of AWS API calls. Let's keep that pod watch noise to the minimum required for readiness gates: catching changes in `ContainersReady`.
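A minimal sketch of that filter, with hypothetical function names, comparing the `ContainersReady` condition between the old and new pod:

```go
package backend

import (
	corev1 "k8s.io/api/core/v1"
)

// podConditionStatus (hypothetical helper) returns the status of the
// given condition type, or ConditionUnknown when the condition is absent.
func podConditionStatus(pod *corev1.Pod, condType corev1.PodConditionType) corev1.ConditionStatus {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == condType {
			return cond.Status
		}
	}
	return corev1.ConditionUnknown
}

// containersReadyChanged (hypothetical name) implements the commit's
// intent: only pod Update events where the ContainersReady condition
// actually changed are worth enqueuing.
func containersReadyChanged(podOld, podNew *corev1.Pod) bool {
	return podConditionStatus(podOld, corev1.ContainersReady) !=
		podConditionStatus(podNew, corev1.ContainersReady)
}
```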
really awesome work :D, thanks sooo much!!
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bpineau, M00nF1sh

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
…-gates-events React on pod events for readiness gates
We're relying on endpoints events to re-trigger reconciliations during
rollouts, and we're considering the pod's containers status (e.g. are all
of the pod's containers ContainersReady?) to either act upon or swallow
those events. Both (pod changes, ep changes) can be out of sync. For
instance: a starting pod whose containers aren't all ContainersReady might
have its addresses registered to an ep subset's NotReadyAddresses, kicking
a reconcile event which won't propagate to the TargetGroups (+ conditions
updates) since the pod is evaluated as not ready. Further pod changes won't
kick an endpoint change (due to readiness gates, the pod's address will
stay in NotReadyAddresses until we do something). As probably seen in #1205.

In order to react reliably to pod changes, we have to hook in a pod watch.
Doing so is slightly expensive, as we have to map pod -> [service ->]
endpoint -> ingress on pod events, though we limit the search to the pod's
ns (services can only reference pods from their own ns, and ingresses
reference services from their ns).
We might optimize that by adding a pod indexer (if supported by controller-runtime).
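controller-runtime does support field indexers; a rough sketch of what that could look like (every name here is an assumption, not code from this PR, and the `IndexField` signature shown is the modern one, which differed at the time of this PR): index Endpoints by the pods they reference, so a pod event can find its Endpoints with one indexed List instead of scanning the namespace.

```go
package backend

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// endpointPodIndexKey is a hypothetical index key name.
const endpointPodIndexKey = "index.targetPodName"

// addEndpointsPodIndex (hypothetical) indexes Endpoints objects by the
// names of the pods their addresses reference (ready or not).
func addEndpointsPodIndex(mgr manager.Manager) error {
	return mgr.GetFieldIndexer().IndexField(
		context.Background(), &corev1.Endpoints{}, endpointPodIndexKey,
		func(obj client.Object) []string {
			ep := obj.(*corev1.Endpoints)
			var pods []string
			for _, subset := range ep.Subsets {
				for _, addrs := range [][]corev1.EndpointAddress{subset.Addresses, subset.NotReadyAddresses} {
					for _, addr := range addrs {
						if addr.TargetRef != nil && addr.TargetRef.Kind == "Pod" {
							pods = append(pods, addr.TargetRef.Name)
						}
					}
				}
			}
			return pods
		})
}
```

A pod event handler could then call `c.List(ctx, &epList, client.InNamespace(pod.Namespace), client.MatchingFields{endpointPodIndexKey: pod.Name})` rather than walking every Endpoints object in the namespace.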