-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Add --term-delay
option to CLI
#2494
Conversation
This will make it possible to pass in a time duration to be used as a SIGTERM signal delay, which will make it possible to configure if the ingress-controller should wait before shutting down.
Context: there are cases where due to network load balancer configuration that a delay needs to be introduced to the containers, this delay is configurable for the Kong Proxy container through the helm chart, however it is not possible to configure this delay for the ingress-controller. This can result in stale upstreams during NLB draining and downtime for services. Making this delay configurable seems the easiest approach to addressing this limitation. This makes multiple changes: 1. Adjusts where the SignalHandler is called: Originally this was in the entry point to the controller manager, however in order to pass through the CLI argument and make the term-delay configurable, this needed moved down into the rootcmd package. 2. Supports adding a delay to the SignalHandler This takes the original SetupSignalHandler, and modifies the logic to support setting a time delay. It should still support overriding that timedelay by sending a second signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution, the PR is looking good!
I have a couple of comments for your considerations, let me know what you think.
This is great stuff! Instead of doing it via GH, I'll add all these to my local branch and push a change. I'll also make sure the linter runs successfully locally first (I'm using |
Addressed some PR comments on wording/docs
Just checking in? |
Hi @shaneutt, thanks for following up! We'll have the remaining updates available shortly, our team has had some vacation time which has resulted in a longer turnaround from us. 🙂 |
A second signal can come in during the termination delay, or after the `cancel()` has run and the rest of the ingress controller's graceful shutdown process has started. This will log when those take place. Note the addition of `<-c` for the second case, as we were not correctly capturing a signal here.
Apologies for the delay! Wanted to ensure I had the necessary helm chart changes tested (a PR in that repo forthcoming). We can rebase / squash these commit as well if you'd like, just let me know your preference. We believe we have this in the state that we'd like! |
There are cases where due to network load balancer (NLB) configuration that a delay needs to be introduced to the containers during shutdown. This delay is configurable for the Kong Proxy container through the helm chart, however it is not possible to configure this delay for the ingress controller. This can result in stale upstreams during NLB draining and downtime for services. Making this delay configurable seems the easiest approach to addressing this limitation. We have a PR for the kong-ingress-controller which will [update its behaviour](Kong/kubernetes-ingress-controller#2494) so we can configure the termination delay. This PR adds the ability to set a concomitant value in the helm chart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, thank you for the contribution 👍
If you haven't done so before, please do feel free to grab yourself a free Kong contributor's t-shirt (each of you), we give these out once per contributor: https://github.com/Kong/kong/blob/master/CONTRIBUTING.md#contributor-t-shirt
Thank you for your contribution, we appreciate it 🖖
Thank you so much! Two more questions @shaneutt 🙂
|
Yes please.
Most likely this is going into |
This was introduced in Kong/kubernetes-ingress-controller#2494 and should be released as part of v2.4.0
* feat(ingress) support terminationGracePeriodSeconds There are cases where due to network load balancer (NLB) configuration that a delay needs to be introduced to the containers during shutdown. This delay is configurable for the Kong Proxy container through the helm chart, however it is not possible to configure this delay for the ingress controller. This can result in stale upstreams during NLB draining and downtime for services. Making this delay configurable seems the easiest approach to addressing this limitation. We have a PR for the kong-ingress-controller which will [update its behaviour](Kong/kubernetes-ingress-controller#2494) so we can configure the termination delay. This PR adds the ability to set a concomitant value in the helm chart. * Add PR to CHANGELOG.md (unreleased) * Update charts/kong/CHANGELOG.md Co-authored-by: Shane Utt <shane@shaneutt.com> * Update CHANGELOG.md Co-authored-by: Shane Utt <shane@shaneutt.com>
* Copy 2.3.x directory for upcoming 2.4.x release * Documenting --term-delay parameter This was introduced in Kong/kubernetes-ingress-controller#2494 and should be released as part of v2.4.0 * Make 2.4.x latest version * remove duplicate directory for 2.4.x; move 2.3.x to /src/; create 2.3.x and 2.4.x navigation files * set 2.4.x filter on term-delay * backticks for cli flags so they render accurately; capitalization/punctuation * Fix if_version space rendering Co-authored-by: lena.larionova <yelena.larionova@gmail.com> Co-authored-by: Michael Heap <m@michaelheap.com> Co-authored-by: lena-larionova <54370747+lena-larionova@users.noreply.github.com>
* feat(ingress) support terminationGracePeriodSeconds There are cases where due to network load balancer (NLB) configuration that a delay needs to be introduced to the containers during shutdown. This delay is configurable for the Kong Proxy container through the helm chart, however it is not possible to configure this delay for the ingress controller. This can result in stale upstreams during NLB draining and downtime for services. Making this delay configurable seems the easiest approach to addressing this limitation. We have a PR for the kong-ingress-controller which will [update its behaviour](Kong/kubernetes-ingress-controller#2494) so we can configure the termination delay. This PR adds the ability to set a concomitant value in the helm chart. * Add PR to CHANGELOG.md (unreleased) * Update charts/kong/CHANGELOG.md Co-authored-by: Shane Utt <shane@shaneutt.com> * Update CHANGELOG.md Co-authored-by: Shane Utt <shane@shaneutt.com>
* Copy 2.3.x directory for upcoming 2.4.x release * Documenting --term-delay parameter This was introduced in Kong/kubernetes-ingress-controller#2494 and should be released as part of v2.4.0 * Make 2.4.x latest version * remove duplicate directory for 2.4.x; move 2.3.x to /src/; create 2.3.x and 2.4.x navigation files * set 2.4.x filter on term-delay * backticks for cli flags so they render accurately; capitalization/punctuation * Fix if_version space rendering Co-authored-by: lena.larionova <yelena.larionova@gmail.com> Co-authored-by: Michael Heap <m@michaelheap.com> Co-authored-by: lena-larionova <54370747+lena-larionova@users.noreply.github.com>
* Copy 2.3.x directory for upcoming 2.4.x release * Documenting --term-delay parameter This was introduced in Kong/kubernetes-ingress-controller#2494 and should be released as part of v2.4.0 * Make 2.4.x latest version * remove duplicate directory for 2.4.x; move 2.3.x to /src/; create 2.3.x and 2.4.x navigation files * set 2.4.x filter on term-delay * backticks for cli flags so they render accurately; capitalization/punctuation * Fix if_version space rendering Co-authored-by: lena.larionova <yelena.larionova@gmail.com> Co-authored-by: Michael Heap <m@michaelheap.com> Co-authored-by: lena-larionova <54370747+lena-larionova@users.noreply.github.com>
What this PR does / why we need it:
This PR adds a
--term-delay
option to the CLI for the ingress-controller.Context:
There are cases where due to network load balancer (NLB) configuration that a delay needs to be introduced to the containers, this delay is configurable for the Kong Proxy container through the helm chart, however it is not possible to configure this delay for the ingress-controller. This can result in stale upstreams during NLB draining and downtime for services. Making this delay configurable seems the easiest approach to addressing this limitation.
With the ingress-controller gone, any changes to the underlying services that the proxy is routing to would be missed. Due to limitations with AWS NLB deregistration delays, we've had to configure a sleep of up to 4 minutes for the proxy container before shutting down. This currently means when we are scaling down, or rolling Kong pods there is a 3-4 minute window where the proxy container is running with no ingress-controller to monitor for updates with our underlying services.
Impact: This can result in stale upstreams, and errors for clients during this window. 😞
This PR makes multiple changes:
Adds a new
--term-delay
argument to the CLIThis makes it possible for users to configure the SIGTERM delay value by passing a CLI argument, or an environment variable. The default is set to zero so it won't impact existing users.
Adjusts where the SignalHandler is called:
Originally this was in the entry point to the controller manager, however in order to pass through the CLI argument and make the term-delay configurable, this needed moved down into the rootcmd package.
Supports adding a delay to the SignalHandler
This takes the original SetupSignalHandler, and modifies the logic to support setting a time delay. It should still support overriding that time-delay by sending a second signal. This involved copying and altering the existing k8s controller-runtime signal package.
Special notes for your reviewer:
Alternative option considered:
kong.lifecycle
rather thankong.proxy.lifecycle
. It would also require a sleep binary in the proxy container which doesn’t currently exist from the distroless/static base image - we could mount one from an initContainer but it would be fiddly.Testing
I've tested this locally, running with the following commands:
From this I've been able to exercise trying to CTRL + C to kill the process, and validated that the delay is working as expected. I've also validated that by sending a second signal it does exit with code 1.
PR Readiness Checklist:
Complete these before marking the PR as
ready to review
:CHANGELOG.md
release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR