external-dns pod keeps restarting with aws route53 Throttling: Rate exceeded error #4067
Comments
I'm seeing a lot of restarts on DigitalOcean too with 0.14.0. Some error happens and the pod restarts, though sometimes it goes 12 or more hours without a restart. This could be related to external-dns updating records on every run even though no records need to be updated (#3977). Is your pod updating records every minute, or does it log something like "All records up to date"?
Nope, the pod is not updating records every minute.
Suggestion: can we avoid crashing the pod even when there is a throttling error?
@shreyas-3 This was caused by this change: From the comments on that change, you are not alone. We've tried to reduce our Route 53 checks and to update on events, but our pods still restart regularly.
Looks like they've reverted the change for v0.14.0.
A PR reverting this was never submitted. You are looking at an old fork from before the patch was submitted from a branch. That said, maybe someone wants to submit a PR to either revert it or add an option that lets the user decide whether this should be treated as fatal. As far as I'm concerned, getting rate limited by AWS should never cause a restart.
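To make the "let the user decide" idea concrete, here is a minimal Go sketch of a sync loop that logs a soft error and waits for the next run instead of exiting. It is not external-dns code: `errSoft`, `runLoop`, and the `softIsFatal` switch are hypothetical names invented for illustration, and a real controller would sleep for its sync interval between runs.

```go
package main

import (
	"errors"
	"fmt"
)

// errSoft is a hypothetical sentinel marking errors that are expected to
// clear on their own (for example, provider rate limiting).
var errSoft = errors.New("soft error")

// runLoop calls syncOnce up to maxRuns times. Soft errors are logged and
// counted unless softIsFatal is set; any other error stops the loop.
// A real controller would wait for its sync interval between iterations.
func runLoop(syncOnce func() error, maxRuns int, softIsFatal bool) (int, error) {
	softFailures := 0
	for i := 0; i < maxRuns; i++ {
		err := syncOnce()
		switch {
		case err == nil:
			// Nothing to do, carry on with the next run.
		case errors.Is(err, errSoft) && !softIsFatal:
			softFailures++
			fmt.Printf("level=error msg=%q\n", err.Error())
		default:
			return softFailures, err
		}
	}
	return softFailures, nil
}

func main() {
	calls := 0
	// Simulated sync: the first run is throttled, later runs succeed.
	syncOnce := func() error {
		calls++
		if calls == 1 {
			return fmt.Errorf("records retrieval failed: %w", errSoft)
		}
		return nil
	}

	soft, err := runLoop(syncOnce, 3, false)
	fmt.Println("soft failures:", soft, "fatal error:", err)
}
```

With `softIsFatal` set to `true`, the same loop returns the first soft error to the caller, which matches the crash-on-throttle behaviour described in this issue.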
Hello, we are hitting the same issue on GCP, caused not by throttling but by transient authentication errors: These are transient errors and are bound to happen. It is good that external-dns logs them, but I do not think it should crash over them.
… error and not fatal Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Yes, maybe 503 and 429 are good cases to not fail here. I am not sure whether we get the status code from the SDKs; likely we only get errors that we can maybe check.
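A rough Go sketch of such a check, under stated assumptions: `apiError` is a hypothetical stand-in for whatever error type an SDK returns, not a real AWS or GCP type, and `isTransient` is an invented helper. Note that the log in this issue reports `Throttling: Rate exceeded` with status code 400, so a check on 429 and 503 alone would miss it; the sketch also looks at the error code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// apiError is a hypothetical stand-in for an SDK error that exposes a
// service error code and an HTTP status code.
type apiError struct {
	Code   string
	Status int
	Msg    string
}

func (e *apiError) Error() string {
	return fmt.Sprintf("%s: %s (status code: %d)", e.Code, e.Msg, e.Status)
}

// isTransient reports whether err, or any error it wraps, looks retryable.
func isTransient(err error) bool {
	var ae *apiError
	if !errors.As(err, &ae) {
		return false
	}
	switch ae.Status {
	case http.StatusTooManyRequests, http.StatusServiceUnavailable:
		return true
	}
	// The throttling error in this issue arrived with status 400,
	// so the error code is checked as well.
	return ae.Code == "Throttling"
}

func main() {
	throttled := fmt.Errorf("failed to list hosted zones: %w",
		&apiError{Code: "Throttling", Status: 400, Msg: "Rate exceeded"})
	denied := &apiError{Code: "AccessDenied", Status: 403, Msg: "not authorized"}

	fmt.Println(isTransient(throttled)) // true
	fmt.Println(isTransient(denied))    // false
}
```

Using `errors.As` keeps the check working when the provider error has been wrapped with extra context, as in the "records retrieval failed: failed to list hosted zones: ..." message from the report.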
What happened:
After deploying v0.13.6, the external-dns pod keeps restarting and ends up in CrashLoopBackOff.
Whenever there is an AWS throttling error, the pod goes into the CrashLoopBackOff state.
Error in the log:
time="2023-11-27T05:49:33Z" level=fatal msg="records retrieval failed: failed to list hosted zones: Throttling: Rate exceeded\n\tstatus code: 400
Pod state when the error is observed:
bash-4.4# kubectl get pod -n kube-system | grep -i ext
external-dns-456d8799b-1xcvv 0/1 CrashLoopBackOff 41 (71s ago) 3h48m
When there is no error in the log, the pod is in the Running state.
This behaviour was not observed when external-dns v0.11.0 was deployed, even though Route 53 throttling was already occurring then.
What you expected to happen:
The pod should not restart or go into the CrashLoopBackOff state.
The pod should stay in the Running state even when there is an error caused by AWS Route 53 throttling (rate exceeded).
How to reproduce it (as minimally and precisely as possible):
Prerequisite: AWS Route 53 must be returning the throttling "Rate exceeded" error. This can be triggered by making multiple calls to Route 53 in a short time.
Step: Deploy external-dns v0.13.6 and check the pod state.
Anything else we need to know?:
The restarts were observed when the following error was in the logs:
time="2023-11-27T05:49:33Z" level=fatal msg="records retrieval failed: failed to list hosted zones: Throttling: Rate exceeded\n\tstatus code: 400
The pod returned to the normal Running state once the throttling error stopped.
Environment:
external-dns version (external-dns --version): 0.13.6