
external-dns pod keeps restarting with aws route53 Throttling: Rate exceeded error #4067

Closed
shreyas-3 opened this issue Nov 27, 2023 · 8 comments · Fixed by #4166 or #4886
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@shreyas-3
Contributor

What happened:
After deploying version v0.13.6, we observed the external-dns pod repeatedly restarting with CrashLoopBackOff.
Whenever there is an AWS throttling error, the pod goes into the CrashLoopBackOff state.

Error in the log:
time="2023-11-27T05:49:33Z" level=fatal msg="records retrieval failed: failed to list hosted zones: Throttling: Rate exceeded\n\tstatus code: 400

Pod state when the error was observed:
bash-4.4# kubectl get pod -n kube-system | grep -i ext
external-dns-456d8799b-1xcvv 0/1 CrashLoopBackOff 41 (71s ago) 3h48m

When there is no error in the log, the pod state is Running.

This behaviour was not observed when external-dns:v0.11.0 was deployed, even though R53 throttling was present then as well.

What you expected to happen:
The pod should not restart or go into the CrashLoopBackOff state.
The pod should stay in the Running state even if there is an error due to the AWS R53 throttling rate being exceeded.

How to reproduce it (as minimally and precisely as possible):
Pre-req: an AWS R53 "Rate exceeded" throttling error must be occurring. It can be generated by making multiple calls to R53 in a short time (see the sketch after these steps).
Step: Deploy external-dns v0.13.6 and check the pod state.
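For illustration, one way to provoke the throttling error outside of external-dns is to call ListHostedZones in a tight loop. This is only a sketch using aws-sdk-go v1 (the loop count is arbitrary, and the SDK's default retryer will absorb a few throttles before the error surfaces):

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/route53"
)

func main() {
	// Uses the default credential chain / shared config, as external-dns would.
	sess := session.Must(session.NewSessionWithOptions(session.Options{
		SharedConfigState: session.SharedConfigEnable,
	}))
	r53 := route53.New(sess)

	// Route53's request quota is low (on the order of a few requests per
	// second per account), so a tight loop surfaces
	// "Throttling: Rate exceeded" fairly quickly.
	for i := 0; i < 500; i++ {
		if _, err := r53.ListHostedZones(&route53.ListHostedZonesInput{}); err != nil {
			fmt.Println(err) // e.g. "Throttling: Rate exceeded\n\tstatus code: 400"
		}
	}
}
```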

Anything else we need to know?:
The restarts were observed whenever the below error appeared in the logs:
time="2023-11-27T05:49:33Z" level=fatal msg="records retrieval failed: failed to list hosted zones: Throttling: Rate exceeded\n\tstatus code: 400
The pod came back to the normal Running state once the throttling error was gone.
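For context, the level=fatal in that log line is what drives the CrashLoopBackOff: a fatal log exits the process, and the kubelet restarts the container with backoff. The following is only a minimal sketch of the two behaviours, not the actual external-dns controller code; reconcile is a stand-in for the controller's sync loop:

```go
package main

import (
	"errors"
	"time"

	log "github.com/sirupsen/logrus"
)

// reconcile stands in for one reconciliation pass: list hosted zones,
// compute the plan, apply changes. A provider error such as Route53
// "Throttling: Rate exceeded" is returned to the caller.
func reconcile() error {
	return errors.New("records retrieval failed: failed to list hosted zones: Throttling: Rate exceeded")
}

func main() {
	for {
		if err := reconcile(); err != nil {
			// log.Fatal(err) would exit the process with a non-zero code,
			// so Kubernetes restarts the pod -> CrashLoopBackOff.
			// Logging at error level keeps the process alive so the next
			// interval simply retries, which is the behaviour requested here.
			log.Error(err)
		}
		time.Sleep(time.Minute)
	}
}
```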

Environment:

  • External-DNS version (use external-dns --version): 0.13.6
  • DNS provider: AWS Route53
  • Others: EKS cluster
shreyas-3 added the kind/bug label Nov 27, 2023
@Jayd603

Jayd603 commented Nov 27, 2023

I'm seeing a lot of restarts on DigitalOcean too with 0.14.0. Some error happens and the pod restarts; sometimes it goes 12 or more hours without a restart. This could be related to external-dns updating records every single time even though no records need to be updated (#3977). Is your pod updating records every minute, or does it say ~"All records up to date"?

@shreyas-3
Contributor Author

shreyas-3 commented Nov 27, 2023

I'm seeing a lot of restarts on DigitalOcean too with 0.14.0. Some error happens and the pod restarts; sometimes it goes 12 or more hours without a restart. This could be related to external-dns updating records every single time even though no records need to be updated (#3977). Is your pod updating records every minute, or does it say ~"All records up to date"?

Nope, the pod is not updating records every minute.
It just says all records are up to date when there is no "aws route53 throttling rate limit exceeded" error,
and the pod restarts when it observes the rate limit exceeded error, which is frequent.

@shreyas-3
Contributor Author

My suggestion: can we not crash the pod even if there is a throttling error?

@matthewbyrne

@shreyas-3 This has been caused by this change:
#3009

From the comments on that change, you are not alone.

We've attempted to reduce our Route53 checks and to update on events, but we still regularly see pods restarting.

@matthewbyrne

Looks like they've reverted the change for v0.14.0
https://github.com/olemarkus/external-dns/blob/master/controller/controller.go#L194

@gregsidelinger
Contributor

Looks like they've reverted the change for v0.14.0 https://github.com/olemarkus/external-dns/blob/master/controller/controller.go#L194

A PR reverting this was never submitted. You are looking at a branch of an old fork from before the patch was submitted.

Granted, maybe someone wants to submit a PR to either revert this or add an option to let the user decide whether this should be treated as fatal. Getting rate limited by AWS should never cause a restart, as far as I'm concerned.
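One possible shape for such an option (purely a sketch; the flag name --provider-errors-fatal and the wiring are hypothetical and not part of external-dns):

```go
package main

import (
	"flag"

	log "github.com/sirupsen/logrus"
)

// reconcile is a placeholder for the controller's sync pass.
func reconcile() error { return nil }

func main() {
	// Hypothetical flag: when set, a failed sync kills the pod (the current
	// behaviour); when unset, the error is logged and retried next interval.
	fatalOnError := flag.Bool("provider-errors-fatal", false,
		"treat provider errors such as Route53 throttling as fatal")
	flag.Parse()

	if err := reconcile(); err != nil {
		if *fatalOnError {
			log.Fatal(err)
		} else {
			log.Error(err)
		}
	}
}
```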

@BaudouinH

Hello, we are seeing the same issue on GCP, due not to throttling but to transient authentication errors:
{"level":"fatal","msg":"googleapi: Error 503: Authentication backend unavailable., backendError","time":"(...)"}

These are transient errors and are bound to happen. It is a good thing that external-dns logs them, but I do not think it should crash over them.

szuecs added a commit that referenced this issue Jan 9, 2024
… error and not fatal

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
@szuecs
Contributor

szuecs commented Jan 11, 2024

Hello, we are seeing the same issue on GCP, due not to throttling but to transient authentication errors: {"level":"fatal","msg":"googleapi: Error 503: Authentication backend unavailable., backendError","time":"(...)"}

These are transient errors and are bound to happen. It is a good thing that external-dns logs them, but I do not think it should crash over them.

Yes, maybe 503 and 429 are good cases not to fail on here. I am not sure if we get the status code from the SDKs; likely we only get errors that we may be able to check.
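For what it's worth, both SDKs expose enough to classify these cases without raw status codes. A sketch (assuming aws-sdk-go v1 and the google.golang.org/api/googleapi package; isRetryable is a hypothetical helper, not an existing external-dns function):

```go
package main

import (
	"errors"
	"fmt"
	"net/http"

	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/request"
	"google.golang.org/api/googleapi"
)

// isRetryable reports whether a provider error looks transient (throttling
// or an upstream 429/5xx), i.e. something to log and retry rather than
// treat as fatal.
func isRetryable(err error) bool {
	// AWS: request.IsErrorThrottle matches throttling-style error codes
	// such as "Throttling" and "ThrottlingException".
	if request.IsErrorThrottle(err) {
		return true
	}
	// Google: transient backend/auth failures surface as *googleapi.Error
	// carrying the HTTP status code, e.g. the 503 reported above.
	var gerr *googleapi.Error
	if errors.As(err, &gerr) {
		return gerr.Code == http.StatusTooManyRequests || gerr.Code >= 500
	}
	return false
}

func main() {
	throttled := awserr.New("Throttling", "Rate exceeded", nil)
	backend := &googleapi.Error{Code: 503, Message: "Authentication backend unavailable."}
	fmt.Println(isRetryable(throttled), isRetryable(backend)) // true true
}
```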
