KIC looping forever on Context Deadline exceeded #2440

Closed
aroundthecode opened this issue Apr 26, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@aroundthecode

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I have a Kong 2.7.1 + KIC 2.2.1 setup on GKE.
I have ~50 namespaces with 2 Ingress resources each, resulting in ~30 rules per namespace, so a total of ~1500 rules.
Some services also have plugins configured.

Once deployed, KIC starts to populate Kong, but it periodically fails with errors such as:
{"error":"2 errors occurred: failed to sync all entities: context deadline exceeded while processing event: {Update} upstream *******.8080.svc failed: making HTTP request: Put "http://kong-kong-admin.kong-system:8001/upstreams/1e7b3b64-fdbf-4f51-9c9e-2cb43172d92b": context deadline exceeded ", "level":"error", "msg":"could not update kong admin", "subsystem":"proxy-cache-resolver"}

On the Kong admin side, the matching error is:
[error] 1097#0: *54395 [lua] events.lua:364: post_local(): worker-events: dropping event; waiting for event data timed out

I've split my deployment into 3 standalone Kong proxy/admin pods + 3 KIC pods, using the Kong Services to spread load across the 3 admin nodes.

I've also tried scaling the Kong pods up to 5 and reducing KIC concurrency to 1, but the error still persists (the number in the error log is usually the concurrency number + 1).

Once KIC hits this error, it keeps looping over different service names, flooding the admin API with requests.

The Kong pods have no resource limits and each Kong pod is allocated on a different VM; neither CPU nor RAM is saturated.

Is there any other configuration tuning I can apply to avoid this loop?

Expected Behavior

KIC should be able to send the full configuration to Kong and stop looping.

Steps To Reproduce

No response

Kong Ingress Controller version

No response

Kubernetes version

No response

Anything else?

Both Kong and KIC are deployed via Helm with the following setup (concurrency has since been lowered to 1):

ingressController:
  installCRDs: false
  enabled: true
  env:
    kong_admin_tls_skip_verify: true
    kong_admin_url: "http://kong-kong-admin.kong-system:8001"
    publish_service: kong-system/kong-kong-proxy
    kong_admin_concurrency: 3
    log_format: json
    log_level: warn

and the following custom settings in the nginx template:

proxy_buffer_size       128k;
proxy_buffers           4 256k;
proxy_busy_buffers_size 256k;

This setup also suffers from #2422; I'm waiting for the next release to upgrade to 2.8.

@aroundthecode aroundthecode added the bug Something isn't working label Apr 26, 2022
@aroundthecode
Author

Adding some detail: KIC seems to be looping because of a computed diff on upstreams involving default values (which I never configured myself).

Note the minus signs in the trace below:

updating upstream ***********-8e2f7.8080.svc  {
   "algorithm": "round-robin",
   "hash_fallback": "none",
   "hash_on": "none",
   "hash_on_cookie_path": "/",
   "healthchecks": {
     "active": {
       "concurrency": 10,
       "healthy": {
         "http_statuses": [
           200,
           302
         ],
         "interval": 0,
         "successes": 0
       },
       "http_path": "/",
-      "https_verify_certificate": true,
       "timeout": 1,
       "type": "http",
       "unhealthy": {
         "http_failures": 0,
         "http_statuses": [
           429,
           404,
           500,
           501,
           502,
           503,
           504,
           505
         ],
         "interval": 0,
         "tcp_failures": 0,
         "timeouts": 0
       }
     },
     "passive": {
       "healthy": {
         "http_statuses": [
           200,
           201,
           202,
           203,
           204,
           205,
           206,
           207,
           208,
           226,
           300,
           301,
           302,
           303,
           304,
           305,
           306,
           307,
           308
         ],
         "successes": 0
       },
-      "type": "http",
       "unhealthy": {
         "http_failures": 0,
         "http_statuses": [
           429,
           500,
           503
         ],
         "tcp_failures": 0,
         "timeouts": 0
       }
     },
-    "threshold": 0
   },
   "id": "1bdd4aa9-ab7c-45ff-a423-c626a00eb29f",
   "name": "console-webapp.pzucchett-8e2f7.8080.svc",
   "slots": 10000,
   "tags": [
     "managed-by-ingress-controller"
   ]
 }

Once I added a KongIngress to set the "missing" values, the loop on the upstreams ended and KIC started looping on Service items with the same "diff" logic (a sketch of such a KongIngress follows the diff below):

updating service **********.pnum-8080  {
   "connect_timeout": 60000,
-  "enabled": true,
   "host": "**********-fa3dd.8080.svc",
   "id": "413fe3bf-dd7f-4e6f-b315-5110184c0f4d",
   "name": "**********.pnum-8080",
   "path": "/api",
   "port": 80,
   "protocol": "http",
   "read_timeout": 60000,
   "retries": 5,
   "tags": [
     "managed-by-ingress-controller"
   ],
   "write_timeout": 60000
 }
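
Something along these lines is what I mean by adding a KongIngress to pin the defaults (a sketch only: the resource name and namespace are placeholders, and which fields the upstream section accepts may vary by KIC version):

apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: pin-upstream-defaults   # placeholder name
  namespace: my-namespace       # placeholder namespace
upstream:
  # Explicitly set the values that kept appearing with a minus sign in the diff,
  # so the desired state matches what Kong reports back.
  healthchecks:
    threshold: 0
    active:
      https_verify_certificate: true
    passive:
      type: http

attached to each Service with the konghq.com/override: pin-upstream-defaults annotation.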

@aroundthecode
Author

Hi there, I managed to update to Kong 2.8.1 + KIC 2.3.1 and all the loops/traces completely disappeared.
No changes were made to the Ingress or Service configuration.

After a week of struggling, I really think Kong 2.7.* + KIC 2.2.* should be considered an unstable release and removed from the available downloads!

@rainest
Contributor

rainest commented Apr 29, 2022

Increasing CONTROLLER_PROXY_TIMEOUT_SECONDS to a value higher than the default (10) can avoid that error if it's happening under normal circumstances. Best guess is that the database backing this instance wasn't able to handle the amount of load generated by the upstream thrashing, which is fixed in 2.3.
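
With the ingressController.env layout from the values above (where keys are rendered with the CONTROLLER_ prefix, as with kong_admin_url), that would look roughly like this; 30 is only an example value, tune it for your environment:

ingressController:
  env:
    # Rendered as CONTROLLER_PROXY_TIMEOUT_SECONDS on the controller container;
    # the default is 10 seconds.
    proxy_timeout_seconds: "30"   # example value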

We do not remove older images because doing so would break instances that use them unexpectedly if, say, the scheduler tried to start a replica on a worker without a cached copy of that image. Not all users are equally affected by all issues. Many did run fine despite the unnecessary updates; those were a fairly long-standing issue that we only recently acquired the means to fix effectively.

@rainest rainest closed this as completed Apr 29, 2022