After updating deployment, controller failed to sync pod to kong #1025
What do you see in the proxy container access logs? Can you confirm there's a 404 logged there? The error the controller gets back won't indicate what wasn't found, but 404s in the access log will.

What was the deployment update? Did you update the Kong image version from 2.1.x to 2.2.x? We received a similar report following a 2.1 to 2.2 upgrade elsewhere, but still aren't sure of the root cause: the apparent successful creation/update of the upstream should mean that the target POSTs after it succeed.
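One quick way to check is to grep the proxy container's logs for failing target requests; a sketch, assuming the Kong Helm chart defaults for namespace, deployment, and container names:

```sh
# Grep the proxy container's access log for 404s on target creation/update requests.
# The "kong" namespace and "kong-kong" deployment name are assumptions from a default Helm install.
kubectl logs -n kong deployment/kong-kong -c proxy | grep -E '"(POST|PUT) [^"]*/targets[^"]* HTTP/[0-9.]+" 404'
```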
We are seeing similar behaviour to what was reported. One of our Kong Ingress Controller pods fails with
There are no logged PUT requests in the
@Starefossen your issue is something different. The original issue can only happen in DB-backed mode, where the requests to add upstreams and targets are separate. When the issue occurs, the target request fails, because the upstream is (supposedly) not present, even though the request to add it succeeded. There's no such separation in DB-less mode, where the
Updated March 22 with the root causes and how to solve it (at least in our case).

We are having a similar issue quite often. After a redeployment, the ingress controller is unable to post the target IPs for one specific upstream (a different one each time) to Kong. This puts the ingress controller into a loop trying to resync, and sometimes the controller even stops logging and updating IPs for all services/pods, requiring us to restart the Kong pods.

Kong Ingress Controller version: 1.1

We also see the same issue in an on-premises environment: Kubernetes 1.19, Rancher RKE, Ubuntu 20.04, Docker 19, PostgreSQL 11.

What happened

Causes and solution (at least in our case)
This is the cause and solution for the '{Create} failed' / POST 404 issue. It is still unknown why the ingress controller, when kept in the resync loop (failing with POST 404) for a long time, sometimes stopped logging and resyncing everything, requiring a Kong restart.

Notes
History
Logs (using debug level and --v=3)

Steps To Reproduce

A lot of ingresses configured (13); deploy Kong 2.1, then upgrade to 2.2. We created a bash script with a for loop performing rollout restarts for 3 services, 1 operation at a time, waiting for each to complete before proceeding to the next.
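A minimal sketch of that kind of restart loop (service names, namespace, and iteration count are placeholders, not the original script):

```sh
#!/usr/bin/env bash
# Repeatedly restart a few Deployments, one at a time, waiting for each
# rollout to finish before starting the next.
set -euo pipefail

NAMESPACE=default                          # assumption: adjust to your namespace
SERVICES=(service-a service-b service-c)   # placeholders for the 3 services

for i in $(seq 1 50); do
  for svc in "${SERVICES[@]}"; do
    kubectl -n "$NAMESPACE" rollout restart "deployment/$svc"
    kubectl -n "$NAMESPACE" rollout status "deployment/$svc" --timeout=5m
  done
done
```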
Interesting. While I can see that we didn't remove those targets during the 2.2 upgrade, I'm still unsure exactly when these 404s happen. It appears that simply having zero-weight target(s) doesn't cause this on its own, and that some other condition is necessary to generate the 404s. The workaround to delete the offending targets is reasonable, as they're not bound to anything else in the database (only to runtime health status data), so, in Postgres:
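(a plausible form of the cleanup; the `kong` database and user names, and the zero-weight condition, are assumptions)

```sh
# Assumed cleanup: remove leftover zero-weight rows from Kong's "targets" table.
psql -U kong -d kong -c "DELETE FROM targets WHERE weight = 0;"
```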
will clear them. To try and artificially replicate this, I tried:
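A rough guess at that kind of replication attempt (these are not the actual commands used; the upstream name, target addresses, and admin API URL are placeholders):

```sh
# Create a throwaway upstream.
curl -s -X POST http://localhost:8001/upstreams -d "name=repro-upstream"

# Add ~200 zero-weight targets, mimicking leftovers from repeated restarts.
for i in $(seq 1 200); do
  curl -s -o /dev/null -X POST http://localhost:8001/upstreams/repro-upstream/targets \
    -d "target=10.0.0.$i:8080" -d "weight=0"
done

# Re-add one of the same IP:port pairs with a normal weight and inspect the response.
curl -i -X POST http://localhost:8001/upstreams/repro-upstream/targets \
  -d "target=10.0.0.1:8080" -d "weight=100"
```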
There is some legacy logic to update targets via POST that I thought might be the culprit, but I'm unsure when that's actually used or whether/why it would 404: my attempt to use the same IP was what I thought would exercise it, but as it didn't trigger a 404, there's still something missing if that's related.

@cpcostaneves what was the replica count of your largest affected upstream Deployment, and how many times did you restart it on 2.1, if you recall? My choice of 200 was arbitrary; this may require a larger number of zero-weight targets if their number is related.
@rainest the ingress controller will not be aware of the zero-weight targets, since the tag is also null, and it will use a different id when trying to post (this is actually the real issue). So after adding a zero-weight target, try to add it again with the same IP:port but a changed id. You can try the following (replace the upstream id with one you have, and replace the Kong admin API address):
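A reconstruction along those lines (the upstream id, target address, target ids, and admin API URL are all placeholders):

```sh
# 1. Add a zero-weight target with an explicit id.
curl -i -X POST http://localhost:8001/upstreams/<upstream-id>/targets \
  -H 'Content-Type: application/json' \
  -d '{"id":"11111111-1111-1111-1111-111111111110","target":"10.0.0.5:8080","weight":0}'

# 2. Add the same IP:port again, but with a different id (only the last digit changed);
#    on Kong 2.2 this reportedly returns a 404.
curl -i -X POST http://localhost:8001/upstreams/<upstream-id>/targets \
  -H 'Content-Type: application/json' \
  -d '{"id":"11111111-1111-1111-1111-111111111111","target":"10.0.0.5:8080","weight":100}'
```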
Note that I've changed the target id (last digit). The response I get in Kong 2.2 for the second command is a 404 with

One suggestion for the ingress controller is to handle all stored targets, not filtering by tag, since it is the owner of the upstream and should handle all targets for that upstream. That way it could delete offending or nonsensical targets before posting the new ones. This would prevent issues in corner cases, such as a user manually adding a target id.

The largest upstream I had was with 3 replicas, and we redeployed around 200 times with Kong 2.1 (good estimation :) )
Ah, okay, I think I see what I did now: the 409s are apparently a weird (and, for the controller, irrelevant) corner case that only appears if you attempt to provide a target without a port, which I'd done accidentally when trying to replicate this without the controller. I'd attempted to POST a target like:
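Roughly, the accidental request would have been of this shape (placeholder address and upstream id):

```sh
# A target POSTed without a port.
curl -i -X POST http://localhost:8001/upstreams/<upstream-id>/targets \
  -d 'target=10.0.0.5' -d 'weight=100'
```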
Kong fills in a default port (8000) if none is provided, so that effectively tries to add the address with :8000 appended.

I hadn't realized the controller insert logic included the target ID in the request body. That appears to be key to the 404 case, though I'm not exactly sure where we generate the ID after an initial review of the KIC and decK target code.

Edit: the controller does not set any target IDs, but decK generates them randomly if it cannot find an existing target. It could theoretically get away from this, but we can't use the IP+port combo alone--it'd need to search by upstream ID also.

Requests that do not provide an ID are accepted, and update the existing zero-weight target:
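For illustration, an id-less re-POST of that shape (placeholder values again):

```sh
# Re-POST the same target without an explicit id; this is reportedly accepted
# and updates the existing zero-weight target rather than returning a 404.
curl -i -X POST http://localhost:8001/upstreams/<upstream-id>/targets \
  -d 'target=10.0.0.5:8080' -d 'weight=100'
```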
If you do provide a target ID, then you get a 404 as shown in the previous comment. Intuitively, it seems like the legacy logic is searching for that IP+port combination, finding the existing zero-weight target, but then (if you provided an ID) falling back on some of the "zero-weight targets are invisible" logic. Yeesh.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I am having similar issues.
My dev team has written a Java-based microservice which returns 200 or 40x based on certain header values. When this kongplugin is used with another microservice, it does not report anything, and after some time it just gives me a 504 Bad Gateway response.

=====

sureshadapa@localhost kongplugins % kubectl version

helm.sh/chart: kong-3.5.0

I shall also try deleting the records with 0 weight.
The weight is 100 for all rows in my "targets" table, though. I am blocked and unable to proceed further.
Same problem. The ingress controller didn't update the upstream after the deployment rollout.
Summary
SUMMARY_GOES_HERE
Kong Ingress controller version
v1.1.0
Kong or Kong Enterprise version
unknown
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.9-aliyun.1", GitCommit:"4f7ea78", GitTreeState:"", BuildDate:"2020-05-08T07:29:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Environment
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
uname -a
What happened
After updating the deployment, the old pod was deleted and a new pod was created, but the targets of the upstream were missing. I found these controller logs:
Then I deleted the pods of the deployment (cli-im-net); new pods were created and the new targets appeared.
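For reference, that recovery step might look roughly like this (the namespace and label selector are assumptions; adjust them to the affected Deployment):

```sh
# Delete the Deployment's pods so replacements are created and the controller
# re-posts the corresponding targets to Kong.
kubectl delete pods -n <namespace> -l app=cli-im-net
```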
I also found some strange logs from Kong, like
Maybe it helps.
Expected behavior
I expect consistency between K8s pod operations (like create/delete) and Kong targets.
Steps To Reproduce
It happens randomly; some update operations can reproduce it.