Race condition(?) when running clustered in Kubernetes #2125
@DonMartin76 Can you check what […]? Also, please try explicitly setting […].
I'm experiencing a similar issue: launching multiple replicas of Kong sometimes works and sometimes doesn't. It seems to be a race condition, possibly related to the migrations being run by every pod. The configuration for cluster listening is […]. The identifiable pattern I see is that when it works and clustering finds everyone, it's because one of the pods ran the migrations completely and the rest just went through the usual startup. Any thoughts?
@edwardjrp Migrations are not supposed to be run concurrently from multiple Kong instances. They should be run from a single node, e.g. via the […].
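For Kubernetes users, one way to keep migrations on a single node is to run them as a one-off Job before starting the Kong pods. A minimal sketch, assuming a Postgres backend; the image tag, Job name, and database Service name are placeholders, not something from this thread:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: kong-migrations
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: kong-migrations
          image: kong:0.11            # assumed tag; match the version you deploy
          command: ["kong", "migrations", "up"]
          env:
            - name: KONG_DATABASE
              value: "postgres"
            - name: KONG_PG_HOST
              value: "postgres"       # assumed database Service name
```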
Thank you @thibaultcha, I figured something was off due to migrations. Any pattern for this worth sharing? Thanks again.
I hit something similar while trying an upgrade from 0.9.9 to 0.10 RC4. In my case I only have one replica, but the way a deploy works in k8s means there can be two for a short period. For some reason the new instance did not respond to any admin or proxy requests until I added […]. I suppose how this is handled mainly depends on whether the DB migrations are always backwards compatible for at least one release, i.e. whether they can be rolled out while some connected Kong servers are still up. If they are not, the only way to handle this seems to be an entirely new deploy + DB with a data export. If they are, is there a way to prevent […]?
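One possible mitigation for the brief overlap during a deploy (a suggestion here, not something from the thread) is the Recreate strategy, which terminates the old pod before the new one starts, at the cost of a short outage. A minimal sketch with assumed names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kong
spec:
  replicas: 1
  strategy:
    type: Recreate                    # terminate the old pod before starting the new one
  selector:
    matchLabels:
      app: kong
  template:
    metadata:
      labels:
        app: kong
    spec:
      containers:
        - name: kong
          image: kong:0.10            # assumed tag
```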
Actually, I've just found not having […].
I managed a workaround for the migrations race condition on Kubernetes: run the migrations from a k8s Job and force the Kong pods to wait until that Job finishes, via init-container logic in the Kong pod definition. Hope this approach is useful for someone else out there.
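Roughly, the pattern looks like this (a sketch of the approach described above with assumed names: the Job would be something like the one sketched earlier, the init container's service account needs permission to read Jobs, and the poster's actual implementation may differ):

```yaml
# Snippet from the Kong Deployment's pod spec: an init container that blocks
# until a "kong-migrations" Job has completed.
initContainers:
  - name: wait-for-migrations
    image: bitnami/kubectl:latest     # any image that ships kubectl
    command:
      - kubectl
      - wait
      - --for=condition=complete
      - --timeout=600s
      - job/kong-migrations
```

The only coordination needed is that the Job and the Deployment agree on the Job's name.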
@edwardjrp Would you mind sharing your configuration? My problem may also be related to migrations.
@edwardjrp Regarding the Kong clustering issue, please try setting […].
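The exact setting is cut off above; assuming it is the cluster advertise address, here is a sketch of setting it from the pod IP via the downward API (the variable name and the 7946 Serf port are assumptions for Kong 0.10.x, not confirmed by this thread):

```yaml
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: KONG_CLUSTER_ADVERTISE      # assumed mapping to cluster_advertise
    value: "$(POD_IP):7946"           # 7946: default Serf cluster port in 0.10.x
```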
@edwardjrp if you wish to share your method to avoid this race in k8s, I'd love to work together to get it added into the Helm chart I've submitted a PR for.
@c-knowles The minute I get a chance I'll submit my solution, though it's not the cleanest since it uses the internal k8s API to check job status and forces the use of init containers to wait until the migration runs.
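A rough sketch of what such an init container can look like (hypothetical names; it assumes the pod's service account may read Jobs, and a real implementation should parse the JSON properly instead of grepping it):

```yaml
initContainers:
  - name: wait-for-migrations
    image: curlimages/curl:8.7.1
    command:
      - sh
      - -c
      - |
        TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
        NS=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)
        # Poll the Jobs API until the kong-migrations Job reports a succeeded pod.
        until curl -sSk -H "Authorization: Bearer ${TOKEN}" \
            "https://kubernetes.default.svc/apis/batch/v1/namespaces/${NS}/jobs/kong-migrations" \
            | grep -q '"succeeded"'; do
          echo "waiting for kong-migrations to finish..."
          sleep 5
        done
```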
@edwardjrp Great, no rush at all. We can work to tidy it up in the incubator perhaps.
@shashiranjan84 @thibaultcha We are also facing a split-brain issue with Kong inside Docker Swarm. We are running […]. The nodes table has the following entries: […]
The cluster API returns the following output: […]
There are no errors in serf.log.
@endeepak I encourage you to move to Kong 0.11.0. It has pretty cool new things, bug fixes, and nice enhancements, including better clustering without using Serf.
I would strongly advise following @edwardjrp's advice as well... The clustering support in 0.10 was indeed subject to race conditions. Our recommended approach to a better clustering experience is an upgrade to Kong 0.11. Kong 0.11 also strongly enforces a manual (or at least, isolated) migration process, which is much safer than Kong 0.10's "automatic" migration behavior with […]. I will be closing this now; sorry for the delay on our side!
Summary
At times, when running several (3+) Kong instances at once in a clustered environment inside Kubernetes, and I (for some reason or another, such as draining nodes) kill all associated containers/pods, I sometimes end up in a "split brain" kind of cluster state: one Kong instance thinks it's alone in a cluster, while the other two think they make up a cluster of two (when checking inside the containers using the kong CLI).
It's not always reproducible, but once in a while this happens, and the effect is obviously that some changes made on the admin API do not propagate to the other "side", and that calls to the configured APIs are met with a 503 ("No API defined for...") instead of the real response.
The workaround for this is to not let Kubernetes start more than one Kong instance at once; if I scale up "slowly" (one instance every 5 seconds or so), this problem does not occur.
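A possible way to encode this "one instance at a time" workaround declaratively (a sketch, not something from the thread) is to run Kong as a StatefulSet, whose default OrderedReady policy starts each pod only after the previous one is Ready:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kong
spec:
  serviceName: kong                   # assumed headless Service name
  replicas: 3
  podManagementPolicy: OrderedReady   # the default: pods are started one at a time
  selector:
    matchLabels:
      app: kong
  template:
    metadata:
      labels:
        app: kong
    spec:
      containers:
        - name: kong
          image: kong:0.10            # assumed tag
          readinessProbe:
            httpGet:
              path: /status           # Kong admin API status endpoint
              port: 8001
```

This trades slower scale-up for never having two instances racing to be the "first" node at startup.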
If I don't "restart" the entire Kong cluster at once, this does not occur; it seems to be some kind of race condition when two Kong instances both claim to be the "first" instance in the cluster. When this does not happen, everything works perfectly.
Is there something I might be doing wrong, something I can check, or something I can do to make this not happen?
Additional Details & Logs