Fix bug with hung GameServer resource on Kubernetes 1.10 #278
Conversation
It looks like when the `foregroundDeletion` finalizer - which gets added to the GameServer when a deletion with foreground propagation is requested - is removed by the Kubernetes system, the ResourceVersion/Generation doesn't get incremented. This means that when we do an `Update` it will go through (usually it fails if the ResourceVersion/Generation has changed since the last update), rather than failing and going back to re-sync.

This can cause a race condition where the `Update` can put the `foregroundDeletion` finalizer back, if it gets removed between retrieving the `GameServer` object and doing the `Update()` operation.

We actually don't need the `foregroundDeletion` propagation on `Delete()`, as we only want to keep the `GameServer` around until the backing `Pod` is gone, so we can just turn it off in the code instead.

As a note - I also explored using a `Patch()` operation to try and solve this problem. AFAIK, a patch (and specifically a JSONPatch - other patch types didn't seem better) can only do index-based removal from a list of items. This could just present other types of race conditions, as finalizers could be added or removed by other systems at the same time, changing the index of the finalizer we want to remove.

Not sure if this is a bug in the Kubernetes system or working as intended, but this solves the issue.
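For illustration, here's a minimal Go sketch of the fix described above: request deletion with background propagation (the default) instead of foreground, so the apiserver never attaches the `foregroundDeletion` finalizer. The `gameServerDeleter` interface and `deleteGameServer` function are hypothetical stand-ins that just mirror the `Delete` signature code generation produces for a CRD client in this client-go era.

```go
package gameservers

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// gameServerDeleter is a hypothetical stand-in for the generated GameServer
// client, matching the context-free Delete signature of this client-go era.
type gameServerDeleter interface {
	Delete(name string, options *metav1.DeleteOptions) error
}

// deleteGameServer requests deletion with background propagation, so no
// foregroundDeletion finalizer is ever added. The controller still keeps the
// GameServer around until the backing Pod is gone, which is the only
// guarantee we actually need.
func deleteGameServer(c gameServerDeleter, name string) error {
	policy := metav1.DeletePropagationBackground
	return c.Delete(name, &metav1.DeleteOptions{PropagationPolicy: &policy})
}
```

And to show why the `Patch()` route races: a JSONPatch can only remove a list entry by index, so the patch document bakes in a position that another controller can invalidate before the patch is applied. The finalizer values below are illustrative only.

```go
package gameservers

// Suppose metadata.finalizers is currently (illustrative values):
//   ["foregroundDeletion", "stable.agones.dev/example"]
// A JSONPatch has to name the index of the entry it removes:
var removeForegroundFinalizer = []byte(`[{"op": "remove", "path": "/metadata/finalizers/0"}]`)

// If another system adds, removes, or reorders a finalizer between our read
// and the Patch() call, index 0 may point at a different entry and the patch
// removes the wrong finalizer -- the same class of race as the Update() path.
```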
Build Succeeded 👏 Build Id: bef30346-09c3-4006-bf6c-d848c07ce943 The following development artifacts have been built, and will exist for the next 30 days:
Update codegen library to 1.10.5, and regen the CRD clients. This may be the cause of #278 (but that change needed to happen anyway)
I'm reading this: https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/, then I'll look at your code.
So if I understand correctly, we don't rely on k8s deleting dependent resources, because we're deleting the pods ourselves.
LGTM
Thanks for the approval. I'm also running into other people who look to have potentially run into the same bug with foreground deletion and finalizers, so I'm genuinely wondering if it's a real bug in K8s -- either way, we don't need the foreground deletion (since we handle the actual lifecycle of the backing `Pod` ourselves). Hope that makes sense.
Build Succeeded 👏 Build Id: 8876f7dd-3055-4f9f-98f4-8ec42d1afb53 The following development artifacts have been built, and will exist for the next 30 days: