Skip to content

Commit

Permalink
Move back to FailureThreshold failures of /gshealthz (#3160)
Browse files Browse the repository at this point in the history
This again reverts part of #3072. In looking at recent failures, in
cases where the sidecar is slow to come up due to networking issues,
/gshealthz fails, resulting often in the unnecessary restart of the
game server.

This returns to the more generous failure threshold from prior to
to just make this ~infinite, i.e. neuter kubelet liveness, since
health is really owned by the controller and sidecar.
  • Loading branch information
zmerlynn authored May 17, 2023
1 parent 2dc2525 commit 0137747
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 7 deletions.
14 changes: 9 additions & 5 deletions pkg/gameservers/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -706,15 +706,19 @@ func (c *Controller) addGameServerHealthCheck(gs *agonesv1.GameServer, pod *core
},
},
// The sidecar relies on kubelet to delay by InitialDelaySeconds after the
// container is started (after image pull, etc), and relies on the kubelet
// for PeriodSeconds heartbeats to evaluate health.
// container is started (after image pull, etc).
InitialDelaySeconds: gs.Spec.Health.InitialDelaySeconds,
PeriodSeconds: gs.Spec.Health.PeriodSeconds,

// By the time /gshealthz returns unhealthy, the sidecar has already evaluated
// FailureThreshold in a row failed health checks, so on the kubelet side, one
// failure is sufficient to know the game server is unhealthy.
FailureThreshold: 1,
// {FailureThreshold in a row} failed health checks, so in theory on the kubelet
// side, one failure is sufficient to know the game server is unhealthy. However,
// with only one failure, if the sidecar doesn't come up at all, we unnecessarily
// restart the game server. So use FailureThreshold as startup wiggle-room as well.
//
// Note that in general, FailureThreshold could also be infinite - the controller
// and sidecar are responsible for health management.
FailureThreshold: gs.Spec.Health.FailureThreshold,
}
}

Expand Down
4 changes: 2 additions & 2 deletions pkg/gameservers/controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1056,7 +1056,7 @@ func TestControllerCreateGameServerPod(t *testing.T) {
assert.Equal(t, intstr.FromInt(8080), gsContainer.LivenessProbe.HTTPGet.Port)
assert.Equal(t, fixture.Spec.Health.InitialDelaySeconds, gsContainer.LivenessProbe.InitialDelaySeconds)
assert.Equal(t, fixture.Spec.Health.PeriodSeconds, gsContainer.LivenessProbe.PeriodSeconds)
assert.Equal(t, int32(1), gsContainer.LivenessProbe.FailureThreshold)
assert.Equal(t, fixture.Spec.Health.FailureThreshold, gsContainer.LivenessProbe.FailureThreshold)
assert.Len(t, gsContainer.VolumeMounts, 1)
assert.Equal(t, "/var/run/secrets/kubernetes.io/serviceaccount", gsContainer.VolumeMounts[0].MountPath)

Expand Down Expand Up @@ -1758,7 +1758,7 @@ func TestControllerAddGameServerHealthCheck(t *testing.T) {
require.NotNil(t, probe)
assert.Equal(t, "/gshealthz", probe.HTTPGet.Path)
assert.Equal(t, intstr.IntOrString{IntVal: 8080}, probe.HTTPGet.Port)
assert.Equal(t, int32(1), probe.FailureThreshold)
assert.Equal(t, fixture.Spec.Health.FailureThreshold, probe.FailureThreshold)
assert.Equal(t, fixture.Spec.Health.InitialDelaySeconds, probe.InitialDelaySeconds)
assert.Equal(t, fixture.Spec.Health.PeriodSeconds, probe.PeriodSeconds)
}
Expand Down

0 comments on commit 0137747

Please sign in to comment.