KEP-1972: kubelet exec probe timeouts
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
andrewsykim committed Sep 8, 2020
1 parent 1e80263 commit 95a36f6
Showing 2 changed files with 144 additions and 0 deletions.
108 changes: 108 additions & 0 deletions keps/sig-node/1972-kubelet-exec-probe-timeouts/README.md
@@ -0,0 +1,108 @@
# KEP-1972: Kubelet Exec Probe Timeouts

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [X] (R) KEP approvers have approved the KEP status as `implementable`
- [X] (R) Design details are appropriately documented
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [X] (R) Graduation criteria are in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

The kubelet currently does not respect exec probe timeouts: the `timeoutSeconds` value set on an exec probe is never enforced.
This is a bug that should be fixed, since the timeout is part of the container probe API. Because exec probe timeouts
have never been respected by the kubelet, a new feature gate, `ExecProbeTimeouts`, will be introduced
so users have an easy way to revert to the previous behavior if enforcing the probe timeout results
in unexpected behavior.
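As an illustration (the Pod name, image, and probe command below are placeholders chosen for this sketch, not taken from the KEP), consider an exec liveness probe whose command exits successfully only after about 10 seconds while `timeoutSeconds` is 1. Under the current behavior the kubelet waits for the command to finish and records a success; once the timeout is enforced, the probe would be expected to be recorded as failed after roughly one second.

```yaml
# Hypothetical Pod used only to illustrate the bug; name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: exec-probe-timeout-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      livenessProbe:
        exec:
          # Exits 0, but only after ~10s, well past timeoutSeconds.
          command: ["sh", "-c", "sleep 10"]
        # Defaults to 1 second; set explicitly here for clarity.
        timeoutSeconds: 1
        # Longer than the probe command's runtime, so attempts do not pile up.
        periodSeconds: 15
```

Since killing timed-out exec processes is listed as a non-goal, the `sleep 10` started by the probe may keep running inside the container even after the probe has been counted as failed.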

## Motivation

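The kubelet not respecting the exec probe timeout is a bug and should be fixed: a user who sets `timeoutSeconds` on an exec probe reasonably expects a probe that runs longer than that to fail.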

### Goals

* make the kubelet enforce exec probe timeouts

### Non-Goals

* ensuring that exec processes which timed out are killed by the kubelet.

## Proposal

### Risks and Mitigations

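* existing workloads on Kubernetes that relied on this bug may unexpectedly see their probes time out. Mitigation: the `ExecProbeTimeouts` feature gate can be disabled to restore the previous behavior.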

## Design Details

Changes to the kubelet:
* Ensure the kubelet handles timeout errors and registers them as failing probes.
* Add a feature gate, `ExecProbeTimeouts`, that is GA and enabled by default.
* If the `ExecProbeTimeouts` feature gate is disabled and an exec probe times out, log a warning so users know their exec probes are timing out.
* Re-enable the existing exec liveness probe e2e test.
* Add a new exec readiness probe e2e test.
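Assuming the kubelet is configured through a config file, an operator who needs the old behavior could disable the gate with a fragment like the one below; the equivalent kubelet command-line flag is `--feature-gates=ExecProbeTimeouts=false`. This is a minimal sketch showing only the relevant field, not a complete kubelet configuration.

```yaml
# Kubelet config file fragment: opt out of exec probe timeout enforcement.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Gate is GA and on by default; false restores the previous behavior.
  ExecProbeTimeouts: false
```

With the gate off, exec probes that exceed their timeout would not be counted as failures; per the design above, the kubelet only logs a warning.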

### Test Plan

E2E tests:
* re-enable [existing exec liveness probe e2e test](https://github.com/kubernetes/kubernetes/blob/ea1458550077bdf3b26ac34551a3591d280fe1f5/test/e2e/common/container_probe.go#L210-L227) that is currently being skipped
* add a new exec readiness probe e2e test.
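The scenario a new exec readiness probe e2e test could exercise might look like the Pod below. This is a hypothetical sketch: the real e2e tests are written in Go under `test/e2e/common`, and the name, image, and timings here are placeholders.

```yaml
# Hypothetical Pod sketching an exec readiness probe that always exceeds its timeout.
apiVersion: v1
kind: Pod
metadata:
  name: exec-readiness-timeout-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      readinessProbe:
        exec:
          # Exits 0, but only after ~10s, well past timeoutSeconds.
          command: ["sh", "-c", "sleep 10"]
        timeoutSeconds: 1
        periodSeconds: 15
```

With `ExecProbeTimeouts` enabled, the Pod's `Ready` condition would be expected to stay `False`, because every probe attempt is cut off after one second; without the fix it would be expected to turn `True` once the first `sleep 10` completes.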

### Graduation Criteria

This is a bug fix, so the feature gate will be GA and enabled by default from the start.

### Upgrade / Downgrade Strategy

N/A

### Version Skew Strategy

N/A

## Implementation History

* 2020-09-08 - the KEP was merged as implementable for v1.20

## Drawbacks

* Existing workloads may depend on exec probe timeouts never having been respected. Enforcing
the timeout now may cause unexpected probe failures (and, for liveness probes, container restarts) for some workloads.

## Alternatives

Some alternatives that were considered:
1. Increasing the default timeout for exec probes
2. Continuing to ignore the exec probe timeout

36 changes: 36 additions & 0 deletions keps/sig-node/1972-kubelet-exec-probe-timeouts/kep.yaml
@@ -0,0 +1,36 @@
title: Kubelet Exec Probe Timeouts
kep-number: 1972
authors:
- "@andrewsykim"
- "@SergeyKanzhelev"
owning-sig: sig-node
participating-sigs:
status: implementable
creation-date: 2020-09-08
reviewers:
- "@dchen1107"
- "@derekwaynecarr"
approvers:
- "@dchen1107"
- "@derekwaynecarr"

# The target maturity stage in the current dev cycle for this KEP.
stage: stable

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.20"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  stable: "v1.20"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: ExecProbeTimeouts
    components:
      - kubelet
disable-supported: true
