Commit

Changes for matching updated template
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
slintes committed Jul 8, 2021
1 parent 391baa1 commit 992de2d
Showing 1 changed file with 12 additions and 5 deletions.
17 changes: 12 additions & 5 deletions enhancements/machine-api/machine-health-checking.md
@@ -96,6 +96,12 @@ For a node notFound or a failed machine, the machine is considerable unrecoverab
- The machine controller provider implementation deletes the cloud instance.
- The machine controller deletes the machine resource.

### User Stories

- I want my worker machines to be remediated when the backing node has a `ready=false` or `ready=Unknown` condition for more than 10m.
- I want remediation to temporarily short-circuit if 40% or more of the targets of this pool are unhealthy at the same time.
- I want no remediation to happen while my cluster is upgrading its machines / nodes.
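
The short-circuit story above can be sketched in Go. This is a minimal illustration, not the machine health check controller's code: the function names `maxUnhealthyCount` and `remediationAllowed` are hypothetical, the threshold is assumed to arrive as a string such as `"40%"` or `"3"`, and rounding a percentage down to a whole number of machines is an assumption. The "at or above the threshold stops remediation" rule follows the proposal text.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxUnhealthyCount resolves a threshold given either as an absolute
// count ("3") or as a percentage of the pool ("40%") into a number of
// machines. Percentages are rounded down (an assumption of this sketch).
func maxUnhealthyCount(threshold string, total int) (int, error) {
	if strings.HasSuffix(threshold, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(threshold, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", threshold, err)
		}
		if pct < 0 || pct > 100 {
			return 0, fmt.Errorf("percentage %d out of range 0-100", pct)
		}
		return pct * total / 100, nil
	}
	n, err := strconv.Atoi(threshold)
	if err != nil {
		return 0, fmt.Errorf("invalid count %q: %w", threshold, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("count %d must not be negative", n)
	}
	return n, nil
}

// remediationAllowed reports whether remediation may proceed. It returns
// false when the number of unhealthy targets is at or above the threshold.
func remediationAllowed(threshold string, total, unhealthy int) (bool, error) {
	limit, err := maxUnhealthyCount(threshold, total)
	if err != nil {
		return false, err
	}
	return unhealthy < limit, nil
}

func main() {
	// A pool of 10 targets with a 40% threshold resolves to a limit of 4.
	for _, unhealthy := range []int{3, 4} {
		ok, err := remediationAllowed("40%", 10, unhealthy)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("unhealthy=%d remediationAllowed=%t\n", unhealthy, ok)
	}
}
```

With 10 targets and a 40% threshold, 3 unhealthy targets still allow remediation, while 4 short-circuit it.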

### Implementation Details

#### MachineHealthCheck CRD
@@ -104,11 +110,6 @@ For a node notFound or a failed machine, the machine is considerable unrecoverab
- Enable setting a threshold of unhealthy nodes. If the current number is at or above this threshold, no further remediation will take place. This can be expressed as an integer or as a percentage of the total targets in the pool.
- Enable pausing of remediation

E.g:
- I want my worker machines to be remediated when the backing node has a `ready=false` or `ready=Unknown` condition for more than 10m.
- I want remediation to temporarily short-circuit if 40% or more of the targets of this pool are unhealthy at the same time.
- I want no remediation to happen while my cluster is upgrading its machines / nodes.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
@@ -190,6 +191,12 @@ This feature will be tested for public clouds in the e2e machine API suite as th
### Graduation Criteria
An implementation of this feature is currently gated behind the `TechPreviewNoUpgrade` flag. This proposal removes the gating flag and promotes machine health check to GA status with a beta API.

#### Dev Preview -> Tech Preview

#### Tech Preview -> GA

#### Removing a deprecated feature

### Upgrade / Downgrade Strategy

The machine health check controller lives in the machine-api-operator image, so upgrades will be driven by the CVO, which will fetch the right image version as usual. See:
