Add more examples
sttts committed Oct 14, 2021
1 parent eda91b3 commit 8ab6b6d
Showing 1 changed file with 37 additions and 4 deletions.
41 changes: 37 additions & 4 deletions guidelines/enhancement_template.md
@@ -280,10 +280,10 @@ enhancement:

### Impact of API Extensions

Describe the API extensions here in details, especially their impact on a cluster:
Describe the API extensions here in detail, especially their impact on a cluster:

- what are the SLIs (Service Level Indicators) an operator can use to determine the health of
the API extensions
- what are the SLIs (Service Level Indicators) an administrator or support can use to
determine the health of the API extensions

Examples: metrics, alerts, operator conditions
- which impact do these API extensions have on existing SLIs (e.g. scalability, API throughput,
@@ -310,12 +310,45 @@ Describe the API extensions here in details, especially their impact on a cluste
describe how to
- detect the failure modes in a support situation, describe possible symptoms (events, metrics,
alerts, which log output in which component)
- disable the API extension

Examples:
- if the webhook is not running, kube-apiserver logs will show errors like "failed to call admission webhook xyz".
- operator X will degrade with message "Failed to launch webhook server" and reason "WebhookServerFailed"
- the metric `webhook_admission_duration_seconds("openpolicyagent-admission", "mutating", "put", "false")`
will show >1s latency and alert `WebhookAdmissionLatencyHigh` will fire.

- disable the API extension (e.g. remove MutatingWebhookConfiguration `xyz`, remove APIService `foo`)

- which consequences does it have on the cluster health?

Examples:
- garbage collection in kube-controller-manager will stop working.
- quota will be wrongly computed.

- which consequences does it have on existing, running workloads?

Examples:
- new namespaces won't get the finalizer "xyz" and hence might leak resource X
when deleted
- SDN pod-to-pod routing will stop updating, potentially breaking pod-to-pod
communication after some minutes.

- which consequences does it have for newly created workloads?

Examples:
- new pods in a namespace with Istio support will not get sidecars injected, breaking
their networking

- does functionality fail gracefully, and will work resume when re-enabled without risking
consistency?

Examples:
- the mutating admission webhook "xyz" has failurePolicy=Ignore and hence
will not block the creation or update of objects when it fails. When the
webhook comes back online, a controller reconciles all objects, applying
labels that were not applied during admission while the webhook was down.
- namespace deletion will not delete all objects in etcd, leading to zombie
objects when an equally named namespace is created.
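
To illustrate the graceful-failure example above, here is a minimal Python sketch of a mutating webhook whose failure policy is `Ignore`, plus a controller that later reconciles objects admitted while the webhook was down. It is only a simulation under stated assumptions: `admit`, `mutate`, `reconcile`, `WebhookUnavailable` and the label `xyz=enabled` are hypothetical names chosen for illustration, and nothing here calls a real Kubernetes API.

```python
from typing import Callable, Dict, List

# Hypothetical label that the mutating webhook "xyz" would add on admission.
REQUIRED_LABELS = {"xyz": "enabled"}


class WebhookUnavailable(Exception):
    """Raised when the simulated webhook server cannot be reached."""


def mutate(obj: Dict) -> Dict:
    """What the healthy webhook does: return a copy with the required labels set."""
    labels = {**obj.get("labels", {}), **REQUIRED_LABELS}
    return {**obj, "labels": labels}


def admit(obj: Dict, webhook: Callable[[Dict], Dict], failure_policy: str = "Ignore") -> Dict:
    """Simulate admission: call the webhook and honor the failure policy on errors."""
    try:
        return webhook(obj)
    except WebhookUnavailable:
        if failure_policy == "Ignore":
            # Fail open: the object is admitted without mutation.
            return obj
        # Any other policy (e.g. "Fail") rejects the request.
        raise


def reconcile(objects: List[Dict]) -> List[Dict]:
    """Controller pass: apply labels that were skipped while the webhook was down."""
    return [mutate(o) for o in objects]


if __name__ == "__main__":
    def broken_webhook(obj: Dict) -> Dict:
        raise WebhookUnavailable("failed to call admission webhook xyz")

    created = admit({"labels": {"app": "demo"}}, broken_webhook)
    print("admitted during outage:", created)
    print("after reconciliation:  ", reconcile([created])[0])
```

The point of the sketch is the pairing: failing open keeps object creation and updates unblocked, and an idempotent reconcile pass restores consistency afterwards. With a fail-closed policy, the same outage would instead surface as rejected requests.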

## Implementation History
