From 6deedef05fcf6b7de6cc5aee1c1fa103b84ae5a1 Mon Sep 17 00:00:00 2001 From: Elana Hashman Date: Mon, 20 Sep 2021 17:02:14 -0700 Subject: [PATCH] Add initial OpenShift swap enhancement --- enhancements/kubelet/node-swap.md | 314 ++++++++++++++++++++++++++++++ 1 file changed, 314 insertions(+) create mode 100644 enhancements/kubelet/node-swap.md diff --git a/enhancements/kubelet/node-swap.md b/enhancements/kubelet/node-swap.md new file mode 100644 index 00000000000..e1add6ef9ae --- /dev/null +++ b/enhancements/kubelet/node-swap.md @@ -0,0 +1,314 @@ +--- +title: node-swap +authors: + - "@ehashman" +reviewers: + - "@rphilips" + - "@sjenning" + - "???" +approvers: + - "@mrunalp" +creation-date: "2021-06-23" +status: implementable +--- + +# OpenShift Node Swap Support + +## Release Signoff Checklist + +- [X] Enhancement is `implementable` +- [X] Design details are appropriately documented from clear requirements +- [X] Test plan is defined +- [X] Operational readiness criteria is defined +- [ ] Graduation criteria for dev preview, tech preview, GA +- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) + +## Summary + +The upstream Kubernetes 1.22 release introduced alpha support for configuring +swap memory usage for Kubernetes workloads on a per-node basis. + +Now that swap use on nodes is supported in upstream, there are a number of use +cases that would benefit from OpenShift nodes supporting swap, including +improved node stability, better support for applications with high memory +overhead but smaller working sets, the use of memory-constrained devices, and +memory flexibility. + +## Motivation + +See [KEP-2400: Motivation]. + +[KEP-2400: Motivation]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#motivation + +### Goals + +- Swap can be provisioned and configured for nodes to use in an OpenShift + cluster. + +### Non-Goals + +- Workload-specific swap accounting. +- Any of the non-goals in [KEP-2400: Non-goals]. + +[KEP-2400: Non-goals]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#non-goals + +## Proposal + +### User Stories + +See [KEP-2400: User Stories](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#user-stories). + +### API Extensions + +No new API extensions are required. For tech preview, the only update required +is the addition of a `NodeSwapNoUpgrade` feature set to the OpenShift API. See +[Design Details](#design-details) for the specifics. + +### Implementation Details/Notes/Constraints [optional] + +See [KEP-2400: Notes/Constraints/Caveats](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#notesconstraintscaveats-optional). + +### Risks and Mitigations + +See [KEP-2400: Risks and Mitigations]. + +[KEP-2400: Risks and Mitigations]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#risks-and-mitigations + +## Design Details + +### Enable `NodeSwap` feature gate + +We will add a new FeatureSet `NodeSwapNoUpgrade` with feature gate `NodeSwap` +enabled for [Tech Preview]. + +Note that we will not add this to the existing `TechPreviewNoUpgrade` feature +set list, as we do not wish to enable this feature simultaneously with other +tech preview features. Note that swap cannot be enabled or used without +additional configuration, so there isn't necessarily a risk of adding it to the +existing Tech Preview feature set; rather, for testing swap, we do not want to +require users to also turn on other features. + +Without this change, it is possible to enable the feature gate on any 4.9+ +cluster by defining `NodeSwap` in a `customNoUpgrade` configuration: + +```yaml +apiVersion: config.openshift.io/v1 +kind: FeatureGate +metadata: + name: cluster +spec: + featureSet: CustomNoUpgrade + customNoUpgrade: + enabled: + - NodeSwap +``` + +Once the `NodeSwap` feature flag is enabled, a cluster admin can enable swap +usage on the cluster as follows: + +### Ensure component versions support swap + +While swap support has been available in Kubernetes since the 1.22.0 release, +the container runtimes also need support in order for Kubernetes workloads to +be able to use the feature correctly. + +Therefore, a user of OpenShift version >4.9.0 can enable the feature flag, but +they also will require a version of CRI-O that supports the +`MemorySwapLimitInBytes` for best results. This is only supported in the 1.23 +and onwards releases of CRI-O, and should be supported in the first release of +OpenShift 4.10. + +### Configure worker Kubelets + +Swap behaviour on a node can be configured with +[`memorySwap.SwapBehavior`](https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory). + +The most straightforward way to configure the kubelets with this feature is +with a custom `KubeletConfig` that will automatically be applied by the MCO: + +```bash +# Enable the custom kubelet config on the worker pool +oc label machineconfigpool worker custom-kubelet=enabled +``` + +```yaml +apiVersion: machineconfiguration.openshift.io/v1 +kind: KubeletConfig +metadata: + name: custom-config +spec: + machineConfigPoolSelector: + matchLabels: + custom-kubelet: enabled + kubeletConfig: + failSwapOn: false + memorySwap: + swapBehavior: UnlimitedSwap # LimitedSwap is also supported +``` + +Note that enabling swap on control plane nodes is possible but **not** recommended. + +### Add swap to nodes + +There are a few different ways this can be accomplished. The most +straightforward is to add a kubelet configuration that enables a swapfile at +startup with a custom machine config, the same way we enable swap in upstream +CI: + +```yaml +apiVersion: machineconfiguration.openshift.io/v1 +kind: MachineConfig +metadata: + labels: + machineconfiguration.openshift.io/role: worker + name: 90-worker-swap +spec: + config: + ignition: + version: 3.2.0 + systemd: + units: + - contents: | + [Unit] + Description=Enable swap on CoreOS + Before=crio-install.service + ConditionFirstBoot=no + + [Service] + Type=oneshot + ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/swapfile count=1024 bs=1MiB && sudo chmod 600 /var/swapfile && sudo mkswap /var/swapfile && sudo swapon /var/swapfile && free -h" + [Install] + + WantedBy=multi-user.target + enabled: true + name: swap-enable.service +``` + +It is also possible to use [ignition +configs](https://coreos.github.io/ignition/configuration-v3_3/) to add swap +partitions to worker nodes with `filesystems.format = swap`. + +Beyond tech preview, we may want to look into adding swap support to the +installer, and consider adding a partition on the root node, or perhaps +[provision and mount an NVMe +volume](https://github.com/openshift/machine-config-operator/issues/1619). + +### Open Questions [optional] + +- Will we eventually want to enable swap on all OpenShift nodes by default? +- Should swap just be limited to worker nodes, or should we consider adding it + to control plane nodes too? + +Testing in reliability clusters and in various configurations will help us +answer these questions. Stability metrics determined through the upstream beta +will assist. + +### Test Plan + +In addition to the upstream e2e tests, we will need to add e2e suites to +OpenShift in order to exercise provisioning and use of swap. This may include +unit tests where appropriate, such as the MCO. + +### Graduation Criteria + +#### Dev Preview -> Tech Preview + +Requires alpha support in upstream Kubernetes (1.22+) and support in CRI-O +(1.23+). + +- Support provisioning OpenShift nodes with swap enabled for all available + upstream swap configurations (currently `LimitedSwap`, `UnlimitedSwap`). + +JIRA: https://issues.redhat.com/browse/OCPNODE-470 + +_Graduation criteria below are tentative._ + +#### Tech Preview -> GA + +Requires beta/GA support in upstream Kubernetes. (1.25?+) + +- More testing (upgrade, downgrade, scale) +- Sufficient time for feedback +- Available by default +- Backhaul SLI telemetry +- Document SLOs for the component +- Conduct load testing + +#### Removing a deprecated feature + +- Announce deprecation and support policy of the existing feature +- Deprecate the feature + +Deprecation of this feature would likely follow upstream deprecation, which is +not currently planned. + +### Upgrade / Downgrade Strategy + +The `NodeSwap` feature flag is not supported in Kubernetes versions prior to +1.22/OpenShift 4.9. We will add the upstream `NodeSwap` feature flag as a +"NoUpgrade" feature flag to prevent upgrades. + +Note that swap support does not require coordination between components and the +configuration is limited to individual nodes/machine pools. + +See also [KEP-2400: Upgrade/Downgrade Strategy]. + +[Tech Preview]: https://github.com/openshift/enhancements/blob/ce4d303db807622687159eb9d3248285a003fabb/guidelines/techpreview.md#official-processmechanism-for-delivering-a-tp-feature +[KEP-2400: Upgrade/Downgrade Strategy]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#upgrade--downgrade-strategy + +### Version Skew Strategy + +N/A, this is a compatible API change limited to the Kubelet that does not +require coordination with the API Server. + +### Operational Aspects of API Extensions + +None. The only API extension is a feature gate change. There are no plans to +enable the feature by default until extensive testing takes place and upstream +announces beta or GA support for the feature. + +For potential operational impacts of enabling swap, see [KEP-2400: Risks and +Mitigations]. + +#### Failure Modes + +See [KEP-2400: Rollout, Upgrade and Rollback Planning] and [KEP-2400: +Troubleshooting]. + +[KEP-2400: Rollout, Upgrade and Rollback Planning]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#rollout-upgrade-and-rollback-planning +[KEP-2400: Troubleshooting]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#troubleshooting + +#### Support Procedures + +See [KEP-2400: Production Readiness Review Questionnaire] and [KEP-2400: +Troubleshooting]. + +[KEP-2400: Production Readiness Review Questionnaire]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#production-readiness-review-questionnaire + +## Implementation History + +- [Upstream alpha swap support] completed in Kubernetes 1.22. +- [CRI-O support] added in 1.23.0. + +[Upstream alpha swap support]: https://github.com/kubernetes/enhancements/issues/2400#issuecomment-884327938 +[CRI-O support]: https://github.com/cri-o/cri-o/pull/5207 + +## Drawbacks + +See [KEP-2400: Drawbacks]. + +[KEP-2400: Drawbacks]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#drawbacks + +## Alternatives + +See [KEP-2400: Alternatives]. + +[KEP-2400: Alternatives]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#alternatives + +## Infrastructure Needed [optional] + +- We will need to configure periodic e2e tests on VMs with swap enabled. +- We will need to enable swap on a [reliability cluster] to gauge long-term + stability. + +[reliability cluster]: https://issues.redhat.com/browse/OCPNODE-619