---
title: powervs-ipi
authors:
  - "@jaypoulz"
reviewers:
  - "@crawford"
  - "@andymcc"
  - "@Prashanth684"
approvers:
  - TBD
creation-date: 2021-04-16
last-updated: 2021-04-16
status: implementable
---

# powervs-platform-provider

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

This document describes how [PowerVS][powervs-website] becomes an infrastructure provider for OpenShift. PowerVS is a cloud offering from IBM that lets customers run secure and flexible workloads on IBM Power (ppc64le) hardware without having to build and maintain an on-premises datacenter. The goal of this enhancement is to leverage the PowerVS APIs to deploy an OpenShift cluster on this cloud environment. Post-installation, the same APIs would be used for cluster resizing operations.

[powervs-website]: https://www.ibm.com/cloud/power-virtual-server

## Motivation

- Current deployments of OpenShift on Power use UPI, with the exception of the development-only configuration of IPI on libvirt.
- A non-x86 IPI installer would serve as a first step toward improving the ease of deployments on non-x86 hardware.
- The Multi-Arch team currently relies on a development configuration (IPI on libvirt) to cover CI workloads for OpenShift on ppc64le verification. This enhancement would help diversify that CI effort.
- Power hardware is especially targeted at AI and secure workloads, so building out cloud capabilities will help expand our workload footprint.

### Goals

- Provide a way to install OpenShift on PowerVS infrastructure using the OpenShift installer with IPI.
- Implement a cluster-api provider to support scaling and managing cluster nodes post-install.

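To make the second goal concrete, the cluster-api provider would reconcile Machine API objects into PowerVS VM instances. The sketch below shows what a worker MachineSet might look like; every field inside `providerSpec.value` (including the `PowerVSMachineProviderConfig` kind name) is a hypothetical placeholder for illustration only, not a committed schema.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: powervs-test-worker
  namespace: openshift-machine-api
spec:
  replicas: 3
  template:
    spec:
      providerSpec:
        value:
          # All fields below are hypothetical placeholders for this sketch.
          kind: PowerVSMachineProviderConfig
          serviceInstanceID: "..."
          image: rhcos-powervs   # CoreOS image imported into PowerVS
          processors: "0.5"      # fractional Power cores
          memoryGiB: 32
```

Scaling the cluster post-install would then reduce to editing `replicas`, with the provider creating or deleting PowerVS instances to match.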
### Non-Goals

- UPI support. We have already documented how to install on ppc64le hardware using UPI.

## Proposal

This provider enables the OpenShift installer to provision VM resources in the PowerVS cloud that will be used as the workers and masters of the cluster. Unlike some of the existing providers, this depends on the availability of CoreOS images in the PowerVS cloud in OVA format. Initial investigation indicates that this can be achieved similarly to how we push AMIs to the Amazon cloud.

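For orientation, an `install-config.yaml` for this provider might look roughly like the following. The `powervs` platform block and its field names (`region`, `zone`, `serviceInstanceID`) are purely hypothetical placeholders for this sketch; the actual schema will be defined during implementation.

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: powervs-test
controlPlane:
  name: master
  replicas: 3
compute:
- name: worker
  replicas: 3
platform:
  powervs:                    # hypothetical platform block
    region: us-south          # placeholder region name
    zone: us-south            # placeholder zone name
    serviceInstanceID: "..."  # placeholder PowerVS service instance
pullSecret: '...'
sshKey: '...'
```

As with other IPI platforms, the installer would validate this configuration, upload or locate the CoreOS OVA image, and drive the PowerVS APIs to create the cluster resources.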
Where things get interesting is networking. PowerVS does not currently have an equivalent of API-configurable public IPs that we can leverage to set up load balancing for ingress. We are currently evaluating our options for achieving similar behavior using static network configurations. **Pending review.**

---

TODO: complete template beyond this point.

### User Stories

Detail the things that people will be able to do if this is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.

Include a story on how this proposal will be operationalized: lifecycled, monitored and remediated at scale.

### Implementation Details/Notes/Constraints [optional]

What are the caveats to the implementation? What are some important details that
didn't come across above? Go into as much detail as necessary here. This might
be a good place to talk about core concepts and how they relate.

### Risks and Mitigations

What are the risks of this proposal and how do we mitigate them? Think broadly. For
example, consider both security and how this will impact the larger OKD
ecosystem.

How will security be reviewed and by whom? How will UX be reviewed and by whom?

Consider including folks that also work outside your immediate sub-project.

## Design Details

### Open Questions [optional]

This is where to call out areas of the design that require closure before deciding
to implement the design. For instance,

> 1. This requires exposing previously private resources which contain sensitive
>    information. Can we do this?

### Test Plan

**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:
- Will there be e2e and integration tests, in addition to unit tests?
- How will it be tested in isolation vs. with other components?
- What additional testing is necessary to support managed OpenShift service-based offerings?

No need to outline all of the test cases, just the general strategy. Anything
that would count as tricky in the implementation and anything particularly
challenging to test should be called out.

All code is expected to have adequate tests (eventually with coverage
expectations).

### Graduation Criteria

**Note:** *Section not required until targeted at a release.*

Define graduation milestones.

These may be defined in terms of API maturity, or as something else. Initial proposal
should keep this high-level with a focus on what signals will be looked at to
determine graduation.

Consider the following in developing the graduation criteria for this
enhancement:

- Maturity levels
  - [`alpha`, `beta`, `stable` in upstream Kubernetes][maturity-levels]
  - `Dev Preview`, `Tech Preview`, `GA` in OpenShift
- [Deprecation policy][deprecation-policy]

Clearly define what graduation means by either linking to the [API doc definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning),
or by redefining what graduation means.

In general, we try to use the same stages (alpha, beta, GA), regardless of how the functionality is accessed.

[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

**Examples**: These are generalized examples to consider, in addition
to the aforementioned [maturity levels][maturity-levels].

#### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)

#### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing

**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

#### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

### Upgrade / Downgrade Strategy

If applicable, how will the component be upgraded and downgraded? Make sure this
is in the test plan.

Consider the following in developing an upgrade/downgrade strategy for this
enhancement:
- What changes (in invocations, configurations, API use, etc.) is an existing
  cluster required to make on upgrade in order to keep previous behavior?
- What changes (in invocations, configurations, API use, etc.) is an existing
  cluster required to make on upgrade in order to make use of the enhancement?

Upgrade expectations:
- Each component should remain available for user requests and
  workloads during upgrades. Ensure the components leverage best practices in
  handling [voluntary disruption](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/).
  Any exception to this should be identified and discussed here.
- Micro version upgrades - users should be able to skip forward versions within a
  minor release stream without being required to pass through intermediate
  versions - i.e. `x.y.N->x.y.N+2` should work without requiring `x.y.N->x.y.N+1`
  as an intermediate step.
- Minor version upgrades - you only need to support `x.N->x.N+1` upgrade
  steps. So, for example, it is acceptable to require a user running 4.3 to
  upgrade to 4.5 with a `4.3->4.4` step followed by a `4.4->4.5` step.
- While an upgrade is in progress, new component versions should
  continue to operate correctly in concert with older component
  versions (aka "version skew"). For example, if a node is down, and
  an operator is rolling out a daemonset, the old and new daemonset
  pods must continue to work correctly even while the cluster remains
  in this partially upgraded state for some time.

Downgrade expectations:
- If an `N->N+1` upgrade fails mid-way through, or if the `N+1` cluster is
  misbehaving, it should be possible for the user to roll back to `N`. It is
  acceptable to require some documented manual steps in order to fully restore
  the downgraded cluster to its previous state. Examples of acceptable steps
  include:
  - Deleting any CVO-managed resources added by the new version. The
    CVO does not currently delete resources that no longer exist in
    the target version.

### Version Skew Strategy

How will the component handle version skew with other components?
What are the guarantees? Make sure this is in the test plan.

Consider the following in developing a version skew strategy for this
enhancement:
- During an upgrade, we will always have skew among components; how will this impact your work?
- Does this enhancement involve coordinating behavior in the control plane and
  in the kubelet? How does an n-2 kubelet without this feature available behave
  when this feature is used?
- Will any other components on the node change? For example, changes to CSI, CRI
  or CNI may require updating that component before the kubelet.

## Implementation History

Major milestones in the life cycle of a proposal should be tracked in `Implementation
History`.

## Drawbacks

The idea is to find the best form of an argument why this enhancement should _not_ be implemented.

## Alternatives

Similar to the `Drawbacks` section, the `Alternatives` section is used to
highlight and record other possible approaches to delivering the value proposed
by an enhancement.

## Infrastructure Needed [optional]

Use this section if you need things from the project. Examples include a new
subproject, repos requested, github details, and/or testing infrastructure.

Listing these here allows the community to get the process for these resources
started right away.