External control plane topology #744

Merged: 1 commit, May 18, 2021
125 changes: 125 additions & 0 deletions enhancements/external-control-plane-topology.md
@@ -0,0 +1,125 @@
---
title: External Control Plane Topology
authors:
- "@csrwng"
reviewers:
- "@derekwaynecarr"
- "@ironcladlou"
- "@enxebre"
- "@sjenning"
- "@sttts"
- "@deads2k"
- "@mfojtik"
- "@s-urbaniak"
- "@spadgett"
- "@dmage"
- "@Miciah"
approvers:
- "@derekwaynecarr"
- "@smarterclayton"
creation-date: 2021-04-19
last-updated: 2021-04-19
status: implementable
see-also:
- "/enhancements/update/ibm-public-cloud-support.md"
- "/enhancements/single-node/cluster-high-availability-mode-api.md"
---

# External Control Plane Topology

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

External control plane support was introduced in OCP 4 with support
for the [IBM Cloud Managed service](https://github.com/openshift/enhancements/blob/master/enhancements/update/ibm-public-cloud-support.md) (ROKS). At the time, this was the only platform that
ran with an external control plane, so a platform type of
`IBMCloud` was sufficient to distinguish it from other OCP installations with a traditional
control plane topology.

More recently, an orthogonal field was [added to the Infrastructure resource](https://github.com/openshift/enhancements/blob/master/enhancements/single-node/cluster-high-availability-mode-api.md) to indicate
the type of control plane topology. The currently supported values for the `controlPlaneTopology` field
are `HighlyAvailable` and `SingleReplica`.

This enhancement proposes adding a third possible value to the control plane topology
field: `External`. A value of `External` in this field indicates that control plane components
such as etcd, Kube API server, Kube Controller Manager, and Kube Scheduler are running outside
the cluster and are not visible as pods inside the cluster.
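
As a rough illustration of the three values, the following Go sketch declares a local stand-in for the topology type and a helper that maps each value to whether control plane pods are expected inside the cluster. The type, constant, and function names are assumptions made for this example and are not taken from the real API package.

```go
package main

import "fmt"

// TopologyMode is a local stand-in for the type named in this enhancement.
// It is declared here only to keep the sketch self-contained.
type TopologyMode string

const (
	// The two values that exist today, per the Summary.
	HighlyAvailable TopologyMode = "HighlyAvailable"
	SingleReplica   TopologyMode = "SingleReplica"
	// The value this enhancement proposes.
	External TopologyMode = "External"
)

// controlPlaneInCluster reports whether control plane components are expected
// to be visible as pods inside the cluster for the given mode. The second
// return value is false for values this sketch does not recognize.
func controlPlaneInCluster(mode TopologyMode) (inCluster bool, known bool) {
	switch mode {
	case HighlyAvailable, SingleReplica:
		return true, true
	case External:
		return false, true
	default:
		return false, false
	}
}

func main() {
	for _, mode := range []TopologyMode{HighlyAvailable, SingleReplica, External} {
		inCluster, _ := controlPlaneInCluster(mode)
		fmt.Printf("%s: control plane pods in cluster = %t\n", mode, inCluster)
	}
}
```

Returning a separate `known` flag lets a caller decide how to treat values added later, rather than silently treating them as one of the existing modes.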

## Motivation

Whether the control plane is external should not be tied to the platform that the cluster
is running on. IBM Cloud will soon support IPI/UPI installation, so a platform of `IBMCloud`
will no longer imply an external control plane. Hypershift will also bring support for external
control planes to existing platforms such as AWS.

### Goals

- Allow expressing a new type of control plane topology in the Infrastructure resource inside an
OCP cluster.

- Provide operators/components that change their behavior based on whether the control plane is external or
Member

Is the list of components that need to be changed captured somewhere?

Contributor Author

Currently it's the components that have been modified to work with the `IBMCloud` platform type. I will add a list with the corresponding PRs.

Member

@mrunalp the list of components is useful, but really this is saying that there will never be a `role=master` host in this cluster when the field is set to `External`.

self-hosted, a clear indicator of what mode they're running in.

### Non-Goals

- Provide a design for running OCP with an external control plane.

- Describe how the `controlPlaneTopology` field will be set for an external control plane deployment.
Member

This enhancement is talking about the value it will be set to. Are you saying that we will not specify the actor which sets the value?

Contributor Author

Yes, I didn't think it was relevant to explain that Hypershift or IBM ROKS toolkit do their own bootstrapping and will set the value appropriately. But if you think it's relevant I can add it to the doc.

Member

@wking wking Apr 28, 2021

I'm fine leaving it out. But maybe rephrase this line to:

  • Describe the mechanics of populating the controlPlaneTopology field. This enhancement only introduces the new value.

or some such to make it more clear what portion is out of scope.

Member

@wking for clarification, the externalized control plane manager is responsible for projecting this value into the end-user cluster data plane.


## Proposal

- Add `External` as an additional option to the `TopologyMode` type for `Infrastructure`

### User Stories

As a platform provider, I can set the control plane topology to `External` to signal OCP components to adjust their
behavior accordingly.


### Implementation Details

Currently `External` only makes sense as a mode for the ControlPlaneTopology, not the InfrastructureTopology, but
both use the same type (`TopologyMode`).
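
Because both fields share one type, nothing in the type itself stops `External` from appearing in the infrastructure topology. The Go sketch below shows one way a consumer could guard against that combination. The struct, the error, and the `checkTopologies` helper are hypothetical names for this example only; the enhancement does not state that any such validation exists.

```go
package main

import (
	"errors"
	"fmt"
)

// TopologyMode is a local stand-in for the shared type described above.
type TopologyMode string

const (
	HighlyAvailable TopologyMode = "HighlyAvailable"
	SingleReplica   TopologyMode = "SingleReplica"
	External        TopologyMode = "External"
)

// InfrastructureStatus is a hypothetical, trimmed-down struct that holds only
// the two topology fields discussed in this section.
type InfrastructureStatus struct {
	ControlPlaneTopology   TopologyMode
	InfrastructureTopology TopologyMode
}

// errExternalInfra is returned when External is used where it has no meaning.
var errExternalInfra = errors.New("External is only meaningful for the control plane topology")

// checkTopologies rejects External as an infrastructure topology and accepts
// every other combination. This is an assumed check, not documented behavior.
func checkTopologies(s InfrastructureStatus) error {
	if s.InfrastructureTopology == External {
		return errExternalInfra
	}
	return nil
}

func main() {
	hosted := InfrastructureStatus{ControlPlaneTopology: External, InfrastructureTopology: HighlyAvailable}
	invalid := InfrastructureStatus{ControlPlaneTopology: External, InfrastructureTopology: External}
	fmt.Println("hosted:", checkTopologies(hosted))
	fmt.Println("invalid:", checkTopologies(invalid))
}
```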

The Hypershift or IBM ROKS installer is responsible for setting this value when bootstrapping a new hosted control
plane OpenShift cluster.

#### Component Impact

Existing components such as Console and Monitoring that use platform type of `IBMCloud` to modify their behavior
for external control planes would need to switch to `controlPlaneTopology` to determine whether the control plane
is external.
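
The switch described above can be sketched in Go by comparing a platform-based check with a topology-based one. The `Infra` struct and both helper functions are hypothetical and exist only for this illustration; the three sample clusters follow the cases named in the Motivation section (ROKS, IBM Cloud IPI/UPI, and Hypershift on AWS).

```go
package main

import "fmt"

// TopologyMode is a local stand-in for the topology type.
type TopologyMode string

const (
	HighlyAvailable TopologyMode = "HighlyAvailable"
	External        TopologyMode = "External"
)

// Infra is a hypothetical, minimal view of the Infrastructure status.
type Infra struct {
	PlatformType         string
	ControlPlaneTopology TopologyMode
}

// externalByPlatform is the older style of check: it infers an external
// control plane from the platform type alone.
func externalByPlatform(i Infra) bool {
	return i.PlatformType == "IBMCloud"
}

// externalByTopology is the check this enhancement asks components to use.
func externalByTopology(i Infra) bool {
	return i.ControlPlaneTopology == External
}

func main() {
	clusters := []struct {
		name  string
		infra Infra
	}{
		{"ROKS", Infra{"IBMCloud", External}},
		{"IBM Cloud IPI/UPI", Infra{"IBMCloud", HighlyAvailable}},
		{"Hypershift on AWS", Infra{"AWS", External}},
	}
	for _, c := range clusters {
		fmt.Printf("%s: by platform = %t, by topology = %t\n",
			c.name, externalByPlatform(c.infra), externalByTopology(c.infra))
	}
}
```

The two checks agree only for ROKS. The platform-based check reports an external control plane for an IBM Cloud IPI/UPI cluster that has none, and misses the Hypershift cluster on AWS.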
Member

Is the test plan for this "CI jobs with an external control plane pass e2e without blowing up"? Do we have CI coverage for things like whatever console twiddles based on this (looks like maybe just dashboard availability)? Or is the expectation that we make a best-effort attempt to transition existing IBMCloud consumers and then fix any we miss (or which slip in later) as we stumble across them?

Contributor Author

Yes, I didn't think it was relevant to explain that Hypershift or IBM ROKS toolkit do their own bootstrapping and will set the value appropriately. But if you think it's relevant I can add it to the doc.

Contributor Author

Sorry, that was an answer to the wrong question :)

Contributor Author

The test plan for IBM Cloud is that QE on their side will verify that we have not regressed in the tweaks that have been made based on the IBMCloud platform. On our side, we don't yet have automated CI for Hypershift setups, but it's something that we're working towards.


### Risks and Mitigations


## Design Details

### Test Plan

IBM ROKS will need to be regression tested by IBM Cloud QE. There is no impact to mainline OCP.

### Graduation Criteria
#### Dev Preview -> Tech Preview
#### Tech Preview -> GA
#### Removing a deprecated feature
### Upgrade / Downgrade Strategy
### Version Skew Strategy

## Implementation History

## Drawbacks

None

## Alternatives

None