Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External control plane topology #744

Merged
merged 1 commit into from
May 18, 2021

Conversation

csrwng
Copy link
Contributor

@csrwng csrwng commented Apr 19, 2021

Add support for indicating whether a control plane is external through the ControlPlaneTopology field

@csrwng
Copy link
Contributor Author

csrwng commented Apr 19, 2021

/cc @openshift/team-hypershift-maintainers @derekwaynecarr

@dhellmann
Copy link
Contributor

I think this is a 4.9 release priority, so I'm adding the label to make it show up that way in the weekly newsletter.

/priority important-soon

@openshift-ci-robot openshift-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 23, 2021

## Proposal

- Add `External` as an additional option to the `TopologyMode` type for `Infrastructure`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should replace the annotation, right?

  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eranco74 this is not going to take the place of cluster profiles. The cluster profile is for the CVO to decide which manifests to include/exclude or to set appropriate node selectors for operators. The controlPlaneTopology value is for components to adjust their behavior once they're running.

@csrwng
Copy link
Contributor Author

csrwng commented Apr 26, 2021

/cc @mfojtik @s-urbaniak @spadgett @dmage @Miciah
Please let me know if there's additional folks that should review this.

@csrwng
Copy link
Contributor Author

csrwng commented Apr 26, 2021

/retest

- Allow expressing a new type of control plane topology in the Infrastructure resource inside an
OCP cluster.

- Provide operators/components that change their behavior based on whether the control plane is external or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the list of components that need to be changed captured somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it's components that have been modified to work with the IBMCloud platform type, will add a list with corresponding PRs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrunalp the list of components are useful, but really this is more saying 'there will never be a role=master' host in this cluster where set to External


- Provide a design for running OCP with an external control plane.

- Describe how the `controlPlaneTopology` field will be set for an external control plane deployment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enhancement is talking about the value it will be set to. Are you saying that we will not specify the actor which sets the value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I didn't think it was relevant to explain that Hypershift or IBM ROKS toolkit do their own bootstrapping and will set the value appropriately. But if you think it's relevant I can add it to the doc.

Copy link
Member

@wking wking Apr 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine leaving it out. But maybe rephrase this line to:

  • Describe the mechanics of populating the controlPlaneTopology field. This enhancement only introduces the new value.

or some such to make it more clear what portion is out of scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking for clarification, the externalized control plane manager is responsible for projecting this value into the end-user cluster data plane.

@dhellmann
Copy link
Contributor

There's a bug in the lint script that means the fact that this enhancement is missing a bunch of sections isn't caught by the automation. That bug is being fixed in #752.

The output of the script when I run it locally is:

enhancements/external-control-plane-topology.md missing "### User Stories"
enhancements/external-control-plane-topology.md missing "### Risks and Mitigations"
enhancements/external-control-plane-topology.md missing "## Design Details"
enhancements/external-control-plane-topology.md missing "### Test Plan"
enhancements/external-control-plane-topology.md missing "### Graduation Criteria"
enhancements/external-control-plane-topology.md missing "#### Dev Preview -> Tech Preview"
enhancements/external-control-plane-topology.md missing "#### Tech Preview -> GA"
enhancements/external-control-plane-topology.md missing "#### Removing a deprecated feature"
enhancements/external-control-plane-topology.md missing "### Upgrade / Downgrade Strategy"
enhancements/external-control-plane-topology.md missing "### Version Skew Strategy"
enhancements/external-control-plane-topology.md missing "## Implementation History"
enhancements/external-control-plane-topology.md missing "## Drawbacks"
enhancements/external-control-plane-topology.md missing "## Alternatives"


Existing components such as Console and Monitoring that use platform type of `IBMCloud` to modify their behavior
for external control planes would need to switch to `controlPlaneTopology` to determine whether the control plane
is external.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the test plan for this "CI jobs with an external control plane pass e2e without blowing up"? Do we have CI coverage for things like whatever console twiddles based on this (looks like maybe just dashboard availability)? Or is the expectation that we make a best-effort attempt to transition existing IBMCloud consumers and then fix any we miss (or which slip in later) as we stumble across them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I didn't think it was relevant to explain that Hypershift or IBM ROKS toolkit do their own bootstrapping and will set the value appropriately. But if you think it's relevant I can add it to the doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry answer to the wrong question :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test plan for IBM Cloud is that QE on their side will verify that we have not regressed in the tweaks that have been made based on the IBMCloud platform. On our side, we don't yet have automated CI for Hypershift setups, but it's something that we're working towards.

@wking
Copy link
Member

wking commented Apr 26, 2021

Now that #752 has landed, checking vs. this PR:

/test markdownlint

@wking
Copy link
Member

wking commented Apr 26, 2021

CI cluster flake: the build src failed after 49s with reason OutOfMemoryKilled.

/retest

@wking
Copy link
Member

wking commented Apr 26, 2021

openshift/release#18081 should remove the memory limit we were overshooting.

/retest

@csrwng
Copy link
Contributor Author

csrwng commented Apr 26, 2021

@dhellmann the lint job passed, should I have seen an error about the missing sections?

@dhellmann
Copy link
Contributor

@dhellmann the lint job passed, should I have seen an error about the missing sections?

The lint job uses a script in the repo, so you'll need to rebase to see the errors.

wking added a commit to wking/openshift-enhancements that referenced this pull request Apr 26, 2021
CI apparently checks out the base branch and merges the pull request
branch before running tests [1]:

  INFO[2021-04-26T18:28:37Z] Resolved source https://github.com/openshift/enhancements to master@42b44d6e, merging: openshift#744 d8c0ef8 ...

So instead of using the local 'master' branch (which includes that new
merge), use origin/master to calculate changed files.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_enhancements/744/pull-ci-openshift-enhancements-master-markdownlint/1386749042607788032#1:build-log.txt%3A3
@wking
Copy link
Member

wking commented Apr 26, 2021

#755 landed.

/test markdownlint

@wking
Copy link
Member

wking commented Apr 27, 2021

#756 has landed.

/test markdownlint

@wking
Copy link
Member

wking commented Apr 27, 2021

Prow didn't notice.

/test markdownlint

@wking
Copy link
Member

wking commented Apr 27, 2021

Hooray (I guess ;):

INFO[2021-04-27T16:40:00Z] enhancements/external-control-plane-topology.md missing "### User Stories" 
INFO[2021-04-27T16:40:00Z] enhancements/external-control-plane-topology.md missing "### Risks and Mitigations" 
INFO[2021-04-27T16:40:00Z] enhancements/external-control-plane-topology.md missing "## Design Details" 
...

@csrwng
Copy link
Contributor Author

csrwng commented Apr 27, 2021

Will update later today :)

Copy link
Member

@derekwaynecarr derekwaynecarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like all comments were addressed, and this is needed to differentiate Standalone OCP on IBM Cloud from Managed Control Planes on IBM Cloud (or other infrastructures).

/approve
/lgtm

- Allow expressing a new type of control plane topology in the Infrastructure resource inside an
OCP cluster.

- Provide operators/components that change their behavior based on whether the control plane is external or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrunalp the list of components are useful, but really this is more saying 'there will never be a role=master' host in this cluster where set to External


- Provide a design for running OCP with an external control plane.

- Describe how the `controlPlaneTopology` field will be set for an external control plane deployment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking for clarification, the externalized control plane manager is responsible for projecting this value into the end-user cluster data plane.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 4, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2021
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@dhellmann
Copy link
Contributor

/hold

The lint job is failing because this document is missing a bunch of the required headers from the template. Please restore those sections and put something like N/A if they do not apply to this design.

+ hack/template-lint.sh
enhancements/external-control-plane-topology.md missing "### User Stories"
enhancements/external-control-plane-topology.md missing "### Risks and Mitigations"
enhancements/external-control-plane-topology.md missing "## Design Details"
enhancements/external-control-plane-topology.md missing "### Test Plan"
enhancements/external-control-plane-topology.md missing "### Graduation Criteria"
enhancements/external-control-plane-topology.md missing "#### Dev Preview -> Tech Preview"
enhancements/external-control-plane-topology.md missing "#### Tech Preview -> GA"
enhancements/external-control-plane-topology.md missing "#### Removing a deprecated feature"
enhancements/external-control-plane-topology.md missing "### Upgrade / Downgrade Strategy"
enhancements/external-control-plane-topology.md missing "### Version Skew Strategy"
enhancements/external-control-plane-topology.md missing "## Implementation History"
enhancements/external-control-plane-topology.md missing "## Drawbacks"
enhancements/external-control-plane-topology.md missing "## Alternatives"

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2021

Existing components such as Console and Monitoring that use platform type of `IBMCloud` to modify their behavior
for external control planes would need to switch to `controlPlaneTopology` to determine whether the control plane
is external.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 18, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 18, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@csrwng
Copy link
Contributor Author

csrwng commented May 18, 2021

@dhellmann I've added the missing sections

@dhellmann
Copy link
Contributor

/hold cancel
/lgtm

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 18, 2021
@openshift-merge-robot openshift-merge-robot merged commit dd37ee5 into openshift:master May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.