OPNET-282: Configure-ovs alternative implementation #4249

cybertron · 2024-03-07T21:10:08Z

This adds a new service which allows manual configuration of br-ex using NMState. The advantage here versus configure-ovs is that you are able to explicitly configure the bridge exactly as you want from day 1, and also use NMState-based tools to modify it on day 2.

The configurations are provided as a machine-config provided to the installer as part of the manifests. The machine-config needs to contain either a cluster-wide or machine-specific NMState configuration that will be applied on first boot at approximately the same time configure-ovs would normally run. Machine-specific configs are mapped to their nodes using the hostname.

Additionally, this includes a couple of other meaningful changes. First is a wait-for-primary-ip service that runs after nmstate.service to ensure the IP is usable before we start crio. Otherwise crio may fail to start initially, which then blocks kubelet. The other change of note is that the directory used to store these new configurations is added to the rebootless config list. This is because these configs are only applied on first boot and changes to them would have no effect anyway. More importantly, when scaling out a cluster it is necessary to add configs for the new nodes and we don't want to have to reboot the entire existing cluster when we do that.

- What I did

- How to verify it

- Description for the changelog
Added nmstate-configuration service that provides an alternative to ovs-configuration for creation of br-ex.

openshift-ci-robot · 2024-03-07T21:10:11Z

@cybertron: This pull request references OPNET-282 which is a valid jira issue.

In response to this:

This adds a new service which allows manual configuration of br-ex using NMState. The advantage here versus configure-ovs is that you are able to explicitly configure the bridge exactly as you want from day 1, and also use NMState-based tools to modify it on day 2.

The configurations are provided as a machine-config provided to the installer as part of the manifests. The machine-config needs to contain either a cluster-wide or machine-specific NMState configuration that will be applied on first boot at approximately the same time configure-ovs would normally run. Machine-specific configs are mapped to their nodes using the hostname.

Additionally, this includes a couple of other meaningful changes. First is a wait-for-primary-ip service that runs after nmstate.service to ensure the IP is usable before we start crio. Otherwise crio may fail to start initially, which then blocks kubelet. The other change of note is that the directory used to store these new configurations is added to the rebootless config list. This is because these configs are only applied on first boot and changes to them would have no effect anyway. More importantly, when scaling out a cluster it is necessary to add configs for the new nodes and we don't want to have to reboot the entire existing cluster when we do that.

- What I did

- How to verify it

- Description for the changelog
Added nmstate-configuration service that provides an alternative to ovs-configuration for creation of br-ex.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2024-03-07T21:10:12Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

yuqi-zhang

Discussed this on slack, but to summarize it here:

What we're essentially proposing is a dual management method where the user is expected to provide a MachineConfig object (used by ignition) to do provisioning during install time, and for new nodes. Day 2 configuration is done via NodeNetworkConfigurationPolicy objects which are not actually part of the core payload.

This is not ideal as it introduces some issues and antipatterns:

some machineconfigs don't actually do anything if you update them, which is confusing for a user (I guess it technically does affect new nodes, but not existing ondes)
to update a network configuration I'd have to edit both the API object and the machineconfig separately, which is not ideal
we're doing oneshot configuration via bash via a template and a stamp file, which isn't really what machineconfigs should be used for (although I guess it technically is supported to do so)

So this can probably work but I think longer term we should plan on either adding in proper API + parsing or add a new CRD into openshift that can do the higher level management directly (so it translates to both machineconfigs and NodeNetworkConfigurationPolicy, for example)

yuqi-zhang · 2024-04-05T21:15:36Z

pkg/daemon/update.go

@@ -419,6 +419,12 @@ func calculatePostConfigChangeActionFromFileDiffs(diffFileSet []string) (actions
 		imageRegistryAuthFile,
 		"/var/lib/kubelet/config.json",
 	}
+	directoriesPostConfigChangeActionNone := []string{


Just as a note we're trying to move from a hard-coded list to an API here: openshift/api#1764

So directories isn't part of our initially supported plans but we should eventually add support for this cc @djoshy

yuqi-zhang · 2024-04-05T21:17:28Z

templates/common/_base/files/nmstate-configuration.yaml

+    fi
+
+    hostname=$(hostname -s)
+    hostname_file="/etc/nmstate/openshift/${hostname}.yml"


just curious, is it possible to add everything into one file and post process it later?

i.e. instead of having ip1.yaml ip2.yaml cluster.yaml, I have one big config.yaml, that then I specify within:
ip1.spec and ip2.spec

or something like that so it's a singular path, and the script itself processes when it does the copy/processing?

There's probably some way we could do it. I'm not sure we have yq available on the host so I don't know if we can use that, but we could probably split the file with some bash magic.

cybertron · 2024-04-10T21:25:22Z

templates/common/_base/files/nmstate-configuration.yaml

+    fi
+
+    hostname=$(hostname -s)
+    hostname_file="/etc/nmstate/openshift/${hostname}.yml"


There's probably some way we could do it. I'm not sure we have yq available on the host so I don't know if we can use that, but we could probably split the file with some bash magic.

pkg/daemon/update.go

cybertron · 2024-04-19T18:10:37Z

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-ipv6

jcaamano

In general, the networking aspect of this LGTM.

templates/common/_base/units/nmstate-configuration.service.yaml

jcaamano · 2024-04-24T10:05:12Z

templates/common/_base/files/nmstate-configuration.yaml

+    # Clean up old config on behalf of mtu-migration
+    if ! systemctl -q is-enabled mtu-migration; then
+      echo "Cleaning up left over mtu migration configuration"
+      rm -rf /etc/cno/mtu-migration
+    fi


I guess this makes the check in configure-ovs dead code?

I suppose it does if we leave this service enabled everywhere.

jcaamano · 2024-04-24T10:05:48Z

templates/common/_base/files/wait-for-primary-ip.yaml

+      exit 0
+    fi
+
+    # This logic is borrowed from configure-ovs.sh


I guess this makes the check in configure-ovs dead code?

Not in this case. We only run this if the new mechanism is used, so the configure-ovs version will still apply when we don't.

Ah ok. Why do you need to use /run/nodeip-configuration/primary-ip specifically? If configure-ovs is doing it and it is platform agnostic, then we could basically use whatever it is using. I guess not problem if we limit this to baremetal though.

I think this couldn't work exactly the same way configure-ovs does because it's not all one service. In this case, we have nmstate-configuration.service providing configuration files for nmstate.service, then run wait-for-primary-ip.service to ensure we have a usable address.

The platform-agnostic solution is probably to split this into two services, one that runs before nmstate.service and stores off the primary IP, then this one that runs after to wait for it. I'll see how hard that would be to do now that we have a little more time to get this done.

jcaamano · 2024-04-24T10:08:41Z

pkg/daemon/update.go

+		// We probably don't want to use this exact path because NMState is
+		// planning to add a service that applies configs from it too, and we
+		// want to make sure our service is the only one processing the configs.
+		"/etc/nmstate/openshift",


Why chose this dir vs, say, a directory out of a well known path? Potential for conflicts are low but I guess it would be less if we chose a different path.

Do you have a suggestion on another location? I don't particularly care where the files are, but since this is OpenShift-specific NMState configuration I thought an openshift directory under /etc/nmstate made sense.

I will note that we had a discussion about naming this something like "/etc/nmstate/techpreview-openshift" to make sure it's clear to anyone using this functionality that it is currently tech preview. We'll be documenting that too, but not everyone reads the docs, unfortunately.

We already use paths like /etc/cno and /etc/ovnk. /etc/mco also exists and I guess a potentially new /etc/openshift comes to mind as well. If none of these seem plausible to you, then well I guess I don't have really any suggestion so we can go with what you have here.

cybertron · 2024-04-24T14:58:11Z

I just squashed some obsolete commits to make this easier to review. No functional changes yet.

qinqon

nits

templates/common/_base/units/ovs-configuration.service.yaml

pkg/daemon/update.go

templates/common/_base/units/nmstate-configuration.service.yaml

templates/common/_base/files/nmstate-configuration.yaml

qinqon · 2024-04-24T15:34:19Z

templates/common/_base/files/wait-for-primary-ip.yaml

+    if [ ! -e /etc/nmstate/openshift/applied ]; then
+      # No need to do this if no NMState configuration was applied
+      exit 0
+    fi


The guard is not needed, at the second boot the IP will be already stored at NetworkManager and this script will be super fast.

This isn't for second boot, it's to avoid running this when the new mechanism is not in use at all.

templates/common/_base/files/nmstate-configuration.yaml

jcaamano · 2024-04-24T16:07:12Z

LGTM

openshift-ci · 2024-04-24T18:30:33Z

@cybertron: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/okd-scos-e2e-aws-ovn	`0cdd533`	link	false	`/test okd-scos-e2e-aws-ovn`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

qinqon · 2024-04-24T19:21:14Z

/lgtm

cgoncalves · 2024-04-24T19:38:34Z

LGTM.

templates/common/_base/files/nmstate-configuration.yaml

qiowang721 · 2024-05-06T01:53:15Z

Hi @cybertron I tried to build cluster with the PR via cluster-bot today, but got 'pull request #4249 needs to be rebased to branch master', could you please rebase it? Thanks a lot.

Adds a service to apply NMState configs early in the boot process. If the new service applies changes we will skip running configure-ovs since br-ex should have been included in the manual configuration. Also makes the /etc/nmstate directory a reboot-less path so when we make changes to it while scaling out new nodes it won't cause all of the existing nodes to reboot. The files there are only applied once on initial boot anyway so there's no need to reboot for changes.

This significantly cleans up the logic.

For now this can't handle directories, so we need to make an exception for it.

For the initial implementation we've decided to target only baremetal since that's the main place where it is needed.

cybertron · 2024-05-06T16:46:14Z

Rebased and addressed most of the comments. I've added a couple of topics to the next sync meeting, but I think we move forward with this as-is and address the outstanding questions later.

qinqon · 2024-05-06T18:47:25Z

/lgtm

yuqi-zhang

Approving from the MCO perspective

openshift-ci · 2024-05-07T00:29:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, qinqon, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

qiowang721 · 2024-05-07T04:09:00Z

/hold

openshift-bot · 2024-05-07T12:30:50Z

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-machine-config-operator-container-v4.17.0-202405070919.p0.g5a6e8b8.assembly.stream.el9 for distgit ose-machine-config-operator.
All builds following this will include this PR.

yuqi-zhang · 2024-05-07T13:54:04Z

Ah, sorry, I may have jumped the gun on the approval. @qiowang721 for any issues, maybe we can follow up as bugs?

qiowang721 · 2024-05-07T14:14:02Z

Ah, sorry, I may have jumped the gun on the approval. @qiowang721 for any issues, maybe we can follow up as bugs?

np, i opened a bug for it~

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 7, 2024

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 7, 2024

yuqi-zhang reviewed Apr 5, 2024

View reviewed changes

cybertron commented Apr 10, 2024

View reviewed changes

cybertron marked this pull request as ready for review April 19, 2024 15:02

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2024

openshift-ci bot requested review from djoshy and dkhater-redhat April 19, 2024 15:07

jcaamano reviewed Apr 24, 2024

View reviewed changes

cybertron force-pushed the configure-ovs-alt branch from 4b29ad3 to 0cdd533 Compare April 24, 2024 14:57

qinqon suggested changes Apr 24, 2024

View reviewed changes

openshift-ci bot assigned qinqon Apr 24, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 24, 2024

qiowang721 reviewed Apr 30, 2024

View reviewed changes

templates/common/_base/files/nmstate-configuration.yaml Outdated Show resolved Hide resolved

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2024

cybertron added 6 commits May 6, 2024 11:43

Tweak dependencies to reflect kubelet-dependencies

cad8c27

Add wait-for-primary-ip service

171223f

Change reboot-less path to /etc/nmstate/openshift

7a94ef8

Avoid overwriting existing files and cleanup mtu migration

16b8327

Use InSlice to check if file is in directory

16d796a

This significantly cleans up the logic.

cybertron added 3 commits May 6, 2024 11:43

Add exception to NodeDisruptionPolicy

397b18e

For now this can't handle directories, so we need to make an exception for it.

Add logging and fix logic in nmstate-configuration

addcb11

Enable nmstate-configuration only on baremetal

c02974b

For the initial implementation we've decided to target only baremetal since that's the main place where it is needed.

cybertron force-pushed the configure-ovs-alt branch from 0cdd533 to c02974b Compare May 6, 2024 16:43

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2024

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 6, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 6, 2024

yuqi-zhang approved these changes May 7, 2024

View reviewed changes

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2024

openshift-merge-bot bot merged commit 5a6e8b8 into openshift:master May 7, 2024
15 checks passed

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 7, 2024

OPNET-282: Configure-ovs alternative implementation #4249

OPNET-282: Configure-ovs alternative implementation #4249

Conversation

cybertron commented Mar 7, 2024

openshift-ci-robot commented Mar 7, 2024 • edited by openshift-ci bot Loading

openshift-ci bot commented Mar 7, 2024

yuqi-zhang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cybertron commented Apr 19, 2024

jcaamano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cybertron commented Apr 24, 2024

qinqon left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jcaamano commented Apr 24, 2024

openshift-ci bot commented Apr 24, 2024 • edited Loading

qinqon commented Apr 24, 2024

cgoncalves commented Apr 24, 2024

qiowang721 commented May 6, 2024

cybertron commented May 6, 2024

qinqon commented May 6, 2024

yuqi-zhang left a comment

Choose a reason for hiding this comment

openshift-ci bot commented May 7, 2024

qiowang721 commented May 7, 2024

openshift-bot commented May 7, 2024

yuqi-zhang commented May 7, 2024

qiowang721 commented May 7, 2024

openshift-ci-robot commented Mar 7, 2024 •

edited by openshift-ci bot

Loading

openshift-ci bot commented Apr 24, 2024 •

edited

Loading