Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPNET-282: Configure-ovs alternative implementation #4249

Merged
merged 9 commits into from
May 7, 2024

Conversation

cybertron
Copy link
Member

This adds a new service which allows manual configuration of br-ex using NMState. The advantage here versus configure-ovs is that you are able to explicitly configure the bridge exactly as you want from day 1, and also use NMState-based tools to modify it on day 2.

The configurations are provided as a machine-config provided to the installer as part of the manifests. The machine-config needs to contain either a cluster-wide or machine-specific NMState configuration that will be applied on first boot at approximately the same time configure-ovs would normally run. Machine-specific configs are mapped to their nodes using the hostname.

Additionally, this includes a couple of other meaningful changes. First is a wait-for-primary-ip service that runs after nmstate.service to ensure the IP is usable before we start crio. Otherwise crio may fail to start initially, which then blocks kubelet. The other change of note is that the directory used to store these new configurations is added to the rebootless config list. This is because these configs are only applied on first boot and changes to them would have no effect anyway. More importantly, when scaling out a cluster it is necessary to add configs for the new nodes and we don't want to have to reboot the entire existing cluster when we do that.

- What I did

- How to verify it

- Description for the changelog
Added nmstate-configuration service that provides an alternative to ovs-configuration for creation of br-ex.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 7, 2024

@cybertron: This pull request references OPNET-282 which is a valid jira issue.

In response to this:

This adds a new service which allows manual configuration of br-ex using NMState. The advantage here versus configure-ovs is that you are able to explicitly configure the bridge exactly as you want from day 1, and also use NMState-based tools to modify it on day 2.

The configurations are provided as a machine-config provided to the installer as part of the manifests. The machine-config needs to contain either a cluster-wide or machine-specific NMState configuration that will be applied on first boot at approximately the same time configure-ovs would normally run. Machine-specific configs are mapped to their nodes using the hostname.

Additionally, this includes a couple of other meaningful changes. First is a wait-for-primary-ip service that runs after nmstate.service to ensure the IP is usable before we start crio. Otherwise crio may fail to start initially, which then blocks kubelet. The other change of note is that the directory used to store these new configurations is added to the rebootless config list. This is because these configs are only applied on first boot and changes to them would have no effect anyway. More importantly, when scaling out a cluster it is necessary to add configs for the new nodes and we don't want to have to reboot the entire existing cluster when we do that.

- What I did

- How to verify it

- Description for the changelog
Added nmstate-configuration service that provides an alternative to ovs-configuration for creation of br-ex.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 7, 2024
Copy link
Contributor

openshift-ci bot commented Mar 7, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 7, 2024
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this on slack, but to summarize it here:

What we're essentially proposing is a dual management method where the user is expected to provide a MachineConfig object (used by ignition) to do provisioning during install time, and for new nodes. Day 2 configuration is done via NodeNetworkConfigurationPolicy objects which are not actually part of the core payload.

This is not ideal as it introduces some issues and antipatterns:

  1. some machineconfigs don't actually do anything if you update them, which is confusing for a user (I guess it technically does affect new nodes, but not existing ondes)
  2. to update a network configuration I'd have to edit both the API object and the machineconfig separately, which is not ideal
  3. we're doing oneshot configuration via bash via a template and a stamp file, which isn't really what machineconfigs should be used for (although I guess it technically is supported to do so)

So this can probably work but I think longer term we should plan on either adding in proper API + parsing or add a new CRD into openshift that can do the higher level management directly (so it translates to both machineconfigs and NodeNetworkConfigurationPolicy, for example)

@@ -419,6 +419,12 @@ func calculatePostConfigChangeActionFromFileDiffs(diffFileSet []string) (actions
imageRegistryAuthFile,
"/var/lib/kubelet/config.json",
}
directoriesPostConfigChangeActionNone := []string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note we're trying to move from a hard-coded list to an API here: openshift/api#1764

So directories isn't part of our initially supported plans but we should eventually add support for this cc @djoshy

fi

hostname=$(hostname -s)
hostname_file="/etc/nmstate/openshift/${hostname}.yml"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, is it possible to add everything into one file and post process it later?

i.e. instead of having ip1.yaml ip2.yaml cluster.yaml, I have one big config.yaml, that then I specify within:
ip1.spec and ip2.spec

or something like that so it's a singular path, and the script itself processes when it does the copy/processing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably some way we could do it. I'm not sure we have yq available on the host so I don't know if we can use that, but we could probably split the file with some bash magic.

fi

hostname=$(hostname -s)
hostname_file="/etc/nmstate/openshift/${hostname}.yml"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably some way we could do it. I'm not sure we have yq available on the host so I don't know if we can use that, but we could probably split the file with some bash magic.

pkg/daemon/update.go Outdated Show resolved Hide resolved
@cybertron cybertron marked this pull request as ready for review April 19, 2024 15:02
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2024
@cybertron
Copy link
Member Author

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-ipv6

Copy link
Contributor

@jcaamano jcaamano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the networking aspect of this LGTM.

Comment on lines 8 to 12
# Clean up old config on behalf of mtu-migration
if ! systemctl -q is-enabled mtu-migration; then
echo "Cleaning up left over mtu migration configuration"
rm -rf /etc/cno/mtu-migration
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this makes the check in configure-ovs dead code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it does if we leave this service enabled everywhere.

exit 0
fi

# This logic is borrowed from configure-ovs.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this makes the check in configure-ovs dead code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this case. We only run this if the new mechanism is used, so the configure-ovs version will still apply when we don't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. Why do you need to use /run/nodeip-configuration/primary-ip specifically? If configure-ovs is doing it and it is platform agnostic, then we could basically use whatever it is using. I guess not problem if we limit this to baremetal though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this couldn't work exactly the same way configure-ovs does because it's not all one service. In this case, we have nmstate-configuration.service providing configuration files for nmstate.service, then run wait-for-primary-ip.service to ensure we have a usable address.

The platform-agnostic solution is probably to split this into two services, one that runs before nmstate.service and stores off the primary IP, then this one that runs after to wait for it. I'll see how hard that would be to do now that we have a little more time to get this done.

// We probably don't want to use this exact path because NMState is
// planning to add a service that applies configs from it too, and we
// want to make sure our service is the only one processing the configs.
"/etc/nmstate/openshift",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why chose this dir vs, say, a directory out of a well known path? Potential for conflicts are low but I guess it would be less if we chose a different path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suggestion on another location? I don't particularly care where the files are, but since this is OpenShift-specific NMState configuration I thought an openshift directory under /etc/nmstate made sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will note that we had a discussion about naming this something like "/etc/nmstate/techpreview-openshift" to make sure it's clear to anyone using this functionality that it is currently tech preview. We'll be documenting that too, but not everyone reads the docs, unfortunately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already use paths like /etc/cno and /etc/ovnk. /etc/mco also exists and I guess a potentially new /etc/openshift comes to mind as well. If none of these seem plausible to you, then well I guess I don't have really any suggestion so we can go with what you have here.

@cybertron
Copy link
Member Author

I just squashed some obsolete commits to make this easier to review. No functional changes yet.

Copy link
Contributor

@qinqon qinqon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

pkg/daemon/update.go Outdated Show resolved Hide resolved
templates/common/_base/files/nmstate-configuration.yaml Outdated Show resolved Hide resolved
templates/common/_base/files/nmstate-configuration.yaml Outdated Show resolved Hide resolved
Comment on lines +8 to +11
if [ ! -e /etc/nmstate/openshift/applied ]; then
# No need to do this if no NMState configuration was applied
exit 0
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guard is not needed, at the second boot the IP will be already stored at NetworkManager and this script will be super fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't for second boot, it's to avoid running this when the new mechanism is not in use at all.

templates/common/_base/files/nmstate-configuration.yaml Outdated Show resolved Hide resolved
@jcaamano
Copy link
Contributor

LGTM

Copy link
Contributor

openshift-ci bot commented Apr 24, 2024

@cybertron: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 0cdd533 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@qinqon
Copy link
Contributor

qinqon commented Apr 24, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 24, 2024
@cgoncalves
Copy link

LGTM.

@qiowang721
Copy link

Hi @cybertron I tried to build cluster with the PR via cluster-bot today, but got 'pull request #4249 needs to be rebased to branch master', could you please rebase it? Thanks a lot.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2024
Adds a service to apply NMState configs early in the boot process.
If the new service applies changes we will skip running configure-ovs
since br-ex should have been included in the manual configuration.

Also makes the /etc/nmstate directory a reboot-less path so when we
make changes to it while scaling out new nodes it won't cause all
of the existing nodes to reboot. The files there are only applied
once on initial boot anyway so there's no need to reboot for changes.
This significantly cleans up the logic.
For now this can't handle directories, so we need to make an
exception for it.
For the initial implementation we've decided to target only baremetal
since that's the main place where it is needed.
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 6, 2024
@cybertron
Copy link
Member Author

Rebased and addressed most of the comments. I've added a couple of topics to the next sync meeting, but I think we move forward with this as-is and address the outstanding questions later.

@qinqon
Copy link
Contributor

qinqon commented May 6, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 6, 2024
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving from the MCO perspective

Copy link
Contributor

openshift-ci bot commented May 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, qinqon, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 5a6e8b8 into openshift:master May 7, 2024
15 checks passed
@qiowang721
Copy link

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 7, 2024
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-machine-config-operator-container-v4.17.0-202405070919.p0.g5a6e8b8.assembly.stream.el9 for distgit ose-machine-config-operator.
All builds following this will include this PR.

@yuqi-zhang
Copy link
Contributor

Ah, sorry, I may have jumped the gun on the approval. @qiowang721 for any issues, maybe we can follow up as bugs?

@qiowang721
Copy link

Ah, sorry, I may have jumped the gun on the approval. @qiowang721 for any issues, maybe we can follow up as bugs?

np, i opened a bug for it~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants