OPNET-282: Configure-ovs alternative implementation #4249
Conversation
@cybertron: This pull request references OPNET-282 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Skipping CI for Draft Pull Request.
Discussed this on slack, but to summarize it here:
What we're essentially proposing is a dual management method where the user is expected to provide a MachineConfig object (used by ignition) to do provisioning during install time, and for new nodes. Day 2 configuration is done via NodeNetworkConfigurationPolicy objects which are not actually part of the core payload.
This is not ideal as it introduces some issues and antipatterns:
- some machineconfigs don't actually do anything if you update them, which is confusing for a user (I guess it technically does affect new nodes, but not existing ones)
- to update a network configuration I'd have to edit both the API object and the machineconfig separately, which is not ideal
- we're doing oneshot configuration via bash via a template and a stamp file, which isn't really what machineconfigs should be used for (although I guess it technically is supported to do so)
So this can probably work, but I think longer term we should plan on either adding a proper API + parsing, or adding a new CRD to OpenShift that can do the higher-level management directly (so it translates to both MachineConfigs and NodeNetworkConfigurationPolicies, for example).
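For context, the day-2 half of that split would be expressed as a NodeNetworkConfigurationPolicy applied through the kubernetes-nmstate operator rather than the MCO. The manifest below is a hypothetical sketch, not part of this PR: the policy name, node selector, and MTU value are invented for illustration.

```shell
#!/bin/sh
# Hypothetical day-2 change expressed as an NNCP (not from this PR).
# The interface name br-ex matches the discussion; the MTU is made up.
cat > nncp-br-ex-mtu.yaml <<'EOF'
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br-ex-mtu
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: br-ex
      type: ovs-interface
      state: up
      mtu: 9000
EOF
# oc apply -f nncp-br-ex-mtu.yaml   # requires the kubernetes-nmstate operator
```

Note that this object lives in the cluster API, not in a MachineConfig, which is exactly the dual-management concern raised above.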
@@ -419,6 +419,12 @@ func calculatePostConfigChangeActionFromFileDiffs(diffFileSet []string) (actions
	imageRegistryAuthFile,
	"/var/lib/kubelet/config.json",
}
directoriesPostConfigChangeActionNone := []string{
Just as a note we're trying to move from a hard-coded list to an API here: openshift/api#1764
So directories aren't part of our initially supported plans, but we should eventually add support for this. cc @djoshy
fi

hostname=$(hostname -s)
hostname_file="/etc/nmstate/openshift/${hostname}.yml"
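To make the selection order around that hunk concrete, here is a hedged sketch of preferring a host-specific file over a cluster-wide one. Only the `/etc/nmstate/openshift/${hostname}.yml` path appears in the hunk; the `cluster.yml` name and the helper function are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pick the host-specific NMState config if present,
# otherwise fall back to a cluster-wide one. cluster.yml is an assumed name.
select_nmstate_config() {
    dir="$1"
    host="$2"
    if [ -s "${dir}/${host}.yml" ]; then
        echo "${dir}/${host}.yml"
    elif [ -s "${dir}/cluster.yml" ]; then
        echo "${dir}/cluster.yml"
    else
        return 1    # nothing to apply
    fi
}

# Example usage against the real directory:
# config_file=$(select_nmstate_config /etc/nmstate/openshift "$(hostname -s)") || exit 0
```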
Just curious, is it possible to add everything into one file and post-process it later? I.e. instead of having ip1.yaml, ip2.yaml, and cluster.yaml, I have one big config.yaml within which I specify ip1.spec and ip2.spec, or something like that, so it's a singular path and the script itself processes it when it does the copy/processing?
There's probably some way we could do it. I'm not sure we have yq available on the host so I don't know if we can use that, but we could probably split the file with some bash magic.
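For what it's worth, the splitting could indeed be done with plain awk rather than yq. This is a hypothetical sketch only: the combined-file format (documents separated by `---` and tagged with a `# host:` comment) is invented for illustration, since the PR actually ships separate per-host files.

```shell
#!/bin/sh
# Hedged sketch of the "bash magic" idea: split one combined file into
# per-host NMState configs. The '# host: <name>' tag is an invented format.
split_combined_config() {
    combined="$1"
    outdir="$2"
    awk -v outdir="$outdir" '
        /^---[[:space:]]*$/ { out=""; next }                 # document separator
        /^# host:/          { out=outdir "/" $3 ".yml"; next }
        out                 { print > out }                  # body of a tagged doc
    ' "$combined"
}
```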
/test e2e-metal-ipi
In general, the networking aspect of this LGTM.
(resolved comment on templates/common/_base/units/nmstate-configuration.service.yaml)
# Clean up old config on behalf of mtu-migration
if ! systemctl -q is-enabled mtu-migration; then
    echo "Cleaning up left over mtu migration configuration"
    rm -rf /etc/cno/mtu-migration
fi
I guess this makes the check in configure-ovs dead code?
I suppose it does if we leave this service enabled everywhere.
    exit 0
fi

# This logic is borrowed from configure-ovs.sh
I guess this makes the check in configure-ovs dead code?
Not in this case. We only run this if the new mechanism is used, so the configure-ovs version will still apply when we don't.
Ah ok. Why do you need to use /run/nodeip-configuration/primary-ip specifically? If configure-ovs is doing it and it is platform-agnostic, then we could basically use whatever it is using. I guess it's not a problem if we limit this to baremetal, though.
I think this couldn't work exactly the same way configure-ovs does because it's not all one service. In this case, we have nmstate-configuration.service providing configuration files for nmstate.service, then run wait-for-primary-ip.service to ensure we have a usable address.
The platform-agnostic solution is probably to split this into two services, one that runs before nmstate.service and stores off the primary IP, then this one that runs after to wait for it. I'll see how hard that would be to do now that we have a little more time to get this done.
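A hedged sketch of the "wait" half of that proposed split. The `/run/nodeip-configuration/primary-ip` path is the one discussed above; the timeout and the injectable address lister (used so the helper can be exercised without touching a live interface) are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical wait-for-primary-ip loop: poll until the IP recorded by an
# earlier service is actually configured on some interface.
wait_for_primary_ip() {
    ip_file="$1"
    timeout="${2:-60}"
    lister="${3:-ip -o addr show}"   # injectable for testing; default is real
    primary_ip=$(cat "$ip_file") || return 1
    i=0
    while [ "$i" -lt "$timeout" ]; do
        # 'ip -o addr' prints one address per line; -w avoids prefix matches
        if $lister | grep -qw "$primary_ip"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "Timed out waiting for $primary_ip" >&2
    return 1
}

# Example usage on a node:
# wait_for_primary_ip /run/nodeip-configuration/primary-ip 60
```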
pkg/daemon/update.go (outdated):
// We probably don't want to use this exact path because NMState is
// planning to add a service that applies configs from it too, and we
// want to make sure our service is the only one processing the configs.
"/etc/nmstate/openshift",
Why choose this dir vs., say, a directory under a well-known path? The potential for conflicts is low, but I guess it would be even lower if we chose a different path.
Do you have a suggestion on another location? I don't particularly care where the files are, but since this is OpenShift-specific NMState configuration I thought an openshift directory under /etc/nmstate made sense.
I will note that we had a discussion about naming this something like "/etc/nmstate/techpreview-openshift" to make sure it's clear to anyone using this functionality that it is currently tech preview. We'll be documenting that too, but not everyone reads the docs, unfortunately.
We already use paths like /etc/cno and /etc/ovnk. /etc/mco also exists, and I guess a potentially new /etc/openshift comes to mind as well. If none of these seem plausible to you, then I guess I don't really have any suggestion, so we can go with what you have here.
Force-pushed from 4b29ad3 to 0cdd533 (compare)
I just squashed some obsolete commits to make this easier to review. No functional changes yet.
nits
(resolved comment on templates/common/_base/units/nmstate-configuration.service.yaml)
if [ ! -e /etc/nmstate/openshift/applied ]; then
    # No need to do this if no NMState configuration was applied
    exit 0
fi
The guard is not needed; on the second boot the IP will already be stored by NetworkManager and this script will be very fast.
This isn't for second boot, it's to avoid running this when the new mechanism is not in use at all.
LGTM
@cybertron: The following test failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
LGTM.
Hi @cybertron, I tried to build a cluster with the PR via cluster-bot today, but got 'pull request #4249 needs to be rebased to branch master'. Could you please rebase it? Thanks a lot.
Adds a service to apply NMState configs early in the boot process. If the new service applies changes, we will skip running configure-ovs, since br-ex should have been included in the manual configuration. Also makes the /etc/nmstate directory a reboot-less path, so that changing it while scaling out new nodes won't cause all of the existing nodes to reboot. The files there are only applied once on initial boot anyway, so there's no need to reboot for changes.
This significantly cleans up the logic.
For now this can't handle directories, so we need to make an exception for it.
For the initial implementation we've decided to target only baremetal since that's the main place where it is needed.
Force-pushed from 0cdd533 to c02974b (compare)
Rebased and addressed most of the comments. I've added a couple of topics to the next sync meeting, but I think we can move forward with this as-is and address the outstanding questions later.
/lgtm
Approving from the MCO perspective
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cybertron, qinqon, yuqi-zhang. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/hold
[ART PR BUILD NOTIFIER] This PR has been included in build ose-machine-config-operator-container-v4.17.0-202405070919.p0.g5a6e8b8.assembly.stream.el9 for distgit ose-machine-config-operator.
Ah, sorry, I may have jumped the gun on the approval. @qiowang721 for any issues, maybe we can follow up as bugs?
np, i opened a bug for it~
This adds a new service which allows manual configuration of br-ex using NMState. The advantage here versus configure-ovs is that you are able to explicitly configure the bridge exactly as you want from day 1, and also use NMState-based tools to modify it on day 2.
The configurations are provided as a machine-config passed to the installer as part of the manifests. The machine-config needs to contain either a cluster-wide or machine-specific NMState configuration that will be applied on first boot, at approximately the same time configure-ovs would normally run. Machine-specific configs are mapped to their nodes using the hostname.
Additionally, this includes a couple of other meaningful changes. First is a wait-for-primary-ip service that runs after nmstate.service to ensure the IP is usable before we start crio. Otherwise crio may fail to start initially, which then blocks kubelet. The other change of note is that the directory used to store these new configurations is added to the rebootless config list. This is because these configs are only applied on first boot and changes to them would have no effect anyway. More importantly, when scaling out a cluster it is necessary to add configs for the new nodes and we don't want to have to reboot the entire existing cluster when we do that.
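The wrapping described above could be sketched as follows. This is a hedged illustration, not the PR's tooling: the node name, role label, manifest name, and NMState body are placeholders, while the `/etc/nmstate/openshift/<hostname>.yml` target path matches the one used in this PR.

```shell
#!/bin/sh
# Hypothetical generator: wrap a per-host NMState config in a MachineConfig
# manifest. All names here are placeholders except the target path.
node=worker-0

# Ignition file contents are carried as a base64 data URL.
nmstate_b64=$(printf '%s\n' \
    'interfaces:' \
    '- name: br-ex' \
    '  type: ovs-bridge' \
    '  state: up' | base64 | tr -d '\n')

cat > "machineconfig-${node}.yaml" <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 10-br-ex-${node}
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/nmstate/openshift/${node}.yml
        mode: 0644
        contents:
          source: data:text/plain;charset=utf-8;base64,${nmstate_b64}
EOF
# Drop the generated file into the installer's manifests directory.
```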
- What I did
- How to verify it
- Description for the changelog
Added nmstate-configuration service that provides an alternative to ovs-configuration for creation of br-ex.