OCPNODE-1632: Support ClusterImagePolicy CRD #4160
Conversation
Skipping CI for Draft Pull Request.
@QiWang19: This pull request references OCPNODE-1632 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from d31d7ce to c571e3c.
Force-pushed from c571e3c to 5735b02.
/retest-required
@mtrmac This is ready for review. Could you take a look?
A fairly brief look for now, mostly demonstrating my k8s ignorance. Looks reasonable to me overall.
I didn’t read the tests at all.
pkg/controller/container-runtime-config/container_runtime_config_controller.go (outdated; resolved)
@@ -284,6 +308,62 @@ func (ctrl *Controller) itmsConfDeleted(_ interface{}) {
	ctrl.imgQueue.Add("openshift-config")
}

func (ctrl *Controller) clusterImagePolicyAdded(_ interface{}) {
	if !ctrl.sigstoreAPIEnabled() {
I don’t know, how does this usually work?
It seems to me that either the feature gates are constant throughout the life of the controller, and then we shouldn’t even be observing the CRs if the gate API is disabled; or they can change during the life of the controller, and then something should trigger a re-sync on feature gate change, and I can’t see that happening.
MCO components use the default featuregate accessor configuration:
machine-config-operator/pkg/controller/common/controller_context.go, lines 122 to 128 in bcb59f0:
featureGateAccessor := featuregates.NewFeatureGateAccess(
	desiredVersion, missingVersion,
	configSharedInformer.Config().V1().ClusterVersions(), configSharedInformer.Config().V1().FeatureGates(),
	recorder,
)
go featureGateAccessor.Run(ctx)
On restart, the feature gates will be up to date with the new level.
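The gate-then-no-op pattern being discussed can be sketched as follows. This is a toy model: the featureGateAccess interface, the staticGates type, and the gate name "SigstoreImageVerification" are illustrative stand-ins for the library-go featuregate accessor, not its real API surface.

```go
package main

import "fmt"

// featureGateAccess is a minimal stand-in for a feature gate accessor;
// the real accessor syncs from the cluster's FeatureGate resource.
type featureGateAccess interface {
	Enabled(name string) bool
}

// staticGates models gates that are constant for the life of the
// controller (updated only on restart, as noted in the comment above).
type staticGates map[string]bool

func (g staticGates) Enabled(name string) bool { return g[name] }

type controller struct {
	gates featureGateAccess
}

func (c *controller) sigstoreAPIEnabled() bool {
	// hypothetical gate name, for illustration only
	return c.gates.Enabled("SigstoreImageVerification")
}

func (c *controller) clusterImagePolicyAdded(_ interface{}) {
	if !c.sigstoreAPIEnabled() {
		return // gate off: ignore the event entirely
	}
	fmt.Println("queueing sync for ClusterImagePolicy")
}

func main() {
	c := &controller{gates: staticGates{"SigstoreImageVerification": true}}
	c.clusterImagePolicyAdded(nil)
}
```

Because the gates are effectively constant per process, the handler-level check is equivalent to not registering the handler at all, which is the alternative discussed later in this thread.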
pkg/controller/container-runtime-config/container_runtime_config_controller.go (three outdated threads; resolved)
)
switch policy.RootOfTrust.PolicyType {
case apicfgv1alpha1.PublicKeyRootOfTrust:
	keyDataDec, err := b64.StdEncoding.DecodeString(policy.RootOfTrust.PublicKey.KeyData)
The explicit base64 handling surprises me. Wouldn’t it be possible to make the API type a []byte? At least https://github.com/kubernetes/api/blob/f3648a53522eb60ea75d70d36a50c799f7e4e23b/core/v1/types.go#L6838 suggests that might work.
return sigstorePolicyRequirement, nil
}

func validateClusterImagePolicyWithAllowedBlockedRegistries(clusterScopePolicies map[string]signature.PolicyRequirements, allowedRegs, policyBlocked []string) error {
An idle thought… is this sufficient?
- If example.com is blocked, but a policy exists for example.com/repo, that effectively unblocks that namespace.
- If only example.com is allowed, but a policy exists for example.invalid, that effectively unblocks that namespace.
It’s quite possible this was already discussed and designed in the enhancement.
The enhancement only mentioned rejection when the scopes and the allowed/blocked list are identical; the case where the scopes are nested inside has not been discussed.
@mtrmac What are your thoughts on this?
- If a blocked list exists, the clusterimagepolicy scopes must not be equal to or nested under the blockedRegistries.
- If an allowed list exists, the clusterimagepolicy scopes must be nested under the allowedRegistries.
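The two rules above could be sketched like this. Note that scopeIsNestedInside is a simplified prefix check invented for illustration; real registry scope matching (as in containers/image) has more cases (ports, wildcards, digest references), and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// scopeIsNestedInside reports whether scope equals reg or is a repo
// namespace under it (e.g. "example.com/repo" under "example.com").
func scopeIsNestedInside(scope, reg string) bool {
	return scope == reg || strings.HasPrefix(scope, reg+"/")
}

// validateScopes sketches the proposed rules: no scope may fall under
// a blocked registry, and when an allowed list exists, every scope
// must fall under some allowed registry.
func validateScopes(scopes, allowed, blocked []string) error {
	for _, s := range scopes {
		for _, b := range blocked {
			if scopeIsNestedInside(s, b) {
				return fmt.Errorf("scope %q conflicts with blocked registry %q", s, b)
			}
		}
		if len(allowed) == 0 {
			continue
		}
		ok := false
		for _, a := range allowed {
			if scopeIsNestedInside(s, a) {
				ok = true
				break
			}
		}
		if !ok {
			return fmt.Errorf("scope %q is not under any allowed registry", s)
		}
	}
	return nil
}

func main() {
	// the second example from the review comment above: allowed=example.com,
	// policy scope example.invalid
	err := validateScopes([]string{"example.invalid"}, []string{"example.com"}, nil)
	fmt.Println(err)
}
```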
I don’t really have a strong opinion here — on the one hand the behavior in the current version of the code is pretty surprising, OTOH handling conflicts by ~ignoring configuration and MCO just reporting something in a status is probably also not very intuitive, and minimizing the cases where configuration is ignored that way seems valuable.
But it also feels like an instance of a general design concern where the k8s universe should have a general best practice. I wouldn’t know what that is.
(I didn’t see your last comment before)
- if a blocked list exists, the clusterimagepolicy scopes must not be equal to or nested under the blockedRegistries
- if an allowed list exists, the clusterimagepolicy scopes must be nested under the allowedRegistries
I agree this makes sense as a consistent configuration.
What happens when we go from a consistent configuration to an inconsistent one? E.g. supporting (allowed=A; CIP scopes = [A, A/repo]), and then an admin adds a new CIP scope B.
Is the effect that we retain the current constant MachineConfigs, or would the code fall open and ignore everything?
Is the effect that we retain the current constant MachineConfigs, or would the code fall open and ignore everything?
We retain the current constant MachineConfigs; the controller would fail to sync the new inconsistent CIP.
Great :) In that case the extra checks proposed in #4160 (comment) look good to me.
Just ~15-minute look over the tests for now…
f.expectCreateMachineConfigAction(mcs2)
f.expectGetMachineConfigAction(clusterimgPolicyMCs2)
f.expectCreateMachineConfigAction(clusterimgPolicyMCs2)
No expectation of a CIP access?
(I would also think that the existing tests should now start accessing CIP, when the feature gate is hard-coded to on. … and maybe there should be tests that the feature gate works.)
The current action verification tests only check the machineconfiguration client; they do not cover the config and operator API clients, or the list of resources intended to be filtered out. We can improve the tests to include the other clients, but not in this PR.
Lines 309 to 312 in 53a9e00:
// filterInformerActions filters list and watch actions for testing resources.
// Since list and watch don't change resource state we can filter it to lower
// noise level in our tests.
func filterInformerActions(actions []core.Action) []core.Action {
I think these are the feature gate tests: https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/kubelet-config/kubelet_config_features_test.go
@@ -172,6 +172,7 @@ func createControllers(ctx *ctrlcommon.ControllerContext) []ctrlcommon.Controlle
	ctx.ConfigInformerFactory.Config().V1().Images(),
	ctx.ConfigInformerFactory.Config().V1().ImageDigestMirrorSets(),
	ctx.ConfigInformerFactory.Config().V1().ImageTagMirrorSets(),
	ctx.ConfigInformerFactory.Config().V1alpha1().ClusterImagePolicies(),
You only want to add this if the feature gate is enabled, else the informer when it starts will try to access an API type that does not exist
@JoelSpeed This seems OK from the CI logs? Around "Starting MachineConfigController-ContainerRuntimeConfigController", there is no API access error.
W0214 19:33:25.385128 1 reflector.go:539] github.com/openshift/client-go/config/informers/externalversions/factory.go:125: failed to list *v1alpha1.ClusterImagePolicy: the server could not find the requested resource (get clusterimagepolicies.config.openshift.io)
E0214 19:33:25.385182 1 reflector.go:147] github.com/openshift/client-go/config/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.ClusterImagePolicy: failed to list *v1alpha1.ClusterImagePolicy: the server could not find the requested resource (get clusterimagepolicies.config.openshift.io)
This is repeated every 45 seconds through the logs showing the informer is having issues starting
You only want to add this if the feature gate is enabled, else the informer when it starts will try to access an API type that does not exist
Tried to fix this review comment in commit 2cc6e6d. The featuregate will have a value only once (*Controller) Run() is called (the featuregate comes from start.go), so I added the ctrl.enableSigstoreAPIHandler() call inside (*Controller) Run().
But the cluster failed to launch with techpreview turned on: the e2e-gcp-op-techpreview logs did not show "Starting MachineConfigController-ContainerRuntimeConfigController" after the featuregate was observed as true. The containerruntimeconfig controller did not start; it is maybe blocked on the cache.WaitForCacheSync call at L234.
Lines 230 to 238 in 2cc6e6d:
if ctrl.sigstoreAPIEnabled() {
	ctrl.enableSigstoreAPIHandler()
	listerCaches = append(listerCaches, ctrl.clusterImagePolicyListerSynced)
}
if !cache.WaitForCacheSync(stopCh, listerCaches...) {
	return
}
klog.Info("Starting MachineConfigController-ContainerRuntimeConfigController")
If I do not add listerCaches = append(listerCaches, ctrl.clusterImagePolicyListerSynced), the cluster can launch, but the controller does not react to clusterimagepolicy objects being created. It seems the ctrl.configInformerFactory.Config().V1alpha1().ClusterImagePolicies().Informer().AddEventHandler call did not add a listener, yet it did not return an error.
@JoelSpeed From the finding above (#4160 (comment)), I am not sure how to add a listener for the ClusterImagePolicy after the featuregate is enabled. Is it because I cannot call AddEventHandler after the shared informer is running?
You should know from startup time whether the feature gate is enabled or not; the featuregate accessor should be synced before you even get to registering your listeners, so this should be simple conditional logic around the listener initialisation and event handler registration.
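The ordering point being made here can be illustrated with a toy model: handlers registered before the factory starts receive events, and handlers registered afterwards silently do not, matching the behavior observed in this thread. This informerFactory is a hypothetical stand-in, not client-go's SharedInformerFactory, whose real behavior is more involved.

```go
package main

import "fmt"

// informerFactory is a toy stand-in for a shared informer factory:
// Start wires up only the handlers registered so far; later
// registrations are silently ignored in this model.
type informerFactory struct {
	handlers []func(string)
	running  []func(string)
}

func (f *informerFactory) AddEventHandler(h func(string)) {
	f.handlers = append(f.handlers, h)
}

func (f *informerFactory) Start() {
	// snapshot the handlers registered before startup
	f.running = append([]func(string){}, f.handlers...)
}

func (f *informerFactory) deliver(event string) {
	for _, h := range f.running {
		h(event)
	}
}

func main() {
	f := &informerFactory{}
	gateEnabled := true // decided from the already-synced featuregate accessor

	// Register conditionally BEFORE Start, as suggested in the review.
	if gateEnabled {
		f.AddEventHandler(func(e string) { fmt.Println("CIP event:", e) })
	}
	f.Start()
	f.deliver("clusterimagepolicy/created")
}
```

In this model, moving the AddEventHandler call after Start would make deliver a no-op for that handler, which mirrors the "no error, but no listener" symptom described above.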
Currently in start.go, all controllers add listeners before detecting the featuregate. I found I have to start the SharedInformerFactory again in the container runtime config controller's Run() for the AddEventHandler call to take effect.
pkg/controller/container-runtime-config/container_runtime_config_controller.go (outdated; resolved)
Force-pushed from 7dd1c3d to a414fe7.
Force-pushed from 0fc07f0 to 5926a76.
Right, I was mostly concerned about any merge conflicts, although it looks like #4295 reverted the above, so we'll have to get that back in... Although there might not have been any conflicts, so maybe we can ignore that 😅
Did a pass, and the overall structure makes sense to me. Leaving an approval for now (based on the comments, it looks like we got the FG working for this, nice!).
My main concern inline is the conflict with the NodeDisruptionPolicy object; we should definitely discuss this at some point.
Are you copying these over by hand? I think these can be added to our hack/sync-crd script so they can be auto-updated if the vendored content from API were to change.
@@ -461,6 +464,8 @@ func calculatePostConfigChangeActionFromFileDiffs(diffFileSet []string) (actions
	actions = []string{postConfigChangeActionReloadCrio}
} else if ctrlcommon.InSlice(path, filesPostConfigChangeActionRestartCrio) {
	actions = []string{postConfigChangeActionRestartCrio}
} else if ctrlcommon.InSlice(filepath.Dir(path), dirsPostConfigChangeActionReloadCrio) {
Hmm, for the initial implementation of nodeDisruptionPolicy we don't actually allow wildcarding of paths like this (so all of these will eventually get converted to an API object).
I guess since that's tech preview we can always update the API or MCO processing to allow that, but just a note that these are technically conflicting.
Force-pushed from 5926a76 to 08908bd.
@yuqi-zhang PTAL
/lgtm
I can't speak to the details of the policies, but the logic in bootstrap/containerruntime seems safe to me. Also, since this is all behind the FG, I'm fine with merging as-is, and we can follow up on any concerns for GA.
@@ -17,7 +17,7 @@ for crd in "${CRDS_MAPPING[@]}" ; do
done

#this one goes in manifests rather than install, but should it?
cp "vendor/github.com/openshift/api/config/v1alpha1/0000_10_config-operator_01_clusterimagepolicy-TechPreviewNoUpgrade.crd.yaml" "install/0000_10_config-operator_01_clusterimagepolicy-TechPreviewNoUpgrade.crd.yaml"
technically this might conflict with the MCO api PR (and the locations will be moving) but for now this should be ok
/retest-required
Force-pushed from 08908bd to cf48c60.
/lgtm (retagging)
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: QiWang19, yuqi-zhang.
/retest
/override ci/prow/e2e-hypershift (known failures due to CI outage; has passed in the past)
@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-hypershift.
Merged 642fa6b into openshift:master.
[ART PR BUILD NOTIFIER] This PR has been included in build ose-machine-config-operator-container-v4.16.0-202404120544.p0.g642fa6b.assembly.stream.el9 for distgit ose-machine-config-operator.
- What I did
Follow the design from the enhancement. Add the implementation to the ContainerRuntimeConfig controller for policy.json configuration.
- How to verify it
registry.build03.ci.openshift.org does not show in policy.json because it is the release payload image repo.
- Description for the changelog