
Bug 2020107: Remove run-level label #623

Merged: 1 commit merged into openshift:master from mcoops:remove_runlevel_label on Dec 3, 2021

Conversation

@mcoops (Contributor) commented Jul 7, 2021

Given the original commit for this was in 2018, it might be possible to remove the label entirely now. But since #24 specifically sets it as a dependency on the openshift-apiserver, I doubt it.

Will use this PR for testing and further discussion.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 7, 2021
@mcoops (Contributor, Author) commented Jul 7, 2021

/retest

1 similar comment
@mcoops (Contributor, Author) commented Jul 8, 2021

/retest

@openshift-bot (Contributor) commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 6, 2021
@mcoops (Contributor, Author) commented Nov 3, 2021

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2021
@mcoops mcoops changed the title WIP: Remove run-level label Remove run-level label Nov 4, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2021
@wking (Member) commented Nov 4, 2021

Background here. Install doesn't seem much slower in the e2e-agnostic presubmit that installs with the new code:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/623/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic/1412624472652910592/artifacts/e2e-agnostic/ipi-install-install/artifacts/.openshift_install.log | tail
time="2021-07-07T04:54:29Z" level=info msg="Install complete!"
time="2021-07-07T04:54:29Z" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/tmp/installer/auth/kubeconfig'"
time="2021-07-07T04:54:29Z" level=info msg="Access the OpenShift web-console here: https://console-openshift-console.apps.ci-op-jjir8t9y-3302f.ci.azure.devcluster.openshift.com"
time="2021-07-07T04:54:29Z" level=info msg="Login to the console with user: \"kubeadmin\", and password: REDACTED
time="2021-07-07T04:54:29Z" level=debug msg="Time elapsed per stage:"
time="2021-07-07T04:54:29Z" level=debug msg="                  : 16m45s"
time="2021-07-07T04:54:29Z" level=debug msg="Bootstrap Complete: 7m32s"
time="2021-07-07T04:54:29Z" level=debug msg=" Bootstrap Destroy: 4m49s"
time="2021-07-07T04:54:29Z" level=debug msg=" Cluster Operators: 11m31s"
time="2021-07-07T04:54:29Z" level=info msg="Time elapsed: 40m42s"

In fact, that's ~3m faster than the old code used for the e2e-agnostic-upgrade presubmit:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/623/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1412961787673841664/artifacts/e2e-agnostic-upgrade/ipi-install-install-stableinitial/artifacts/.openshift_install.log | tail
time="2021-07-08T03:21:40Z" level=info msg="Install complete!"
time="2021-07-08T03:21:40Z" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/tmp/installer/auth/kubeconfig'"
time="2021-07-08T03:21:40Z" level=info msg="Access the OpenShift web-console here: https://console-openshift-console.apps.ci-op-26hc2j3r-7ee27.ci.azure.devcluster.openshift.com"
time="2021-07-08T03:21:40Z" level=info msg="Login to the console with user: \"kubeadmin\", and password: REDACTED
time="2021-07-08T03:21:40Z" level=debug msg="Time elapsed per stage:"
time="2021-07-08T03:21:40Z" level=debug msg="                  : 14m56s"
time="2021-07-08T03:21:40Z" level=debug msg="Bootstrap Complete: 6m59s"
time="2021-07-08T03:21:40Z" level=debug msg=" Bootstrap Destroy: 4m54s"
time="2021-07-08T03:21:40Z" level=debug msg=" Cluster Operators: 16m52s"
time="2021-07-08T03:21:40Z" level=info msg="Time elapsed: 43m47s"

I'm not sure whether that difference is statistically significant or just a fluke.

Also, the CVO requires labels from the manifest to exist in the in-cluster resource, but we do not clear unrecognised labels, so the update presubmit still has them after updating to the patched release:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/623/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1412961787673841664/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/namespaces.json | jq '.items[].metadata | select(.name == "openshift-cluster-version").labels'
{
  "kubernetes.io/metadata.name": "openshift-cluster-version",
  "name": "openshift-cluster-version",
  "olm.operatorgroup.uid/0e8650ee-d36e-47bf-bdb3-b48357056c6b": "",
  "openshift.io/cluster-monitoring": "true",
  "openshift.io/run-level": "1"
}

Two questions:

  1. Is that label's presence a problem? Or is this label just a bootstrap-time thing?
  2. If it's a problem, is there an explicit value we can set for the label that says "don't give me any run-level handling; I'm fine without it"?

@wking wking changed the title Remove run-level label Bug 2020107: Remove run-level label Nov 4, 2021
@openshift-ci bot commented Nov 4, 2021

@mcoops: This pull request references Bugzilla bug 2020107, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jianlinliu

In response to this:

Bug 2020107: Remove run-level label

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Nov 4, 2021
@openshift-ci openshift-ci bot requested review from jianlinliu and removed request for abhinavdahiya November 4, 2021 04:11
@mcoops (Contributor, Author) commented Nov 4, 2021

Two questions:

  1. Is that label's presence a problem? Or is this label just a bootstrap-time thing?
  2. If it's a problem, is there an explicit value we can set for the label that says "don't give me any run-level handling; I'm fine without it"?
  1. It would be good to remove the label entirely. Originally it was just a bootstrap-time thing, but its presence also means no Security Context Constraint is applied at all. Even though without the run-level it only gets admitted with hostaccess, that at least means the pod has to run with a UID and gets an SELinux context. Better than nothing.

  2. That's a good point, dang. In theory, if we set it to anything other than 0 or 1, the code should still apply an SCC: https://github.com/openshift/origin/blob/0104fb51cb31e1f5920b778b17eec8b3286eefee/vendor/k8s.io/kubernetes/openshift-kube-apiserver/admission/namespaceconditions/labelcondition.go#L26
    But then it's kind of messy. On the other hand, it's no worse than the status quo; the issue is just that existing customers who upgrade won't get the benefit of this change (see the illustrative check below). What do you think @deads2k?
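
For anyone following along, here is a quick, purely illustrative way to check both halves of that on a live cluster: the namespace's labels, and which SCC each pod in the namespace was admitted with (standard oc queries; nothing here is specific to this PR's code, and output will vary by cluster):

$ oc get namespace openshift-cluster-version --show-labels
$ oc -n openshift-cluster-version get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.openshift\.io/scc}{"\n"}{end}'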

@wking (Member) commented Nov 4, 2021

openshift.io/run-level: ooh, yeah, gimme those security context constraints ;)

@deads2k (Contributor) commented Nov 8, 2021

I suggest building a way to clear the label or annotation. We do it in library-go using a trailing - as the signal to remove to match oc label and oc annotate.
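
On the CLI, that trailing-dash convention is the same one oc label and oc annotate already use for removal, for example (illustrative only, using this PR's namespace and a made-up annotation key):

$ oc label namespace openshift-cluster-version openshift.io/run-level-
$ oc annotate namespace openshift-cluster-version example.com/some-annotation-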

@mcoops mcoops changed the title Bug 2020107: Remove run-level label [WIP] Bug 2020107: Remove run-level label Nov 9, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 9, 2021
@mcoops (Contributor, Author) commented Nov 9, 2021

/retest

@openshift-ci openshift-ci bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 23, 2021
@mcoops (Contributor, Author) commented Nov 23, 2021

/retest

Given the original commit for this was in 2018, it might be possible to
remove the label entirely now. However, a plain removal is not applied
during an upgrade, so clusters that are upgraded would still carry the
run-level. Setting the label to an empty string effectively unsets it,
so it works for both installs and upgrades.

Signed-off-by: coops <cooper.d.mark@gmail.com>
@mcoops (Contributor, Author) commented Nov 26, 2021

/retest e2e-agnostic

@mcoops (Contributor, Author) commented Nov 26, 2021

/test e2e-agnostic

@mcoops mcoops changed the title [WIP] Bug 2020107: Remove run-level label Bug 2020107: Remove run-level label Nov 28, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 28, 2021
@mcoops (Contributor, Author) commented Nov 28, 2021

/test e2e-agnostic-upgrade

@mcoops (Contributor, Author) commented Nov 29, 2021

I had a quick look at implementing something similar to openshift/library-go#727. However, we'd need to specify openshift.io/run-level-, and that's actually an invalid label key, so the manifest never gets created and fails with an invalid-label error. We also don't particularly want to code specifically for the upgrade, or make the CVO aware of run-levels, if we can avoid it.

So I think it might just be easier, for now, to set it to an empty string with a comment (sketched below). That ensures a new install doesn't come up with a run-level set, and an update unsets it as well. If we just removed it from the manifest instead, it would only disappear on a fresh cluster install.
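
For reference, an abridged, paraphrased sketch of what the namespace manifest ends up carrying (the real install/0000_00_cluster-version-operator_00_namespace.yaml has more labels and annotations, and the comment wording here is only illustrative):

$ cat install/0000_00_cluster-version-operator_00_namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cluster-version
  labels:
    ...
    openshift.io/cluster-monitoring: "true"
    # explicitly empty so an upgrade overwrites (effectively unsets) any
    # run-level left behind by older releases
    openshift.io/run-level: ""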

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/623/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic/1464219753190002688/artifacts/e2e-agnostic/gather-extra/artifacts/namespaces.json | jq '.items[].metadata | select(.name == "openshift-cluster-version").labels'
{
  "kubernetes.io/metadata.name": "openshift-cluster-version",
  "name": "openshift-cluster-version",
  "olm.operatorgroup.uid/b0aeeb7a-918d-4b76-893d-856f61f4bac9": "",
  "openshift.io/cluster-monitoring": "true",
  "openshift.io/run-level": "",
  "pod-security.kubernetes.io/audit": "privileged",
  "pod-security.kubernetes.io/enforce": "privileged",
  "pod-security.kubernetes.io/warn": "privileged"
}

Then on upgrade:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/623/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1465099749160914944/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/namespaces.json | jq '.items[].metadata | select(.name == "openshift-cluster-version").labels'
{
  "kubernetes.io/metadata.name": "openshift-cluster-version",
  "name": "openshift-cluster-version",
  "olm.operatorgroup.uid/0d609421-4ec9-4b0c-b3ee-37ef0c060fca": "",
  "openshift.io/cluster-monitoring": "true",
  "openshift.io/run-level": "",
  "pod-security.kubernetes.io/audit": "privileged",
  "pod-security.kubernetes.io/enforce": "privileged",
  "pod-security.kubernetes.io/warn": "privileged"
}

And admits with an SCC assigned:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/623/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1465099749160914944/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/pods.json | jq '.items[].metadata | select(.name | startswith("cluster-version-operator-")).annotations'
{
  "openshift.io/scc": "hostaccess"
}

Also matches what the MCO is now doing: openshift/machine-config-operator#2655

Although I've got no idea what's wrong with the tests, @wking?

@wking (Member) left a comment


/lgtm

We can come back and check install durations in a week once we have a larger sample size, to gauge any slowdowns.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2021
@openshift-ci bot commented Dec 3, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mcoops, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 3, 2021
@openshift-bot (Contributor) commented:

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot (Contributor) commented:

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci bot commented Dec 3, 2021

@mcoops: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name               Commit   Details  Required  Rerun command
ci/prow/golangci-lint   74d8547  link     true      /test golangci-lint
ci/prow/gofmt           74d8547  link     true      /test gofmt

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot (Contributor) commented:

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot (Contributor) commented:

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit b81272d into openshift:master Dec 3, 2021
@openshift-ci bot commented Dec 3, 2021

@mcoops: All pull requests linked via external trackers have merged:

Bugzilla bug 2020107 has been moved to the MODIFIED state.

In response to this:

Bug 2020107: Remove run-level label

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mcoops mcoops deleted the remove_runlevel_label branch December 3, 2021 14:45
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 29, 2022
This blocks us from being associated with SecurityContextConstraints
that set 'readOnlyRootFilesystem: true', because from [1]:

> The set of SCCs that admission uses to authorize a pod are
> determined by the user identity and groups that the user belongs
> to.  Additionally, if the pod specifies a service account, the set of
> allowable SCCs includes any constraints accessible to the service
> account.
>
> Admission uses the following approach to create the final security
> context for the pod:
>
> 1. Retrieve all SCCs available for use.
> 2. Generate field values for security context settings that were not
>    specified on the request.
> 3. Validate the final settings against the available constraints.

If we leave readOnlyRootFilesystem implicit, we may get associated
with an SCC that sets 'readOnlyRootFilesystem: true', and the version-*
actions will fail like [2]:

  $ oc -n openshift-cluster-version get pods
  NAME                                        READY   STATUS    RESTARTS   AGE
  cluster-version-operator-6b5c8ff5c8-4bmxx   1/1     Running   0          33m
  version-4.10.20-smvt9-6vqwc                 0/1     Error     0          10s
  $ oc -n openshift-cluster-version logs version-4.10.20-smvt9-6vqwc
  oc logs version-4.10.20-smvt9-6vqwc
  mv: cannot remove '/manifests/0000_00_cluster-version-operator_00_namespace.yaml': Read-only file system
  mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_adminack_configmap.yaml': Read-only file system
  ...

For a similar change in another repository, see [3].

Also likely relevant, 4.10 both grew pod-security.kubernetes.io/*
annotations [4] and cleared the openshift.io/run-level annotation [5].

$ git --no-pager log --oneline -3 origin/release-4.10 -- install/0000_00_cluster-version-operator_00_namespace.yaml
539e944 (origin/pr/623) Fix run-level label to empty string.
f58dd1c (origin/pr/686) install: Add description annotations to manifests
6e5e23e (origin/pr/668) podsecurity: enforce privileged for openshift-cluster-version namespace

None of those were in 4.9:

$ git --no-pager log --oneline -1 origin/release-4.9 -- install/0000_00_cluster-version-operator_00_namespace.yaml
7009736 (origin/pr/543) Add management workload annotations

And all of them landed in 4.10 via master (so they're in 4.10 before
it GAed, and in 4.11 and later too):

$ git --no-pager log --oneline -4 origin/master -- install/0000_00_cluster-version-operator_00_namespace.yaml
539e944 (origin/pr/623) Fix run-level label to empty string.

[1]: https://docs.openshift.com/container-platform/4.10/authentication/managing-security-context-constraints.html#admission_configuring-internal-oauth
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=2110590#c0
[3]: openshift/cluster-openshift-apiserver-operator#437
[4]: openshift#668
[5]: openshift#623
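
(For anyone debugging a cluster that hit this, two illustrative oc queries show which SCCs force a read-only root filesystem and which SCC the failing pod was actually admitted with; the pod name is the one from the log above.)

$ oc get scc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.readOnlyRootFilesystem}{"\n"}{end}'
$ oc -n openshift-cluster-version get pod version-4.10.20-smvt9-6vqwc \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'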
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 29, 2022
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 29, 2022
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Aug 2, 2022
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Aug 18, 2022
wking added a commit to wking/insights-operator that referenced this pull request Sep 6, 2022
wking added a commit to wking/insights-operator that referenced this pull request Sep 6, 2022
The label was dropped back in:

  $ git --no-pager log -1 --oneline 75f34c7
  75f34c7 manifests: Remove run-level, insights operator does not need it

That landed between 4.2 and 4.3:

  $ git --no-pager grep openshift.io/run-level origin/release-4.2 -- manifests
  origin/release-4.2:manifests/02-namespace.yaml:    openshift.io/run-level: "1"
  $ git --no-pager grep openshift.io/run-level origin/release-4.3 -- manifests
  ...no hits...

So clusters which were born in 4.1 or 4.2 may have the old label
still in place.  This commit clears it like
openshift/cluster-version-operator@539e944920 (Fix run-level label to
empty string, 2021-07-07, openshift/cluster-version-operator#623), so
the cluster-version operator will clear the stale label.
openshift-merge-robot pushed a commit to openshift/insights-operator that referenced this pull request Sep 7, 2022
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/insights-operator that referenced this pull request Sep 12, 2022
openshift-merge-robot pushed a commit to openshift/insights-operator that referenced this pull request Sep 13, 2022