Update / Patch to deployment during "replace" rejected by kubernetes, fails in a loop #952
Comments
To let upgrades go nicely and make good use of condition reporting, we need to make sure that k8s doesn't change things behind our back. To do that, we force the update strategy to "Recreate". Relevant OLM issues: operator-framework/operator-lifecycle-manager#1028 operator-framework/operator-lifecycle-manager#952 Signed-off-by: Francesco Romani <fromani@redhat.com>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
A couple of things to mention here:
Please re-open or open a new issue if we need further discussion / clarification.
Summary: Noticed that operators were actually struggling to update to the latest operator version. After doing some digging, it turns out the update was failing because the latest operator version has new labels, and deployment labels are immutable. Luckily, we can still move forward and update from this case: operator-framework/operator-lifecycle-manager#952. OLM assumes the deployment name is the same, so it tries to do a rolling deploy; however, changing the deployment name indicates to OLM that we want to completely replace the deployment. Suggestions for a better new deployment name are appreciated. Also fixed a bug with the release build where the RC's prev versions were incorrect.

Test Plan: Created a test plan in the checklist for verifying operator updates, and ran through it: https://www.notion.so/pixielabs/Operator-Release-Checklist-a705283f190c4c0aa127f9439bb34180

Reviewers: vihang, zasgar
Reviewed By: vihang
Differential Revision: https://phab.corp.pixielabs.ai/D9533
GitOrigin-RevId: 561964e
On v3.6.3 the selectors have changed. Selectors are, according to OLM, an immutable field; therefore we need to rename the deployment. Renaming the deployment causes OLM to delete the previous deployment and create a new one: operator-framework/operator-lifecycle-manager#952 (comment)
* Rename operator deployment

  Commit 7fa448a added additional labels to the operator deployment and its selector field. Unfortunately the selector field is immutable, and when OLM tries to patch the deployment while upgrading, Kubernetes will reject this update. A workaround is to rename the deployment, as suggested here: operator-framework/operator-lifecycle-manager#952

* Update makefile
* Rename to tempo-operator-controller, update changelog

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Background: changes to selectors/labels on deployments.apps (and other objects) are not allowed by Kubernetes; for the discussion behind that decision, see kubernetes/kubernetes#50808.
During a "replace" operation on a CSV, the OLM will attempt to update/patch an existing deployment if the new CSV uses the same deployment name as the existing CSV. If the new CSV includes changes to the deployment(s) that are not allowed by kubernetes (ie a change to a selector/label field, for example), the replace operation will fail ("field is immutable" from kube). At this point, the OLM will periodically retry the replace, failing each time -- the old CSV will never successfully be replaced (but it will stay functional afaik)
If possible, this should be surfaced in a way that is clearer to the end user. Currently, it may not be obvious to someone who isn't an advanced Kubernetes/OLM user exactly what is happening and how to correct it.
From a user perspective, there are a couple of simple workarounds to this scenario:
- Respin the CSV and use a different deployment name (the long-term solution; covers everybody going forward). This is more of a developer fix, applied while crafting a new pull request to update the CSV.
- Delete the deployment(s) from the old CSV by hand, which allows the current replace operation to continue. This is an immediate fix for the affected instance, more of an end-user hot-fix; see the sketch after this list.
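For the hot-fix, a minimal client-go sketch along the same lines as above (namespace and deployment name are again hypothetical; `kubectl delete deployment` achieves the same thing):

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Foreground deletion waits for the deployment's replica sets and pods
	// to be cleaned up before the deployment object itself disappears.
	policy := metav1.DeletePropagationForeground
	err = client.AppsV1().Deployments("operators").Delete(
		context.TODO(), "my-operator",
		metav1.DeleteOptions{PropagationPolicy: &policy},
	)
	if err != nil {
		panic(err)
	}
	// OLM's next retry of the pending replace will recreate the deployment
	// from the new CSV, with the new selector/labels.
}
```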
High-level steps to reproduce

1. Install a CSV whose deployment carries some set of selector/labels.
2. Publish a replacement CSV that keeps the same deployment name but changes a selector/label field.
3. Trigger the upgrade: the update/patch is rejected by Kubernetes with "field is immutable", and OLM retries the replace in a loop.