Update / Patch to deployment during "replace" rejected by kubernetes, fails in a loop #952
Comments
To let upgrades go nicely and make good use of condition reporting, we need to make sure that k8s doesn't change things behind our back. To do that, we force the update strategy to "Recreate". Relevant OLM issues: operator-framework/operator-lifecycle-manager#1028 operator-framework/operator-lifecycle-manager#952 Signed-off-by: Francesco Romani <fromani@redhat.com>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
A couple of things to mention here:
Please re-open or open a new issue if we need further discussion / clarification.
Summary: Noticed that operators were actually struggling to update to the latest operator version. After doing some digging, it turns out the update was failing because the latest operator version has new labels, and deployment labels are immutable. Luckily, we can still move forward and update from this case: operator-framework/operator-lifecycle-manager#952. OLM assumes the deployment name is the same, so it tries to do a rolling deploy; however, changing the deployment name indicates to OLM that we want to completely replace the deployment. Suggestions for a better new deployment name are appreciated. Also fixed a bug with the release build where the RC's prev versions were incorrect.

Test Plan: Created a test plan in the checklist for verifying operator updates, and ran through it: https://www.notion.so/pixielabs/Operator-Release-Checklist-a705283f190c4c0aa127f9439bb34180

Reviewers: vihang, zasgar
Reviewed By: vihang
Differential Revision: https://phab.corp.pixielabs.ai/D9533
GitOrigin-RevId: 561964e
On v3.6.3 the selectors have changed. Selectors are, according to OLM, an immutable field; therefore we need to rename the deployment. Renaming the deployment causes OLM to delete the previous deployment and create a new one: operator-framework/operator-lifecycle-manager#952 (comment)
* Rename operator deployment

  Commit 7fa448a added additional labels to the operator deployment and its selector field. Unfortunately the selector field is immutable, and when OLM tries to patch the deployment while upgrading, Kubernetes will reject this update. A workaround is to rename the deployment, as suggested here: operator-framework/operator-lifecycle-manager#952

* Update makefile
* Rename to tempo-operator-controller, update changelog

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Background: changes to selectors/labels on deployments.apps (and other objects) are not allowed by Kubernetes; for the discussion behind that decision, see kubernetes/kubernetes#50808.
During a "replace" operation on a CSV, the OLM will attempt to update/patch an existing deployment if the new CSV uses the same deployment name as the existing CSV. If the new CSV includes changes to the deployment(s) that are not allowed by kubernetes (ie a change to a selector/label field, for example), the replace operation will fail ("field is immutable" from kube). At this point, the OLM will periodically retry the replace, failing each time -- the old CSV will never successfully be replaced (but it will stay functional afaik)
If possible, this should be surfaced in a way that is clearer to the end user. Currently, it may not be obvious to someone who isn't an advanced Kubernetes/OLM user exactly what is happening and how to correct it.
From a user perspective, there are a couple of simple workarounds to this scenario:
- Respin the CSV and use a different deployment name (the long-term solution; covers everybody going forward). This is more of a developer fix, applied while crafting a new pull request to update the CSV.
- Delete the deployment(s) from the old CSV by hand, which allows the current replace operation to continue. This is an immediate fix for the affected instance, more of an end-user hot-fix; see the sketch after this list.
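For the hot-fix, a minimal client-go sketch along the same lines as above (namespace and deployment name are again hypothetical; `kubectl delete deployment` achieves the same thing):

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Foreground deletion waits for the deployment's replica sets and pods
	// to be cleaned up before the deployment object itself disappears.
	policy := metav1.DeletePropagationForeground
	err = client.AppsV1().Deployments("operators").Delete(
		context.TODO(), "my-operator",
		metav1.DeleteOptions{PropagationPolicy: &policy},
	)
	if err != nil {
		panic(err)
	}
	// OLM's next retry of the pending replace will recreate the deployment
	// from the new CSV, with the new selector/labels.
}
```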
High-level steps to reproduce

1. Install a CSV whose deployment carries some set of selector/labels.
2. Publish a replacement CSV that keeps the same deployment name but changes a selector/label field.
3. Trigger the upgrade: the update/patch is rejected by Kubernetes with "field is immutable", and OLM retries the replace in a loop.