[release-1.5] fix irrecoverable errors in async operations #2679
Reconciliations involving asynchronous operations can fall into a loop in which an eventual "Failed" result blocks future reconciliations from making any further changes to that resource, so the problem can never be fixed.

This change sets `longRunningOperationStates` for agent pools on the corresponding AzureManagedMachinePool instead of the AzureManagedControlPlane, since changes to the latter were not being persisted.

It also blocks starting a new operation only when an existing operation of the same type is in progress. In-progress PUT operations will no longer block new DELETEs, and in-progress DELETEs will no longer block new PUTs.

When polling a future from the Azure API eventually returned both `isDone==true` and a non-nil error, a "failed checking if the operation was complete" message was logged even though the error referred to the operation's ultimate failure, not to a problem polling the future itself. This change treats every polling check that returns `isDone==true` as successful and relies on the operation's error being captured in the future's `Result`. The future is always deleted from the status once the operation is done, so a failed operation can be retried on the next reconciliation.
@k8s-infra-cherrypick-robot: The following test failed, say
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here. The pull request process is described here.
This is an automated cherry-pick of #2665
/assign CecileRobertMichon