-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove stale Ready=False conditions value to show more accurate status #884
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great and gets HC closer to how KC behaves. I have added a new helper function to runtime that can be used here, see fluxcd/pkg#730
94a724d
to
2958334
Compare
Updated to use the helper from fluxcd/pkg#730. Need to update go.mod with a new pkg/runtime release. This change is in itself an improvement to avoid certain confusing status. #885 can be added separately later for more enhancement in the drift detection-correction loop case specifically. |
2958334
to
b919354
Compare
When the reconciliation begins, while fulfilling the prerequisites, Ready=False condition for various reasons are added on the object. On failure, this reason is persisted on the object. On a subsequent reconciliation, when the failure is recovered, the Ready=False condition is not updates until the atomic reconciliation reaches a conclusion. During this period if the atomic reconciliation enters a retry loop due to constant drift detection and correction, the stale Ready=False condition with incorrect reason persists on the object. The Ready=False message is also copied to Reconciling=True condition, resulting in an incorrect depiction of what's actually happening. For example, if previously the HelmRelease failed with dependency not ready error, on a subsequent reconciliation, even after going past the dependency check and returning from atomic reconciliation due to drift detection and correction loop scenario, the Ready=False condition continues to show the stale dependency not ready error. In order to show more accurate status, the Ready=False conditions added while fulfilling prerequisites can be removed once those checks have succeeded, updating Ready=False to Ready=Unknown with "reconciliation in progress" message. If the atomic reconciliation gets stuck in the drift detection and correction loop with this, the Ready and Reconciling conditons would show "reconciliation in progress". This should be a better indicator of what's going on. The events and logs can be checked to determine accurately what's causing the reconciliation to be progressing for ever. Signed-off-by: Sunny <github@darkowlzz.space>
b919354
to
59c577a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @darkowlzz for this major improvement to UX 🥇
Related to the issue described in fluxcd/flux2#4524 and observations from #855. And another instance of the issue discussed on slack that revealed that stale Ready=False value results in such scenarios (described in detail below).
When the reconciliation begins, while fulfilling the prerequisites, Ready=False condition for various reasons are added on the object. On failure, this reason is persisted on the object. On a subsequent reconciliation, when the failure is recovered, the Ready=False condition is not updates until the atomic release reconciliation reaches a conclusion. During this period if the atomic reconciliation enters a retry loop due to constant drift detection and correction, the stale Ready=False condition with incorrect reason persists on the object. The Ready=False message is also copied to Reconciling=True condition, refer
helm-controller/internal/reconcile/atomic_release.go
Lines 222 to 225 in c2c1064
For example, if previously the HelmRelease failed with dependency not ready error, on a subsequent reconciliation, even after going past the dependency check and returning from atomic release reconciliation due to drift detection and correction loop scenario, the Ready=False condition continues to show the stale dependency not ready error.
An example of the status conditions during this situation as shared by a user is:
In order to show more accurate status, the Ready=False conditions added while fulfilling prerequisites can be removed once those checks have succeeded, updating Ready=False to Ready=Unknown with "reconciliation in progress" message. If the atomic reconciliation gets stuck in the drift detection and correction loop with this, the Ready and Reconciling conditons would show "reconciliation in progress". This should be a better indicator of what's going on. The events and logs can be checked to determine accurately what's causing the reconciliation to be progressing for ever.
An example of the new conditions showing progress after recovering from failure and entering drift detection-correction loop:
(NOT IMPLEMENTED IN THIS PR - See #885 ) Another way to improve the scenario would be to update the Ready condition when drift correction runs. But this would be very specific to the drift scenario and may not work for other situations that could also result in a similar scenario. Updating the Ready condition anywhere after entering the atomic release reconciliation should change the Ready condition to report accurate state.