-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Add flag-protected informer write-back #3745
Conversation
err = woc.controller.wfInformer.GetStore().Update(un) | ||
if err != nil { | ||
woc.log.Errorf("error writing back workflow to informer: %s", err) | ||
time.Sleep(1 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the write back fails, simply sleep as we normally would
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea.
@@ -529,7 +530,22 @@ func (woc *wfOperationCtx) persistUpdates() { | |||
// conflicts arose when attempting to update, causing our conflict resolution code to be invoked. The conflict | |||
// resolution code was not perfect and workflow information was not correctly persisted, causing undefined behavior. | |||
// This line was reintroduced in v2.9.5, and should remain as long as we use our current informer pattern. | |||
time.Sleep(1 * time.Second) | |||
if os.Getenv("INFORMER_WRITE_BACK") == "true" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that having this off by default is a good idea. It has cons - we have an untested path of code, which mean it'll rot at some point.
If we can create a test that we can be confident if that we have fixed this, I'd suggest we flip this guard.
I wonder if we're approaching this incorrectly. Rather than finding a solution, should we be finding a reliable way to reproduce these kinds of issues? A chaos suite for example? |
Closing as we have #4025 |
Closes #3719
I have tested some Workflows known to cause the "conflict" error using the strategy outlined in #3684 (comment) and have so for not seen any issues when using the informer write-back strategy.
My current idea is that we merge this as an opt-in feature-flag protected change. Ask users to test, then consider making it a permanent change if there are no issues.
Things I'd like to consider before moving forward with this: