Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add flag-protected informer write-back #3745

Closed
wants to merge 1 commit into from

Conversation

simster7
Copy link
Member

@simster7 simster7 commented Aug 12, 2020

Closes #3719

I have tested some Workflows known to cause the "conflict" error using the strategy outlined in #3684 (comment) and have so for not seen any issues when using the informer write-back strategy.

My current idea is that we merge this as an opt-in feature-flag protected change. Ask users to test, then consider making it a permanent change if there are no issues.

Things I'd like to consider before moving forward with this:

  • Find and understand the rationale for why writing back to the informer is an anti pattern (talk to @jessesuen)
  • Write a "stress test" to try to find any weak points with this strategy
  • How much benefit we get from dropping the "sleep for 1s" strategy

err = woc.controller.wfInformer.GetStore().Update(un)
if err != nil {
woc.log.Errorf("error writing back workflow to informer: %s", err)
time.Sleep(1 * time.Second)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the write back fails, simply sleep as we normally would

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea.

@@ -529,7 +530,22 @@ func (woc *wfOperationCtx) persistUpdates() {
// conflicts arose when attempting to update, causing our conflict resolution code to be invoked. The conflict
// resolution code was not perfect and workflow information was not correctly persisted, causing undefined behavior.
// This line was reintroduced in v2.9.5, and should remain as long as we use our current informer pattern.
time.Sleep(1 * time.Second)
if os.Getenv("INFORMER_WRITE_BACK") == "true" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having this off by default is a good idea. It has cons - we have an untested path of code, which mean it'll rot at some point.

If we can create a test that we can be confident if that we have fixed this, I'd suggest we flip this guard.

@alexec
Copy link
Contributor

alexec commented Aug 12, 2020

I wonder if we're approaching this incorrectly. Rather than finding a solution, should we be finding a reliable way to reproduce these kinds of issues? A chaos suite for example?

@alexec
Copy link
Contributor

alexec commented Sep 14, 2020

Closing as we have #4025

@alexec alexec closed this Sep 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate writing back to the informer
2 participants