Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

only async trigger changes #8778

Closed
wants to merge 10 commits into from
Closed

Conversation

hanshasselberg
Copy link
Member

@hanshasselberg hanshasselberg commented Sep 30, 2020

This is a tiny step towards fixing #6616.

This PR stops all agent endpoints from syncing the changes synchronously and leaves it to the asynchronous ticker to pick up these changes. And makes changes to how syncNodeInfo is called...

The evolution of syncNodeInfo:

original:

  1. updateSyncState -- among other things -- determines if node needs syncing and set nodeInfoInSync
  2. SyncChanges adds node info to every service/check update if !nodeInfoInSync and has a manual syncNodeInfo call in case everything before failed to updated nodeinfo

after #7189:

  1. updateSyncState determines if node needs syncing and set nodeInfoInSync
  2. SyncChanges calls syncNodeInfo upfront instead of appending nodeinfo to every service/check update

proposed:

  1. updateSyncState determines if node needs syncing and calls syncNodeInfo right away
  2. SyncChanges doesn't try to sync node info

This PR potentially makes the tests more flaky because services/checks are no longer synced immediately.

Previously we had a variable that would track whether the node's
information was in sync with the servers.

This was a shared variable in the local state which required locking to
read/write.

Rather than guarding access to this variable we now sync node updates
when we see that the node info is out of sync. This only happens in one
location: updateSyncState().

Additionally, rather than SkippingNodeUpdate based on the shared
variable we now let the servers determine whether the node update should
be skipped. This is checked in structs.RegisterRequest.ChangesNode().
@@ -212,6 +212,16 @@ func TestAPI_HealthChecks(t *testing.T) {
t.Fatalf("err: %v", err)
}

retry.Run(t, func(r *retry.R) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for service to sync to catalog. testrpc cannot be imported here.

@@ -64,6 +64,7 @@ func WaitUntilNoLeader(t *testing.T, rpc rpcFn, dc string, options ...waitOption
type waitOption struct {
Token string
WaitForAntiEntropySync bool
WaitForService string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add ability to wait for a service because services are no longer synced immediately.

@@ -250,7 +250,6 @@ func NewTestServerConfigT(t TestingTB, cb ServerConfigCallback) (*TestServer, er
return nil, errors.Wrap(err, "failed marshaling json")
}

t.Logf("CONFIG JSON: %s", string(b))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove debug.

// Should lose a peer
retry.Run(t, func(r *retry.R) {
timer := &retry.Timer{Timeout: 10 * time.Second, Wait: 500 * time.Millisecond}
retry.RunWith(timer, t, func(r *retry.R) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully fixing a flaky test with this.

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I am not very familiar with state syncing, but I think this makes sense. Just one question about the change in error return

Comment on lines 1355 to 1359
return nil

default:
l.logger.Warn("Syncing node info failed.", "error", err)
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the consequences of this function no longer returning an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your comment I revisited this part and reverted to returning and handling any errors from syncNodeInfo. I also added a longer explanation for these changes to the PR description.

@hashicorp-cla
Copy link

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@jmurret
Copy link
Member

jmurret commented May 9, 2023

Closing as this is over two years old.

@jmurret jmurret closed this May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants