
Call NGINX Plus API in a series of parallel requests #360

Merged
merged 8 commits into from
Sep 2, 2024

Conversation

oliveromahony
Contributor

Proposed changes

Send parallel requests to the NGINX Plus API in the GetStats call. This approach uses golang.org/x/sync/errgroup.

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@oliveromahony oliveromahony requested a review from a team as a code owner August 28, 2024 16:03
@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Aug 28, 2024
@lucacome
Member

What's the reason for this? How much faster is it? Is it worth the added complexity?

client/nginx.go Outdated
if err != nil {
return nil, fmt.Errorf("failed to get stats: %w", err)
}
var g errgroup.Group
Member

don't you need to use WithContext?

Contributor Author

Sure, but the context produced can't be passed to each function call for the requests until those calls expose a context parameter. So the code change would look like `streamGroup, _ := errgroup.WithContext(ctx)`; to change the individual calls I would have to rewrite the client to add context everywhere. That seems like a bigger piece of work, outside the initial intention of this PR.

Contributor Author

OK, the linter was complaining, so I updated the relevant calls to take a context. There may be others, but I think they are out of scope for this PR.

@oliveromahony
Contributor Author

What's the reason for this? How much faster is it? Is it worth the added complexity?

On certain configuration sizes we saw that GetStats did not complete in the NGINX Agent. These were very large configurations. Running the 20 requests to the Plus API in series means that if each takes, say, 100ms, the 20 GET calls could take 2 seconds. The change runs the first 14 in parallel, then calls GetAvailableStreamEndpoints and waits for it, then calls the stream endpoints (at most 4) in parallel, and finally the connections endpoint. Under the same conditions (assuming worst-case calls of 100ms each), the changed GetStats runs in roughly 4 x 100ms = 400ms by comparison.

This is the reason for the change in this PR.

client/nginx.go Outdated
Comment on lines +1446 to +1448
mu.Lock()
stats.StreamServerZones = *streamServerZones
mu.Unlock()


Should this lock be around the client call like the rest of the collections above? Or do all of these calls need to defer mu.Unlock() so the lock is in place for the entire goroutine?

Same comment for the rest of these.

Contributor Author

That's a good spot. The way it was written originally was slightly different: the locks are there to prevent concurrent writes to the stats struct. Parallel requests to the API themselves should be fine, and it's probably safe enough in practice since different calls populate different fields of the struct, but the race detector flagged concurrent access, so I have updated the code to prevent concurrent writes to the stats structure and made the code consistent. Let me know if that explanation and the code updates address your concerns.

Contributor Author

or maybe not based on the failures in the pipeline :)

@oliveromahony
Contributor Author


FYI: connections is at the end because calling GetStats adds to the active connection count. When all the other requests have finished, a call to the connections endpoint only sees the single connection generated by this client.

client/nginx.go Outdated
}

func (client *NginxClient) getWithContext(ctx context.Context, path string, data interface{}) error {
ctx, cancel := context.WithTimeout(ctx, client.ctxTimeout)
Contributor

I'm not sure creating a new context here is the best thing to do. It might be better to let the user of the library create the context themselves; that would allow them to use different timeout values for different requests if they wanted to. I think it would add more flexibility for the end user.

Contributor Author

Just thinking: if the user doesn't set ctxTimeout, does it default to 0? I think this was a quick workaround for functions without a context being passed. Now that this PR updates that, the condition you outlined is possible. I'm happy to remove the context.WithTimeout here and deprecate client.ctxTimeout in this PR. Alternatively, I could add it to the non-context-based calls and let the xWithContext functions use the context passed in?

Contributor Author

@sjberman @lucacome thoughts?


I agree with @dhurley. This function doesn't need to wrap the context with a new one, it can just use the context that's passed in, and the caller can decide what that context is.

Per your last comment, get() could use ctxTimeout and create the context to pass to this function.

Related: #205

Contributor Author
@oliveromahony oliveromahony Aug 29, 2024

thanks, was waiting for alignment. Just pushed the update to this

Member

Do we need both get and getWithContext (as an example)? Is it just to keep backward compatibility? If so I would be okay with a breaking changes release

Contributor Author

It is just to keep backward compatibility. There are still some references calling get, delete, patch and post that have not been refactored. I'm trying to limit changes in this PR; we could do those in a separate PR and bump the major version of the client?

Member

Sounds good

@oliveromahony oliveromahony merged commit 1f9091b into nginx:main Sep 2, 2024
13 checks passed
@oliveromahony oliveromahony deleted the async-get-stats branch September 2, 2024 09:48
@lucacome lucacome added the enhancement Pull requests for new features/feature enhancements label Sep 4, 2024