-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Call NGINX Plus API in a series of parallel requests #360
Conversation
What's the reason for this? How much faster is it? Is it worth the added complexity? |
client/nginx.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("failed to get stats: %w", err) | ||
} | ||
var g errgroup.Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you need to use WithContext
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure but the context produced can't be passed to each function call for the requests until they get exposed. So the code change would look like streamGroup, _ := errgroup.WithContext(ctx)
- to change the individual calls I would have to re-write the client to add context everywhere. That seems like a bigger piece of work outside the initial intention of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, the linter was complaining so I updated the relevant calls to take a context. There may be others, but I think they are out of scope of this PR.
On certain size configurations we saw the GetStats did not complete in the NGINX Agent. These were very large configurations. Running the 20 requests to the plus API in series means that if each of them takes 100ms (as an example) x 20 the Get calls, could take 2 seconds. By shifting the first 14 to run in parallel, then call GetAvailableStreamEndpoints wait for that, then call in parallel streamEndpoints (at most 4) and then finally the connections. So under the same conditions highlighted above (assuming worst case calls are max 100ms), the changed GetStats function will run 100ms x 4 = 400ms in comparison. This is the reason for the change in this PR. |
mu.Lock() | ||
stats.StreamServerZones = *streamServerZones | ||
mu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this lock be around the client call like the rest of the collections above? Or do all of these calls need to defer mu.Unlock()
so the lock is in place for the entire goroutine?
Same comment for the rest of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good spot. Actually, the way it was written originally was slightly different. The locks are to prevent concurrent writes to the stats struct. I think parallel requests to the API should be ok. Generally it's probably safe enough as different fields under the struct are being populated by different calls. But the race detector detected concurrent access so I have updated the code to prevent concurrent writes to the stats structure and made the code consistent. Let me know if that explanation and the code updates satisfy your concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe not based on the failures in the pipeline :)
FYI: connections is at the end because when GetStats is called, it adds numbers to the active connections. When all these are finished, if you call the connections endpoint, you only see 1 connection generated by this client |
client/nginx.go
Outdated
} | ||
|
||
func (client *NginxClient) getWithContext(ctx context.Context, path string, data interface{}) error { | ||
ctx, cancel := context.WithTimeout(ctx, client.ctxTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if creating a new context here is the best thing to do. It might be better to allow the user of the library to create the context themselves. That would allow them to have different timeout values for different requests if they wanted to. Think it would add more flexibility to the end user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking, if the user doesn't set the ctxTimeout, then it defaults to 0? I think this was a quick workaround for functions without a context being passed. Now that this PR updates that, then the condition you outlined is possible. I'm happy to remove the context.WithTimeout here and deprecate the client.ctxTimeout in this PR. Alternatively, I could add this to the non context based calls and leave the xWithContext functions use the context passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, was waiting for alignment. Just pushed the update to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both get
and getWithContext
(as an example)? Is it just to keep backward compatibility? If so I would be okay with a breaking changes release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just to keep backward compatibility. There are still some references calling the get, delete, patch and post that have not been refactored. I'm trying to limit changes in this PR, we could do those in a separate PR and bump the major version of the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
Proposed changes
Send Parallel requests to NGINX Plus API in GetStats call. This approach uses golang.org/x/sync/errgroup
Checklist
Before creating a PR, run through this checklist and mark each as complete.