(GH-226) Use a dynamic pool_check loop period #227
Conversation
Interested in your opinion on this @mattkirby
I'm +1 to making this more configurable. However, I don't think requiring a user to configure these values is ideal. In the case of 90 pools and a small number of connections the defaults cause a number of operations to time out, presumably waiting for the chance to get a connection from the pool.
I think it may make sense to stop pool operations from happening on a per-pool basis. Otherwise the behavior is kind of odd and things time out at random points in their pool checking operations, which makes the application behave unpredictably, and causes pools to take a very long time to refill. These settings allow the situation to be coerced into waiting long enough that a connection is eventually freed. Given these constraints though I think it is sensible to only check pools when a connection is available to perform those operations. Alternately, pool checking should not use any provider connections at all and all of that work should be pushed outside of the pool manager.
When I tested re-working pool_manager to pair the concept of threads and connections I found things worked pretty well, so that could be an easy solution to consider as an iteration.
As a side note, should the defaults be a bit more "sane" then? e.g. default polling period is (num of pools) / 4, so 20 pools = 5 sec polling period.
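As a rough illustration of that suggested sizing rule (the method name below is hypothetical, not part of vmpooler):

```ruby
# Hypothetical illustration of the suggested sizing rule: scale the default
# polling period with the number of configured pools instead of hard-coding it.
def suggested_polling_period(pool_count)
  # (num of pools) / 4, so 20 pools => 5 second polling period
  [pool_count / 4, 1].max
end

suggested_polling_period(20) # => 5
suggested_polling_period(90) # => 22
```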
Yeah, I think it's definitely an improvement.
So my thinking was restricting the number of available connections doesn't stop how often a provider is polling, it just caps the maximum rate. So it'll reach that hard limit fairly quickly when the number of connections is far lower than pools (as we've seen). So I figured the one metric that has the greatest effect on provider load (at least in the vSphere instance) is how often vSphere is queried for the list of VMs in a Pool. Another option would be to split operations that do and do not require an inventory scan. This means we can limit the most "expensive" operation (inventory) to when we actually need it e.g.
In fact that ^^ may be a better way of doing it, as my rolling timeout is still a little naive.
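For illustration, a minimal sketch of the kind of decaying check_pool period being discussed here; the method and constant names are hypothetical, and the numbers simply mirror the defaults mentioned later in this PR (5 -> 60 with a decay of 2.0), not the actual pool_manager code:

```ruby
# Hypothetical sketch of a dynamic check_pool loop delay (not the real
# vmpooler code): the delay grows while the pool is stable, capping the
# provider query rate, and snaps back to the minimum when the pool changes
# in an important way.
LOOP_DELAY_MIN   = 5    # seconds
LOOP_DELAY_MAX   = 60   # seconds
LOOP_DELAY_DECAY = 2.0  # growth factor applied while the pool is stable

def next_loop_delay(current_delay, pool_changed)
  return LOOP_DELAY_MIN if pool_changed
  [(current_delay * LOOP_DELAY_DECAY).to_i, LOOP_DELAY_MAX].min
end

# A stable pool backs off 5 -> 10 -> 20 -> 40 -> 60 seconds between checks,
# then drops straight back to 5 once a change is detected.
delay = LOOP_DELAY_MIN
[false, false, false, false, true].each do |changed|
  delay = next_loop_delay(delay, changed)
end
```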
@glennsarti as it is I think this would be useful. In starting to work with this code with some of our vmpooler instances I find that I frequently hit the 60 second timeout. Is there anything I can do to help this one land? I'm all for further iteration, but I think this is useful as it stands now.
@mattkirby Sorry, been busy fixing other bits of infra. I think I should change this to use more sane defaults.
@glennsarti since you're already going to be changing the defaults, it seems better to make them into constants as well, since you use them enough.
Force-pushed from ae3ea3b to cd742dc
Okay, I've changed the defaults to 5 -> 60 with a decay of 2.0. Also used constants and updated the docs. Added some minor rubocop fixes for good measure.
Ready for merge.
... and tests have failed...
Force-pushed from 552d4c3 to 7226c86
And tests are now good. Ready for review again.
One comment about a typo, but generally 👍
spec/unit/pool_manager_spec.rb
@@ -2200,6 +2384,12 @@
      subject._check_pool(pool_object,provider)
    end

    it 'should return the number of discoverd of VMs' do
nitpick: typo in "discovered"
Fixed
Previously the check_pool would always check the pool every 5 seconds; however, with a large number of pools this can cause resource issues inside the providers. This commit:
- Introduces a dynamic check_pool period which increases during stability and decreases when the pool is being changed in an important way
- Surfaces the settings as global config defaults which can also be set on a per-pool basis
- Adds defaults to emulate the current behaviour
- Adds unit tests for the new behaviour
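To illustrate the "global defaults, overridable per pool" behaviour described in the commit message, here is a minimal sketch; the key names and pool data are hypothetical, not the actual vmpooler config schema:

```ruby
# Hypothetical sketch of per-pool settings falling back to global defaults.
# Key names are illustrative only, not the real vmpooler config keys.
GLOBAL_DEFAULTS = {
  'check_loop_delay_min'   => 5,
  'check_loop_delay_max'   => 60,
  'check_loop_delay_decay' => 2.0
}.freeze

def setting_for(pool, key)
  pool.fetch(key, GLOBAL_DEFAULTS[key])
end

pool = { 'name' => 'example-pool', 'check_loop_delay_max' => 120 }
setting_for(pool, 'check_loop_delay_max')   # => 120 (per-pool override)
setting_for(pool, 'check_loop_delay_decay') # => 2.0 (global default)
```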
Fix minor rubocop violations
Force-pushed from 7226c86 to 5e0aefc