
(GH-226) Use a dynamic pool_check loop period #227

Merged: 2 commits into puppetlabs:master on Jul 13, 2017

Conversation

@glennsarti (Contributor) commented Jun 23, 2017

Previously check_pool would always check the pool every 5 seconds; with a large number of pools this can cause resource issues inside the providers. This commit:

  • Introduces a dynamic check_pool period which increases while the pool is stable and decreases when the pool is changed in an important way (sketched below)
  • Surfaces the settings as global config defaults, which can also be overridden on a per-pool basis
  • Sets defaults of 5 to 60 seconds with a decay factor of 2.0
  • Adds unit tests for the new behaviour
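
For illustration, a minimal sketch of how such a growing-and-resetting loop delay could work. This is not the vmpooler implementation; the constant and method names are made up, and only the 5/60/2.0 values come from this PR.

```ruby
# Minimal sketch of a dynamic check_pool period; names are illustrative.
CHECK_LOOP_DELAY_MIN   = 5    # seconds: fastest polling period (assumed default)
CHECK_LOOP_DELAY_MAX   = 60   # seconds: slowest polling period (assumed default)
CHECK_LOOP_DELAY_DECAY = 2.0  # growth factor applied while the pool is stable

def next_check_loop_delay(current_delay, pool_changed)
  # Reset to the fastest rate when the pool changed in an important way
  # (e.g. a VM was cloned or destroyed)...
  return CHECK_LOOP_DELAY_MIN if pool_changed

  # ...otherwise back off, bounded by the configured maximum.
  [current_delay * CHECK_LOOP_DELAY_DECAY, CHECK_LOOP_DELAY_MAX].min
end

# While stable the delay grows 5 -> 10 -> 20 -> 40 -> 60 and stays at 60;
# any important change drops it straight back to 5 seconds.
```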

@glennsarti changed the title from "(GH-226) Use a dynamic pool_check loop period" to "{WIP}(GH-226) Use a dynamic pool_check loop period" on Jun 23, 2017
@glennsarti (Contributor, Author)

Interested in your opinion on this @mattkirby

@mattkirby (Contributor) left a comment

I'm +1 to making this more configurable. However, I don't think requiring a user to configure these values is ideal. In the case of 90 pools and a small number of connections the defaults cause a number of operations to time out, presumably waiting for the chance to get a connection from the pool.

I think it may make sense to stop pool operations from happening on a per-pool basis. Otherwise the behavior is kind of odd and things time out at random points in their pool checking operations, which makes the application behave unpredictably, and causes pools to take a very long time to refill. These settings allow the situation to be coerced into waiting long enough that a connection is eventually freed. Given these constraints though I think it is sensible to only check pools when a connection is available to perform those operations. Alternately, pool checking should not use any provider connections at all and all of that work should be pushed outside of the pool manager.

When I tested re-working pool_manager to pair the concept of threads and connections I found things worked pretty well, so that could be an easy solution to consider as an iteration.
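
As a rough illustration of "only check a pool when a connection is available", here is a hypothetical sketch; `try_acquire`, `release`, and `check_pool` are stand-in names, not vmpooler's or any provider's real API.

```ruby
# Hypothetical sketch: skip a pool check cycle entirely if no provider
# connection can be obtained without blocking, instead of letting the
# operation queue up and eventually time out.
def check_pool_if_connection_free(pool, connection_pool)
  conn = connection_pool.try_acquire  # stand-in for a non-blocking checkout
  return :skipped if conn.nil?        # no free connection; try again next cycle

  begin
    check_pool(pool, conn)            # stand-in for the per-pool check work
  ensure
    connection_pool.release(conn)     # always hand the connection back
  end
end
```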

@glennsarti (Contributor, Author)

As a side note, should the defaults be something more "sane" then? e.g. the default polling period could be (number of pools) / 4, so 20 pools = 5 second polling period.

@mattkirby (Contributor)

Yeah, I think it's definitely an improvement.

@glennsarti (Contributor, Author)

My thinking was that restricting the number of available connections doesn't control how often a provider is polled; it just caps the maximum rate. So it will hit that hard limit fairly quickly when the number of connections is far lower than the number of pools (as we've seen). I figured the one metric with the greatest effect on provider load (at least for the vSphere provider) is how often vSphere is queried for the list of VMs in a pool.

Another option would be to split operations into those that do and those that do not require an inventory scan. This means we can limit the most "expensive" operation (the inventory scan) to when we actually need it, e.g.
Inventory is needed:

  • On the first pool scan, for an initial inventory
  • Just after a clone_vm (the new VM will appear in the inventory)
  • Just after a destroy_vm

In fact that ^^ may be a better way of doing it, as my rolling timeout is still a little naive (roughly sketched below).
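
Purely as an illustration of that split, a sketch under the assumption that each pool carries a "needs inventory" flag; `vms_in_pool`, `reconcile_inventory`, and `run_cheap_checks` are assumed names, not vmpooler's actual methods.

```ruby
# Illustrative only: run the expensive inventory query just when one of the
# events above makes it necessary; do the cheaper per-pool work every cycle.
def check_pool(pool, provider)
  if @needs_inventory[pool]                 # set on first scan, after clone_vm, after destroy_vm
    inventory = provider.vms_in_pool(pool)  # assumed name for the expensive provider query
    reconcile_inventory(pool, inventory)    # assumed helper: update the tracked VM lists
    @needs_inventory[pool] = false
  end

  run_cheap_checks(pool)                    # assumed helper: work that needs no inventory scan
end

# Elsewhere the flag would be set again right after the triggering events, e.g.
#   @needs_inventory[pool] = true   # just after clone_vm or destroy_vm
```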

@shrug requested review from underscorgan and shrug on June 27, 2017, 22:14
@mattkirby (Contributor)

@glennsarti as it is, I think this would be useful. Starting to work with this code on some of our vmpooler instances, I find that I frequently hit the 60 second timeout. Is there anything I can do to help this one land? I'm all for further iteration, but I think this is useful as it stands now.

@glennsarti (Contributor, Author)

@mattkirby Sorry, been busy fixing other bits of infra. I think I should change this to use more sane defaults.

@kevpl (Contributor) commented Jul 7, 2017

@glennsarti since you're already going to be changing the defaults, it seems better to make them into constants as well, since you use them often enough.

@glennsarti (Contributor, Author)

Okay, I've changed the defaults to 5 -> 60 seconds with a decay of 2.0, switched to constants, and updated the docs. Added some minor rubocop fixes for good measure.
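
For illustration only, the "global defaults plus per-pool override" shape might look roughly like this once the YAML config is parsed into a hash; the key and pool names here are assumptions and may not match the ones this PR actually adds.

```ruby
# Illustrative only: global defaults with a hypothetical per-pool override.
config = {
  config: {
    'check_loop_delay_min'   => 5,    # fastest polling period (seconds)
    'check_loop_delay_max'   => 60,   # slowest polling period (seconds)
    'check_loop_delay_decay' => 2.0   # growth factor while a pool is stable
  },
  pools: [
    { 'name' => 'pool-a' },                                # uses the global defaults
    { 'name' => 'pool-b', 'check_loop_delay_max' => 120 }  # hypothetical per-pool override
  ]
}
```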

@glennsarti changed the title from "{WIP}(GH-226) Use a dynamic pool_check loop period" to "(GH-226) Use a dynamic pool_check loop period" on Jul 8, 2017
@glennsarti (Contributor, Author)

Ready for merge.

@glennsarti (Contributor, Author)

... and tests have failed...

@glennsarti (Contributor, Author)

And tests are now good. Ready for review again.

@shrug left a comment

One comment about a typo, but generally 👍

@@ -2200,6 +2384,12 @@
subject._check_pool(pool_object,provider)
end

it 'should return the number of discoverd of VMs' do

nitpick: typo in "discovered"

@glennsarti (Contributor, Author)

Fixed

Commits: the dynamic check_pool period change described above, and "Fix minor rubocop violations".
@shrug merged commit 9b0e55f into puppetlabs:master on Jul 13, 2017