
(GH-226) Use a dynamic pool_check loop period #227

Merged: 2 commits into puppetlabs:master on Jul 13, 2017

Conversation

@glennsarti (Contributor) commented Jun 23, 2017

Previously check_pool would always check the pool every 5 seconds; with a large number of pools this can cause resource issues inside the providers. This commit:

  • Introduces a dynamic check_pool period which increases while the pool is stable and decreases when the pool is changed in an important way (sketched below)
  • Surfaces the settings as global config defaults, which can also be overridden on a per-pool basis
  • Sets defaults of 5 to 60 seconds with a decay factor of 2.0
  • Adds unit tests for the new behaviour
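
For illustration, a minimal sketch of how such a growing-and-resetting loop delay could work. This is not the vmpooler implementation; the constant and method names are made up, and only the 5/60/2.0 values come from this PR.

```ruby
# Minimal sketch of a dynamic check_pool period; names are illustrative.
CHECK_LOOP_DELAY_MIN   = 5    # seconds: fastest polling period (assumed default)
CHECK_LOOP_DELAY_MAX   = 60   # seconds: slowest polling period (assumed default)
CHECK_LOOP_DELAY_DECAY = 2.0  # growth factor applied while the pool is stable

def next_check_loop_delay(current_delay, pool_changed)
  # Reset to the fastest rate when the pool changed in an important way
  # (e.g. a VM was cloned or destroyed)...
  return CHECK_LOOP_DELAY_MIN if pool_changed

  # ...otherwise back off, bounded by the configured maximum.
  [current_delay * CHECK_LOOP_DELAY_DECAY, CHECK_LOOP_DELAY_MAX].min
end

# While stable the delay grows 5 -> 10 -> 20 -> 40 -> 60 and stays at 60;
# any important change drops it straight back to 5 seconds.
```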

@glennsarti changed the title from "(GH-226) Use a dynamic pool_check loop period" to "{WIP}(GH-226) Use a dynamic pool_check loop period" on Jun 23, 2017
@glennsarti (Contributor, Author)

Interested in your opinion on this @mattkirby

@mattkirby (Contributor) left a comment

I'm +1 to making this more configurable. However, I don't think requiring a user to configure these values is ideal. In the case of 90 pools and a small number of connections the defaults cause a number of operations to time out, presumably waiting for the chance to get a connection from the pool.

I think it may make sense to stop pool operations from happening on a per-pool basis. Otherwise the behavior is kind of odd and things time out at random points in their pool checking operations, which makes the application behave unpredictably, and causes pools to take a very long time to refill. These settings allow the situation to be coerced into waiting long enough that a connection is eventually freed. Given these constraints though I think it is sensible to only check pools when a connection is available to perform those operations. Alternately, pool checking should not use any provider connections at all and all of that work should be pushed outside of the pool manager.

When I tested re-working pool_manager to pair the concept of threads and connections I found things worked pretty well, so that could be an easy solution to consider as an iteration.
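
As a rough illustration of "only check a pool when a connection is available", here is a hypothetical sketch; `try_acquire`, `release`, and `check_pool` are stand-in names, not vmpooler's or any provider's real API.

```ruby
# Hypothetical sketch: skip a pool check cycle entirely if no provider
# connection can be obtained without blocking, instead of letting the
# operation queue up and eventually time out.
def check_pool_if_connection_free(pool, connection_pool)
  conn = connection_pool.try_acquire  # stand-in for a non-blocking checkout
  return :skipped if conn.nil?        # no free connection; try again next cycle

  begin
    check_pool(pool, conn)            # stand-in for the per-pool check work
  ensure
    connection_pool.release(conn)     # always hand the connection back
  end
end
```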

@glennsarti (Contributor, Author)

As a side note, should the defaults be something more "sane" then? e.g. the default polling period could be (number of pools) / 4, so 20 pools = 5 second polling period.

@mattkirby (Contributor)

Yeah, I think it's definitely an improvement.

@glennsarti (Contributor, Author)

My thinking was that restricting the number of available connections doesn't control how often a provider is polled; it just caps the maximum rate. So it will hit that hard limit fairly quickly when the number of connections is far lower than the number of pools (as we've seen). I figured the one metric with the greatest effect on provider load (at least for the vSphere provider) is how often vSphere is queried for the list of VMs in a pool.

Another option would be to split operations into those that do and those that do not require an inventory scan. This means we can limit the most "expensive" operation (the inventory scan) to when we actually need it, e.g.
Inventory is needed:

  • On the first pool scan, for an initial inventory
  • Just after a clone_vm (the new VM will appear in the inventory)
  • Just after a destroy_vm

In fact that ^^ may be a better way of doing it, as my rolling timeout is still a little naive (roughly sketched below).
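
Purely as an illustration of that split, a sketch under the assumption that each pool carries a "needs inventory" flag; `vms_in_pool`, `reconcile_inventory`, and `run_cheap_checks` are assumed names, not vmpooler's actual methods.

```ruby
# Illustrative only: run the expensive inventory query just when one of the
# events above makes it necessary; do the cheaper per-pool work every cycle.
def check_pool(pool, provider)
  if @needs_inventory[pool]                 # set on first scan, after clone_vm, after destroy_vm
    inventory = provider.vms_in_pool(pool)  # assumed name for the expensive provider query
    reconcile_inventory(pool, inventory)    # assumed helper: update the tracked VM lists
    @needs_inventory[pool] = false
  end

  run_cheap_checks(pool)                    # assumed helper: work that needs no inventory scan
end

# Elsewhere the flag would be set again right after the triggering events, e.g.
#   @needs_inventory[pool] = true   # just after clone_vm or destroy_vm
```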

@shrug requested review from underscorgan and shrug on June 27, 2017, 22:14
@mattkirby (Contributor)

@glennsarti as it is, I think this would be useful. Starting to work with this code on some of our vmpooler instances, I find that I frequently hit the 60 second timeout. Is there anything I can do to help this one land? I'm all for further iteration, but I think this is useful as it stands now.

@glennsarti (Contributor, Author)

@mattkirby Sorry, been busy fixing other bits of infra. I think I should change this to use more sane defaults.

@kevpl (Contributor) commented Jul 7, 2017

@glennsarti since you're already going to be changing the defaults, it seems better to make them into constants as well, since you use them often enough.

@glennsarti (Contributor, Author)

Okay, I've changed the defaults to 5 -> 60 seconds with a decay of 2.0, switched to constants, and updated the docs. Added some minor rubocop fixes for good measure.
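
For illustration only, the "global defaults plus per-pool override" shape might look roughly like this once the YAML config is parsed into a hash; the key and pool names here are assumptions and may not match the ones this PR actually adds.

```ruby
# Illustrative only: global defaults with a hypothetical per-pool override.
config = {
  config: {
    'check_loop_delay_min'   => 5,    # fastest polling period (seconds)
    'check_loop_delay_max'   => 60,   # slowest polling period (seconds)
    'check_loop_delay_decay' => 2.0   # growth factor while a pool is stable
  },
  pools: [
    { 'name' => 'pool-a' },                                # uses the global defaults
    { 'name' => 'pool-b', 'check_loop_delay_max' => 120 }  # hypothetical per-pool override
  ]
}
```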

@glennsarti changed the title from "{WIP}(GH-226) Use a dynamic pool_check loop period" to "(GH-226) Use a dynamic pool_check loop period" on Jul 8, 2017
@glennsarti (Contributor, Author)

Ready for merge.

@glennsarti (Contributor, Author)

... and tests have failed...

@glennsarti (Contributor, Author)

And tests are now good. Ready for review again.

@shrug left a comment

One comment about a typo, but generally 👍

@@ -2200,6 +2384,12 @@
subject._check_pool(pool_object,provider)
end

it 'should return the number of discoverd of VMs' do

nitpick: typo in "discovered"

@glennsarti (Contributor, Author)

Fixed

Commits: the dynamic check_pool period change described above, and "Fix minor rubocop violations".
@shrug merged commit 9b0e55f into puppetlabs:master on Jul 13, 2017