
Improve vmpooler scheduling logic #167

Merged
mattkirby merged 11 commits into puppetlabs:master on Nov 22, 2016

Conversation

mattkirby
Contributor

This change implements improved scheduling of pooled VMs by updating the logic used to find the least used host in a target cluster. Without this change, host selection is based on VM count alone, which gives no indication of VM size or current host utilization.

Additionally, this change implements a queue for migrating VMs at request time. The least used compatible host is selected as the target, and the migration, when required, happens transparently to the consumer.

Without these changes vmpooler may distribute load unevenly across a cluster, since the consumption of VMs is separated from scheduling.

@mattkirby
Contributor Author

This change is ready for review, but is blocked from moving forward until I present the changes next week. Please feel free to offer any review in the interim.

@mattkirby mattkirby force-pushed the migrate_at_checkout branch 2 times, most recently from 3135ca2 to c36fd3a on October 19, 2016
@@ -228,6 +228,7 @@
allow(redis).to receive(:smembers).with('vmpooler__running__pool1').and_return([])
allow(redis).to receive(:smembers).with('vmpooler__completed__pool1').and_return([])
allow(redis).to receive(:smembers).with('vmpooler__discovered__pool1').and_return([])
allow(redis).to receive(:smembers).with('vmpooler__migrating__pool1').and_return([])
Contributor

The pool manager has the only remaining hive of "mockist" specs in the pooler (I converted the other specs over to non-mockist specs). The "mockist" specs don't really exercise much of anything, but I stopped short here because rewriting these specs is a pain.

Skipping real specs for the new functionality would not block this PR, but I would definitely give big ✨ / 🎉 to the effort.

If you end up wanting to go that route, there are some helpers in https://github.com/puppetlabs/vmpooler/blob/master/spec/helpers.rb that are designed to set up actual vm state in tests, which allows us to then call our methods to manipulate and then assert that things look right in redis.

For example, here's a fairly stateful non-mockist spec: https://github.com/puppetlabs/vmpooler/blob/master/spec/vmpooler/api/v1/vm_template_spec.rb#L173-L184

Anyway, don't worry about it if it's a pain (it probably is). It's on our (my?) wishlist at some point to have our suite be more real, and to evolve the pool manager so that it has a reasonable domain model reflected in the code and those test helpers (or something similar) migrate down into the code itself.

Contributor Author (@mattkirby, Oct 24, 2016)

I do want to set up some real tests for the redis usage, as I think it will uncover some bugs. I would like to move this change forward separately from that, but I am definitely happy to put in some time on it, since I think it will help identify some of the problem areas of the current redis usage and increase confidence in future changes. The examples are very helpful for me since I have not written much in the way of ruby tests. I'm happy to learn, though.

Contributor Author

I started adding some specs here and making an effort to test things a little more thoroughly. I have not written rspec tests before, so I would like to get some feedback about how I'm approaching this to identify if I'm on the right path, or if I need to adjust the approach.

Contributor

I went through some months ago and converted most of our specs away from mocking/stubbing on redis, etc. I stopped short on the pool_manager specs because there's just a lot of mess in that library and it needs deeper work (I think) to clean things up.

I'm fine here with continuing in the existing style (because the cost of not doing so is large), even if the tests end up less than ideal: brittle, not fully exercising functionality.


def _migrate_vm(vm, pool)
$redis.srem('vmpooler__migrating__' + pool, vm)
vm_object = $vsphere[pool].find_vm(vm) || $vsphere[pool].find_vm_heavy(vm)
Contributor

Again (see below), not a priority to fix here, but if it's easy, this incantation could become a method with a name (e.g., find_vsphere_pool_vm(pool, vm)).

Contributor Author

I updated this as suggested.

if host == parent_host
$logger.log('s', '[ ] [' + pool + "] No migration required for '" + vm + "'")
else
start = Time.now
Contributor

This might be a good place to put a $stats timing block and emit migration time stats to graphite.
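
For context, a minimal sketch of the kind of timing block being suggested, following the start = Time.now pattern already in the diff; the $metrics.timing call, the metric name, and the migrate_vm_host helper are illustrative assumptions, not the PR's actual code:

  start = Time.now
  $vsphere[pool].migrate_vm_host(vm_object, target_host)  # hypothetical migration call
  finish = format('%.2f', Time.now - start)
  $logger.log('s', "[>] [#{pool}] '#{vm}' migrated in #{finish} seconds")
  $metrics.timing("migrate.#{pool}", finish)               # assumed stats helper emitting to graphite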

Contributor Author

This has been added. It looks like it follows the currently used patterns, but please let me know if I've missed something or if it needs further adjustment.

if inventory[vm]
begin
migrate_vm(vm, pool['name'])
rescue
Contributor

It might be handy for logging purposes here to make this:

begin
  migrate_vm(vm, pool['name'])
rescue => err
  $logger.log('s', '[x] [' + pool['name'] + "] '" + vm + "' failed to migrate: #{err}")
end

The log line could also be written as:

$logger.log('s', "[x] [#{pool['name']}] '#{vm}' failed to migrate: #{err}")

Contributor Author

I have added the context of the failure to the log message and cleaned up the logger string.

end
return nil
end

Contributor

A potential refactoring of this method (untested):

  def get_host_utilization(host, model=nil, limit=90)
    return nil unless model
    return nil unless host_has_cpu_model? host, model
    return nil if host.runtime.inMaintenanceMode
    return nil unless host.overallStatus == 'green'

    cpu_utilization = cpu_utilization_for host
    memory_utilization = memory_utilization_for host

    return nil if cpu_utilization > limit
    return nil if memory_utilization > limit

    [ cpu_utilization + memory_utilization, host ]
  end

  def host_has_cpu_model?(host, model)
    host.hardware.cpuPkg[0].description.split().include? model
  end

  def cpu_utilization_for(host)
    cpu_usage = host.summary.quickStats.overallCpuUsage
    cpu_size = host.summary.hardware.cpuMhz * host.summary.hardware.numCpuCores
    (cpu_usage.to_f / cpu_size.to_f) * 100
  end

  def memory_utilization_for(host)
    memory_usage = host.summary.quickStats.overallMemoryUsage
    memory_size = host.summary.hardware.memorySize / 1024 / 1024
    (memory_usage.to_f / memory_size.to_f) * 100
  end

Contributor Author

I ran through and put together something that looks nearly like this and still does the same thing. I think it captures what you were suggesting and helps clean it up. Thank you.


hosts[hosts_sort.sort_by { |_k, v| v }[0][0]]
def find_least_used_compatible_host(vm)
begin
Contributor

This block is a bit cryptic -- I think we're basically calling a thing for the side effect of throwing an error if uninitialized and then initializing in that case. Probably could extract it to a method and give it an intention-revealing name?

Contributor Author

Yes. It's an existing pattern that I duplicated here; it tries to see if a connection is alive and available for reuse. I moved it to a method and replaced usage of this pattern with a call to that method.
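
For illustration, the extracted method might look roughly like this; the method name and the CurrentTime liveness probe are assumptions, not the PR's exact code:

  # Reuse the cached vSphere connection for this pool if it is still alive;
  # otherwise open a fresh one. CurrentTime is a cheap call that raises when
  # the session has gone stale.
  def ensure_vsphere_connected(pool_name, credentials)
    $vsphere[pool_name].vim.serviceInstance.CurrentTime
    $vsphere[pool_name]
  rescue
    $vsphere[pool_name] = Vmpooler::VsphereHelper.new credentials
  end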

end

source_host = vm.summary.runtime.host
model = source_host.hardware.cpuPkg[0].description.split()[4]
Contributor

This bit: host.hardware.cpuPkg[0].description.split() turns up in multiple places. What is this thing? Can we use the name for the meaning of it, wrap it in a method, and call it? The [4] bit is a bit cryptic. If we extract to a method we could document there what the structure is. Maybe even make a method with a clear name that does the [4] part.

Contributor Author

I've made an attempt to clarify this with methods. I think it should be clearer now, but I'm happy to improve it further if it still needs work.
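
As an illustration, such a wrapper might look like the following; the name and the note about token 4 are assumptions based on typical Xeon description strings:

  # cpuPkg[0].description is a marketing string such as
  # "Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz"; after splitting on
  # whitespace, token 4 ("v3") is the CPU generation used to judge whether
  # two hosts are compatible migration targets.
  def get_host_cpu_arch_version(host)
    host.hardware.cpuPkg[0].description.split[4]
  end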

host_usage = get_host_utilization(host, model)
target_hosts << host_usage if host_usage
end
return target_hosts.sort[0][1]
Contributor

Nitpick: we don't need return at the end of a method in ruby, it's enough just to say target_hosts.sort[0][1].

Also, I notice that we're doing similar things here as we are higher up in the diff. Makes me think there's a more fundamental operation happening that we could extract to a method and share?
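
One hedged sketch of the shared operation (helper name made up for illustration):

  # Given an array of [utilization, host] pairs (as returned by
  # get_host_utilization), return the host with the lowest combined
  # CPU + memory utilization, or nil if no candidates were found.
  def least_used(host_utilizations)
    least = host_utilizations.min_by { |utilization, _host| utilization }
    least[1] if least
  end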

Contributor Author

Thanks. I wasn't sure what the best practice is and have adjusted this.

@rick
Contributor

rick commented Oct 20, 2016

Mostly style / refactoring / readability / nitpick comments. Nothing that I mentioned would block this PR. Am totally game to pair on this if you want to at some point.

@@ -196,31 +196,62 @@ def find_folder(foldername)
base
end

def get_host_utilization(host, model=nil, limit=90)
Contributor

This method could use some doc strings to

  • identify what model and limit mean.
  • indicate what kind of return values to expect

Contributor Author

I added some comments to try to provide some context for what these mean. Let me know if you think further comments are necessary, or would be helpful.
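
For example, a comment block along these lines would answer both questions (wording is illustrative, not the PR's actual comments; the body mirrors the refactoring sketch earlier in the thread):

  # Score a host as a candidate migration target.
  #
  # host  - the vSphere host object to evaluate.
  # model - CPU generation string (e.g. "v3") that the host's CPUs must match
  #         so the VM can be moved there.
  # limit - maximum percent CPU or memory utilization tolerated (default 90).
  #
  # Returns [cpu_utilization + memory_utilization, host] for use in sorting,
  # or nil when the host is unsuitable (no model given, wrong CPU generation,
  # in maintenance mode, not 'green', or over the utilization limit).
  def get_host_utilization(host, model=nil, limit=90)
    return nil unless model
    return nil unless host_has_cpu_model? host, model
    return nil if host.runtime.inMaintenanceMode
    return nil unless host.overallStatus == 'green'

    cpu_utilization = cpu_utilization_for host
    memory_utilization = memory_utilization_for host

    return nil if cpu_utilization > limit
    return nil if memory_utilization > limit

    [ cpu_utilization + memory_utilization, host ]
  end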

end
end
end

Contributor

Although it's not necessarily part of the goal of the current work you are doing, I think this would be a good place (and time) to add hypervisor data to the vmpooler log regardless of whether a migration happened...see https://tickets.puppetlabs.com/browse/POOLER-29. Not a blocker for this PR, but worth considering.

As for the "graphite stats collection" part of that ticket, that probably isn't necessary unless you can think of a good way to represent individual vm <-> hypervisor mapping in graphite.

Contributor Author

What hypervisor data are you hoping to see? The only thing that stands out to me is that, if you want to know which host your VM is running on, we could report that when logging that no migration is required. It is already reported when a migration is performed.

Contributor

@mattkirby VM<->host mapping is the hypervisor data I would like to see...and reporting it when logging that no migration is required sounds sufficient to meet that need.

Contributor Author

@waynr, I have added this change.

@@ -455,6 +455,31 @@ def _check_snapshot_queue
end
end

def migrate_vm(vm, pool)
Thread.new do
Contributor

So is this creating a new thread for each VM that needs to be migrated? What's the likelihood that this might happen simultaneously for a large number of VMs at once leading to post-migration overutilization on a host that was previously underutilized?

Contributor Author

Yes, a new thread for each VM that needs to be migrated. The practical limit ends up being the number of vsphere connections, and there are shared vsphere connections per pool that end up being used (I'm not sure how many). There's a concept of limiting concurrent tasks in vmpooler, but I haven't gotten into the nuts and bolts of how that works yet, and I suspect there is some room for optimization. The model of vmpooler is to always create more threads, so this is consistent with how the application handles clone operations and other actions of that nature. The number of operations that can be performed at any one time will always be limited by the number of vsphere connections.

Because the question of which host is the best target for a VM right now is asked for every VM, and the answer is not shared between migration jobs, it should accurately track the utilization of hosts and send new jobs to the least used hosts, minimizing the possibility of hot spots. I would expect a more even distribution of CPU ready time across the cluster as a result.
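
A hedged sketch of how the in-flight cap discussed later in the thread might gate thread creation; the 'vmpooler__migration' set name and the exact checks are assumptions based on the migration_limit description below:

  # Only start a migration if we are below the configured cap on concurrent
  # migrations; otherwise leave the VM where it is.
  migration_limit = $config[:config]['migration_limit']
  migrations_in_progress = $redis.scard('vmpooler__migration')
  if migration_limit && migration_limit > 0 && migrations_in_progress < migration_limit
    migrate_vm(vm, pool['name'])
  else
    $logger.log('s', "[ ] [#{pool['name']}] '#{vm}' not migrated: #{migrations_in_progress} migrations in progress (limit #{migration_limit})")
  end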

@mattkirby mattkirby force-pushed the migrate_at_checkout branch 3 times, most recently from c75a0f1 to 23055c1 on October 25, 2016
@mattkirby
Contributor Author

I have added an additional commit that makes VM migration at checkout optional, defaulting to disabled. Migration at checkout happens when migration_limit is present in the config section of vmpooler.yaml and its value is greater than 0. In either case, this change adds logging of a VM's parent host, whether or not a migration is involved.
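
For reference, a sketch of what this might look like in vmpooler.yaml; the placement under the :config: section mirrors the existing file layout, and the example value is arbitrary:

  :config:
    # Any value greater than 0 enables migration at checkout and caps how many
    # migrations may run at once; omit the key (or set 0) to disable.
    migration_limit: 5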

@shrug

shrug commented Oct 25, 2016

I'm 👍 on getting this in and then tuning. I have not done a deep review, but if @rick has looked it over that's good enough for me

@@ -225,6 +225,13 @@
# If set, prefixes all created VMs with this string. This should include
# a separator.
# (optional; default: '')
#
# - migration_limit
# When set to any value greateer than 0 enable VM migration at checkout.
Contributor

Typo: "greateer"

Contributor Author

Thanks. Fixed it.

@rick
Contributor

rick commented Oct 25, 2016

One typo in a comment, otherwise: LGTM. ✨ 🚀 🎸 🤘

@waynr
Contributor

waynr commented Oct 25, 2016

This looks pretty good to me. @mattkirby is there a particular dashboard we should watch once this change rolls out to production?

@mattkirby mattkirby force-pushed the migrate_at_checkout branch from 23055c1 to e9d8eab on October 25, 2016
@rick
Contributor

rick commented Oct 26, 2016

👍 from me to move this forward to testing on vmpooler-dev. Could also coordinate with @puppetlabs/cinext about trying it on the cinext vmpooler. If it's looking good there, then prod vmpooler (branch deploy during time of low activity) and eventually merge.

@mattkirby mattkirby force-pushed the migrate_at_checkout branch from e9d8eab to ff46bfc on October 28, 2016
end
end
datacenter.hostFolder.children.each do |cluster_object|
return cluster_object if cluster_object.name == cluster


could shorten this to:

datacenter.hostFolder.children.find { |cluster_object| cluster_object.name == cluster }

Contributor Author

Thanks. I have shortened this as suggested and updated the PR.

@mattkirby mattkirby force-pushed the migrate_at_checkout branch from ff46bfc to de8560c on October 28, 2016
@rick
Contributor

rick commented Nov 16, 2016

A few comments inline, but nothing really blocking. If this has been tested to behave as desired, I'm 👍

@mattkirby
Contributor Author

mattkirby commented Nov 16, 2016

I updated pool_manager_spec for the method changes implemented. All tests pass now.

The spec updates were needed because part of the changes I made has many methods receive the vsphere connection as an argument, rather than assuming they know where to find it.

@rick
Contributor

rick commented Nov 16, 2016

Tests passing! 🍏 💚 📗 🍵 🇬🇱 ♻️

@rick
Contributor

rick commented Nov 16, 2016

I'm 👍


$logger.log('d', "[!] [#{pool}] '#{vm}' marked as 'failed' after #{timeout} minutes")
def fail_pending_vm(vm, pool, timeout, exists=true)
begin
Contributor

One tiny nit -- ruby lets us leave off a begin...end block for the purposes of enabling rescue in the case of a method definition (i.e., a method definition acts like a begin...end block).
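
A generic Ruby illustration of the point (not the PR's code; the log message is made up):

  # The method body acts as an implicit begin...end, so rescue can be attached
  # directly to the definition:
  def fail_pending_vm(vm, pool, timeout, exists=true)
    # ... work that may raise ...
  rescue => err
    $logger.log('d', "[!] [#{pool}] '#{vm}' could not be marked as failed: #{err}")
  end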

Contributor Author

That's handy. I have updated the fail_pending_vm method accordingly. Let me know if it doesn't look right.

@rick
Contributor

rick commented Nov 17, 2016

👍 on the lost VM changes.

@rick
Contributor

rick commented Nov 17, 2016

👍 on rescue-related changes.

user: vsphere['username'],
password: vsphere['password'],
insecure: true
def initialize(credentials)
Contributor Author (@mattkirby, Nov 17, 2016)

One benefit of this change is that connections are no longer made to vsphere when a thread is initialized at application start time. Instead, they are opened when work requiring a vsphere connection is first performed in that thread. The result is that the application and its threads become available faster, and we do not open one connection per pool when the application starts.

This model still doesn't do anything to prevent an unnecessarily large number of connections to vsphere, but the logic change does reduce the immediate stress we would otherwise see from opening all connections at once.
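
A hedged sketch of the lazy-connection approach described above; class and key names are illustrative and may not match the PR exactly:

  require 'rbvmomi'

  class VsphereHelper
    def initialize(credentials)
      @credentials = credentials
      @connection = nil               # nothing is opened at thread start
    end

    # The vSphere session is opened the first time any work needs it.
    def connection
      @connection ||= RbVmomi::VIM.connect(
        host:     @credentials['server'],
        user:     @credentials['username'],
        password: @credentials['password'],
        insecure: true
      )
    end
  end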

yield sock if block_given?
ensure
sock.close
return true


I think you'll have fewer changes if this method either yields the socket or raises an exception (so the caller doesn't need to check the return value)

Contributor Author

You're right. I have updated it and it seems more clear this way.
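
A sketch of the yield-or-raise shape being suggested; the timeout, port, and domain handling are assumptions:

  require 'socket'
  require 'timeout'

  # Open a TCP connection to the VM (port 22 by default), yield the socket to
  # the caller, and always close it. Any failure raises, so callers rescue
  # instead of checking a return value.
  def open_socket(host, domain = nil, timeout = 5, port = 22)
    target_host = domain ? "#{host}.#{domain}" : host
    Timeout.timeout(timeout) do
      sock = TCPSocket.new(target_host, port)
      begin
        yield sock if block_given?
      ensure
        sock.close
      end
    end
  end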

fail_pending_vm(vm, pool, timeout) if ! socket
move_pending_vm_to_ready(vm, pool, host)
rescue
fail_pending_vm(vm, pool, timeout)


for example this could just be:

  open_socket vm
  move_pending_vm_to_ready(vm, pool, host)
rescue
  fail_pending_vm(vm, pool, timeout)

if $redis.smove('vmpooler__ready__' + pool, 'vmpooler__completed__' + pool, vm)
$logger.log('d', "[!] [#{pool}] '#{vm}' is unreachable, removed from 'ready' queue")
else
$logger.log('d', "[!] [#{pool}] '#{vm}' is unreachable, and failed to remove from 'ready' queue")


Similarly this could be:

  open_socket
rescue
  if $redis.smove('vmpooler__ready__' + pool, 'vmpooler__completed__' + pool, vm)
    ...

@mattkirby
Contributor Author

I think this latest commit should address the suggestions in the last few comments @joshcooper.

@mattkirby
Contributor Author

mattkirby commented Nov 21, 2016

I would like to propose merging this on 11/22 at 11am PST if there are no further issues to be raised here. Please let me know if this seems unreasonable and if additional changes are required before merging this.

I plan to rebase the changes before merging, but will hold off for tomorrow morning in order to provide reasonable time for feedback.

  • Test latest on vmpooler-dev
  • Test latest on vmpooler-cinext
  • Rebase commits to clean up commit history in preparation for merge.
  • Wait until 11am PST to provide the opportunity for any further objections to be raised

@rick
Contributor

rick commented Nov 21, 2016

Just to double-check: have you been able to deploy this successfully to vmpooler-dev and observe the migration?

@mattkirby
Contributor Author

mattkirby commented Nov 21, 2016

@rick, yes. The pre-test incarnation of these changes is running on vmpooler-dev, vmpooler-cinext, and vmpooler-provisioner-dev-2, and has been for the past several weeks. The latest commit here is running on vmpooler-provisioner-dev-2. I am happy to deploy the latest commit to dev and vmpooler-cinext before deploying to production if that is helpful as well. My plan was otherwise to update each of those to run against the merged commit prior to updating production vmpooler.

Update: When checking in on vmpooler-cinext, it looks like there is a bug causing the queue that tracks migrations to accumulate stale entries and eventually fill up, which results in VMs not being migrated because vmpooler believes too many migrations are in progress. I believe the changes made since the version running there will fix this, but I would like to run the updated version to validate that. I'm going to update that deployment to the latest commit and confirm the issue does not persist.

Update #2: I have confirmed that the latest commit, running on vmpooler-cinext, resolves the issue with VMs getting stuck in the migration queue, thanks to improved error handling. If a VM is deleted before its migration happens, the migration fails. Previously, that failure left the VM stuck in the queue. Now it logs an error indicating the migration failed because the object could not be found, and removes the VM from the migration queue.
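
A rough sketch of the error handling described; the message text and surrounding structure are illustrative, not the PR's exact code:

  # If the VM was deleted before its migration ran, find_vm returns nil; log
  # the failure and clear the queue entry instead of leaving a stale one behind.
  begin
    vm_object = vsphere.find_vm(vm)
    raise("Unable to find VM object for '#{vm}'") unless vm_object
    # ... migrate vm_object to the least used compatible host ...
  rescue => err
    $logger.log('s', "[x] [#{pool}] '#{vm}' migration failed: #{err}")
    $redis.srem('vmpooler__migrating__' + pool, vm)
  end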

@rick
Contributor

rick commented Nov 21, 2016

Sounds great to me 👍

…ce credentials from the configuration object that itself loads the configuration file when the application starts. Without this change the configuration file is reloaded every time vspherehelper is called. Additionally, this change makes it more straightforward to test vspherehelper connections.

A method is added to make more clear what's happening when checking if a socket can be opened to a pending VM on port 22. Additionally, the connection appends domain from the configuration, when present, to the VM name so DNS search is not required.
Remove unneeded begin block in method

Fix formatting of rescue block in fail_pending_vm
…o test. Simplify migrate_vm method by breaking out some components. Improve error handling around migrate_vm.

Add helpers to support setting up redis for migration-at-checkout testing.
Use mock_redis instead of redis.

Make passing of mock redis to helper calls more clear

Update pool_manager_spec to specify vsphere argument where applicable. Update pool_helper calls to vsphere where needed for tests to pass. Without this change rspec tests for pool_manager_spec exhibit 12 failures.

Update pool_manager_spec test with open_socket

Pool_manager_spec stubs a tcpsocket connection to simulate this happening directly within _check_pending_vm. This commit updates this to look more like its usage with open_socket, which allows the test to pass.
…ity.

Update migrate_vm to make clear when an if block is the end of the line by returning. Use scard instead of smembers.size() for determining migrations in progress.
This commit updates pool manager to use a method for opening a socket
instead of opening it directly from check_pending_vm. Support is added
for specifying the domain of the VM to connect to, which lays the
groundwork for doing away with the assumption of having DNS search
domains set for vmpooler to move VMs to the ready state.

Additionally, this commit adds a block to ensure open_socket closes open connections. Without this change sockets are opened to each VM before moving to the ready state, and never explicitly closed.

Also, use open socket for check_ready_vm
This commit updates vmpooler to understand how to resolve a situation
where a pending VM does not exist. Without this change a pending VM that
does not exist in vmware inventory gets stuck in the pending state,
preventing the pool from ever reaching its target capacity.

As a part of this change the find_vm method is updated to perform a
light, then heavy search each time find_vm is called and all usage of
find_vm || find_vm_heavy is replaced. This makes find_vm usage
consistent across pool_manager.

Additionally, open_socket method is updated to resolve an incorrect
reference to the host name.
@mattkirby
Contributor Author

mattkirby commented Nov 22, 2016

I have squashed commits to clean up the history in preparation for merge.

@mattkirby mattkirby merged commit 705e5d2 into puppetlabs:master Nov 22, 2016
@mattkirby mattkirby deleted the migrate_at_checkout branch November 22, 2016 19:22