Automatic live-migration to balance load on cluster #485
Comments
What does "cycle" refer to in this context?
A cycle would basically be a scheduled task inside Incus. The admin would instruct Incus to consider automatic balancing every 15min, every hour or every 3 hours, whatever makes sense for their environment.
Hello! My partners and I are currently studying virtualization at UT Austin and would like to get more experience contributing to open-source repos like this one. We are wondering if this issue is still open for solving? If so, we would love to take a chance and work on it because it seems very interesting to us! No worries if not! Thank you so much!
Hello @stgraber! My team just had a discussion about how to approach this ticket and we are wondering if you would mind providing tips, suggestions, and/or clarifications on our approach? Here is our general logic:
Does this sound like we are on the right track? Thank you so much for your help!
Hey @stgraber! We just wanted to follow up to see if you had any questions or feedback on our approach before we started implementing it. |
I don't think it really makes sense for this to be a scriptlet. Instead, what we really need here is:
I'd recommend you start by doing the paperwork stuff, so basically:
That last commit is going to be the big one as far as logic goes, but you can slowly grow it as you go: start with just logging something to say the task would run, then grow that to include details about all servers and their load, then their instances, and eventually what moves would happen. There are a few things to be careful about:
Hello @stgraber! My group has been working on this issue and we have gotten to the section where we are deciding which instances to migrate and explicitly migrating them. First, we check whether an instance from the dbCluster instances is migratable.

Another question we have is whether you think our current way of calculating the effective score per server is good for the score-balancing calculation we will do per instance? We are currently doing some division, calculating memory used ((totalRAM - freeRAM) / totalRAM) and CPU usage (numProc / numCores), and we fear it might be overcomplicating things a bit?

Lastly, for our migration, we are planning to use logic similar to the evacuation handling. We tried to implement this in the forked repo https://github.com/sophiezhangg/cs360v-incus/tree/cs360v-automigration

Thank you so much for your help!
You should be able to parse that; it's going to be fine for now and will avoid migrating away an instance which is just booting up but will soon consume a lot more CPU/memory.
Should be fine for the memory percentage. For CPU, I'd do load-average / total CPU count, but we don't currently have the load-average information exposed so that's going to be a bit difficult. I guess for now you can proceed with just looking at memory and I'll be opening another issue to track adding system load to the resources API so we can make use of that.
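A minimal sketch of the per-server score under that suggestion, using only the memory fraction for now; the function names are illustrative, and the CPU term is left commented out until load-average is exposed through the resources API:

```go
package main

import "fmt"

// memScore returns the fraction of RAM in use: (total - free) / total.
func memScore(totalRAM, freeRAM uint64) float64 {
	if totalRAM == 0 {
		return 0
	}
	return float64(totalRAM-freeRAM) / float64(totalRAM)
}

// cpuScore would be load-average divided by total CPU count, once the
// load-average is available through the resources API:
//
// func cpuScore(loadAvg float64, cpuCount int) float64 {
// 	return loadAvg / float64(cpuCount)
// }

func main() {
	// 16 GiB total, 4 GiB free -> 75% of memory in use.
	fmt.Printf("score: %.2f\n", memScore(16<<30, 4<<30))
}
```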
That should be fine. As we only do live-migration of VMs here, the logic should be a bit simpler than all the cases we have with evacuation.
General structure looks good.
Thanks for the response! I just wanted to follow up on our first question, about how we should best get that value; we are unsure of how to do the conversion.
You should be able to do that.
When running a cluster with one or more virtual machines capable of being live-migrated across the cluster, we should be able to use that to better spread the load: evaluate server load across the cluster and decide whether to automatically move some workloads to re-balance things.
We already have a lot of the right pieces in place:
We will need to think through ways to avoid instances flip-flopping between servers as well as ways to mitigate the migration itself causing significant load difference on both the source and target server.
One approach would be to only perform a single migration per cycle, while also preventing an instance from being moved again for a number of cycles since its last move.
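That single-migration-per-cycle approach with a per-instance cooldown could be sketched roughly like this; the type, field names, and cooldown length are hypothetical, not existing Incus code:

```go
package main

import "fmt"

// cooldownCycles is a hypothetical setting: the number of balancing
// cycles an instance must sit out after being moved, to avoid
// flip-flopping between servers.
const cooldownCycles = 4

type balancer struct {
	cycle     int
	lastMoved map[string]int // instance name -> cycle of its last move
}

// pickMigration returns the first candidate not in cooldown and records
// the move, or "" if nothing can move. At most one instance is picked
// per cycle.
func (b *balancer) pickMigration(candidates []string) string {
	for _, name := range candidates {
		last, moved := b.lastMoved[name]
		if moved && b.cycle-last < cooldownCycles {
			continue // still cooling down from a recent move
		}
		b.lastMoved[name] = b.cycle
		return name // single migration per cycle
	}
	return ""
}

func main() {
	b := &balancer{lastMoved: map[string]int{}}

	b.cycle = 1
	fmt.Println(b.pickMigration([]string{"vm1", "vm2"})) // vm1
	b.cycle = 2
	fmt.Println(b.pickMigration([]string{"vm1", "vm2"})) // vm2 (vm1 is cooling down)
}
```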
Ideally we'd be leveraging calls to our existing scheduler to find new locations for existing instances, only considering instances that can be easily live-migrated (no local storage, no local devices, ...).