Improving startup time #1315
Comments
Something like a systemd unit reading the user-data over the network and putting the info in the right place?
yup
I'll give it a try.
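A minimal sketch of what such a unit could run, assuming Azure and its instance metadata service (IMDS) as the source of the user-data. The payload layout (a JSON object with a `write_files` list) and the function names are hypothetical and chosen only for illustration; they are not the format the project uses.

```python
import base64
import json
import os
import urllib.request

# Azure IMDS user-data endpoint; the response body is base64-encoded.
# The api-version value is an assumption, adjust it as needed.
IMDS_URL = (
    "http://169.254.169.254/metadata/instance/compute/userData"
    "?api-version=2021-01-01&format=text"
)


def fetch_userdata_imds(timeout=2.0):
    """Fetch the raw (still base64-encoded) user-data from IMDS."""
    req = urllib.request.Request(IMDS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def provision_files(raw_b64, root="/"):
    """Decode user-data and write every listed file below `root`.

    Hypothetical payload: base64 of
    {"write_files": [{"path": "...", "content": "..."}]}
    """
    payload = json.loads(base64.b64decode(raw_b64))
    written = []
    for entry in payload.get("write_files", []):
        # Strip the leading slash so the file lands below `root`.
        target = os.path.join(root, entry["path"].lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as fh:
            fh.write(entry["content"])
        written.append(target)
    return written
```

A oneshot unit ordered before the agent services could then call `provision_files(fetch_userdata_imds())`. The `root` parameter exists so the logic can be exercised against a scratch directory.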
With userdata, it takes approx. 1 sec from VM creation to the agent proxy being ready.
Without userdata (existing mechanism), it takes approx. 7 sec from VM creation to the agent proxy being ready.
@mkulke the above data is only from one iteration. I didn't try different instance types.
Some numbers that might help to identify further opportunities for optimization:
VM launched (+0s)
/stat being created on podvm (+32s)
waagent starts on podvm (+39s)
VM reported as ready (+46s)
Sandbox is created (+47s)
So atm we get into container business after ~45s, while the first signs of life from the VM are at ~30s. I suspect that if we report the VM as ok earlier (replacing waagent and setting the goal state manually) we might get closer to the 30s mark.
There is some improvement, but we can possibly do more, hence re-opening.
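To see where the time goes, the offsets reported above can be turned into per-phase durations. This is only arithmetic on the numbers already posted; the helper name is made up for the sketch.

```python
# Milestones and their offsets in seconds since VM launch, as reported above.
MILESTONES = [
    ("VM launched", 0),
    ("/stat being created on podvm", 32),
    ("waagent starts on podvm", 39),
    ("VM reported as ready", 46),
    ("Sandbox is created", 47),
]


def phase_durations(milestones):
    """Return (phase label, seconds) for each pair of consecutive milestones."""
    return [
        (f"{a} -> {b}", tb - ta)
        for (a, ta), (b, tb) in zip(milestones, milestones[1:])
    ]


for label, secs in phase_durations(MILESTONES):
    print(f"{secs:>3}s  {label}")
```

The first phase (launch until the first sign of life on the podvm) accounts for 32 of the 47 seconds, and the two waagent-related phases add 7 seconds each.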
At least on Azure there is a problem: the Create-VM API call only returns once the VM has registered itself as ready. We might want to reconsider this approach. Today an Azure SNP CVM takes about 30s to provision before it enters the kernel. This is something we have to account for unless we look at optimizations beyond the PodVM image. An average boot process to ... So for kata-agent to start working we really need to wait 45s. A possible optimization in CAA could be to not wait for the Create-VM call to return, i.e. we only want to know whether VM creation has been triggered successfully; we are not necessarily interested in the readiness of the VM. We will learn that from a healthy pod or otherwise. So we could start facilitating connections even if the VM is not ready yet. We need to measure things before implementing, however. I suspect that user-data is populated asynchronously, and that we will have to wait for it to become available anyway, which exceeds 45s.
From measurements it looks like we optimized what we could so far; #1674 removed some 10s.
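One way to picture "facilitating connections even if the VM is not ready" is a dial loop that starts as soon as VM creation has been triggered and keeps retrying until the podvm accepts a TCP connection. The sketch below is not CAA code; the function name and its parameters are assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host, port, deadline_s=60.0, interval_s=0.5):
    """Retry a TCP connect until it succeeds or `deadline_s` has elapsed.

    Returns True once a connection could be established, False on timeout.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            # A successful connect means something is listening on the podvm.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, unreachable or timed out: the VM is not up yet.
            if time.monotonic() >= end:
                return False
            time.sleep(interval_s)
```

With this shape the readiness signal comes from the forwarder port itself rather than from the Create-VM call, which is the trade-off discussed above. Whether it actually saves time would still need to be measured.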
This is a boot chart from systemd-analyze on Ubuntu 22.04 in Azure. It shows that we spend ~15s on initialization tasks before the images are pulled. A relatively low-hanging fruit would probably be to remove the dependency on cloud-init. At the moment we provision files to the podvms via the cloud-config protocol. We could replace that with some bespoke logic in the unit files directly, and agent-protocol-forwarder would be able to start earlier.
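To check how much of the initialization time is attributable to cloud-init units, the text output of `systemd-analyze blame` can be summed per unit. The sketch below assumes the usual line layout of that command (a time span followed by the unit name); the helper names are hypothetical.

```python
import re

# Seconds per unit suffix as used in systemd time spans.
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
_SPAN_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)")


def parse_blame(text):
    """Parse `systemd-analyze blame` output into (unit, seconds) pairs."""
    result = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        *span, unit = parts
        total = 0.0
        for token in span:
            m = _SPAN_TOKEN.fullmatch(token)
            if m is None:
                break  # not a time span, skip this line
            total += float(m.group(1)) * _UNIT_SECONDS[m.group(2)]
        else:
            result.append((unit, total))
    return result


def seconds_spent(entries, prefix):
    """Sum the time of all units whose name starts with `prefix`."""
    return sum(secs for unit, secs in entries if unit.startswith(prefix))
```

Feeding the command's output into `seconds_spent(parse_blame(output), "cloud-")` gives a rough upper bound on what dropping cloud-init could save. Units run in parallel, so the sum is not the same as time on the critical path.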