kubespray play spends a lot of time doing nothing #9279
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

/remove-lifecycle stale

The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

/remove-lifecycle stale
In a recent deployment it spent about 15 minutes on this. I believe this happens on each play, of which there are several in the playbook, so it adds up to roughly 45-60 minutes of doing nothing. And that is when running on only half of the nodes at a time.
Very exciting, thanks for working on this @VannTen! For the record, I am noting some other Ansible scalability issues that can cause slowness with large inventories or dynamically built inventories.

The linked ones are gonna be harder to tackle 😆
Kubespray plays can take quite a long time, e.g. around 30-60 minutes even for a small cluster of 6 nodes, and many hours for large clusters. (Some timing measurements discussed in #8050)
A lot of the time goes by with repetitive Ansible output that is not visibly achieving anything, and the ansible process(es) are usually mostly CPU bound, although strace also shows a lot of repetitive I/O operations, stat-ing files etc.
I believe it is because of https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubespray-defaults/meta/main.yml
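For reference, the dependency looks roughly like this; this is a paraphrased sketch, not a verbatim copy of the file, and the exact role name and keys may differ:

```yaml
# Approximate shape of roles/kubespray-defaults/meta/main.yml (paraphrased, not verbatim)
dependencies:
  - role: download          # the downloads role is pulled in as a static dependency
    skip_downloads: true    # intended to turn every download task into a no-op
```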
The kubespray-defaults role is invoked no fewer than 12 times across the cluster.yml playbook, and each time it pulls in the downloads role as a dependency. The downloads role has several import_tasks, and it also does import_role of container-engine/crictl and container-engine/nerdctl (until d01b181, anyway). Every task in the downloads role has
when: not skip_downloads|default(false)
and the meta/main.yml has skip_downloads: true. But because of how import_ works with conditionals, Ansible still processes all the imports, going through all the loops and tasks on all the nodes and effectively executing every one of them with when: false. So it really is doing nothing, but wasting a lot of time on it.

Update: the default variables from the kubespray-defaults role (and the download role) are needed, but loading them takes practically no time. There needs to be a way to get the vars without wasting time on tasks, maybe by switching to include_role instead of import_role, or by refactoring the download default vars into the kubespray-defaults default vars.
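To illustrate why the static import matters, here is a minimal sketch (hypothetical task names, not taken from the kubespray source) of the same condition attached to import_role versus include_role:

```yaml
# import_role is static: it is expanded at parse time, and the `when:` is copied
# onto every task inside the role, so Ansible still iterates all of those tasks
# on every host, evaluating the condition and printing a skip for each one.
- name: Pull in the download role statically
  import_role:
    name: download
  when: not skip_downloads | default(false)

# include_role is dynamic: the condition is evaluated once at run time, and if it
# is false the whole role is skipped in one step without processing its tasks.
- name: Pull in the download role dynamically
  include_role:
    name: download
  when: not skip_downloads | default(false)
```

One caveat with the include_role route: unlike import_role, include_role does not expose the role's defaults and vars to the rest of the play unless public: true is set (Ansible 2.7+), which matters here because the whole point of invoking kubespray-defaults is to get its default variables.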