-
Notifications
You must be signed in to change notification settings - Fork 39.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Controllers keeps creating extra controllees if initialization is slow #48893
Comments
I'm hesistant to jump into this without a discussion of the tradeoffs. The clients should wait in many cases, and if they aren't waiting that's a bug. |
Observing uninitialized entries is a hammer, I want to make sure this is a nail first. Can I get an example of the problem? |
Also, are these controllers tolerant of watch delays? If not it's a problem with the controllers (in some cases) |
GenerateName is definitely part of the problem - controllers that rely heavily on it to create aren't tolerant to any sort of mismatch between actual and observed. If we end up watching uninitialized there we also need to verify they are not outracing their caches |
That's why i opened an issue first ;) As i mentioned in the initial comment, rc and rs are protected by the expectation system. I missed daemonset and job controller in my earlier grep, they also use the expectation system, so they should be fine. Cronjob controller only creates Job instead of creating pods directly, so it should be ok. That said, i haven't checked if these controllers perform correctly if the "create expectation" aren't met timely. IIRC statuefulset controller always creates pod one by one, so it should be ok. Deployment controller names the rs with the hash of the podtemplate, so it won't succeed in creating a duplicate rs, though i don't know if it will lead to other problems. @janetkuo do you know? |
@kelseyhightower with which controller did you see the problem? |
related #22061 |
Will queue this up to go over. @soltysh had work queued for cronjob and
job to address the need to resync frequently, so he can verify in those.
…On Sun, Jul 16, 2017 at 7:27 PM, Chao Xu ***@***.***> wrote:
related #22061 <#22061>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48893 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p5YBUBu7c-DDsBAfPmECeQpDtW57ks5sOpxMgaJpZM4OXYEd>
.
|
@caesarxuchao I'm seeing this with deployments. |
Also seeing this with Deplyment/RS. The controller keeps creating pods because it doesn't see the uninitialized one, I ended up with 20 pods for a 1-replica Deployment. Also deleting the Deployment (via kubectl delete) does not delete the Pods, they are just leaked, probably for the same reason. |
Garbage collector controller doesn't list/watch uninitialized objects. If the owner is deleted before the dependents are initialized, the dependents will be deleted by the gc after they are initialized, no matter what PropagationPolicy the owner is deleted with. This not a big problem for "Orphan", users don't see uninitialized dependents, so they wouldn't expect them orphaned. If the policy is "Foreground GC", the garbage collector will delete the owner before deleting the uninitialized dependents. It could be a problem if user expects to use "foreground GC" to release all the resources (name, quota) before deletes the owner. |
In the case reported by@ahmetb, no intializer controller was present, so the pods were never initialized. @ahmetb reported one pod was created every 30 second. I'll check if it's a problem with the "expectation" framework. |
The |
It's because the creations timeout and were considered failed, so the expectation was dropped, https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replicaset/replica_set.go#L470-L475
|
It's definitely not created, but there's no way to know that it failed.
Given that initializers are likely to be a thing, and that long creations
are possible, it doesn't seem unreasonable to expect our core controllers
to at least tolerate the condition of timeout on create and not, for
instance, go crazy creating pods.
…On Mon, Jul 17, 2017 at 3:13 PM, Chao Xu ***@***.***> wrote:
It's because the creations timeout and were considered failed, so the
expectation was dropped, https://github.com/kubernetes/
kubernetes/blob/master/pkg/controller/replicaset/replica_set.go#L470-L475
err = rsc.podControl.CreatePodsWithControllerRef(rs.Namespace, &rs.Spec.Template, rs, controllerRef)
if err != nil {
// Decrement the expected number of creates because the informer won't observe this pod
glog.V(2).Infof("Failed creation, decrementing expectations for replica set %q/%q", rs.Namespace, rs.Name)
rsc.expectations.CreationObserved(rsKey)
errCh <- err
}
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48893 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_pwb8xzME6zbT_4w4lV2Lbq2h-2iWks5sO7JvgaJpZM4OXYEd>
.
|
Agree that the controller should tolerate the timeout on create. In the current settings, if the pod is not initialized in the next 5 mins, the controller will forget the "create expectation" and go on creating another pod. In case of a failing initializer controller, it could take long time to get it back online, so still, many pods will be created. @smarterclayton is that acceptable? And what are the concerns if we let the controllers list/watch uninitialized objects? |
The biggest concern is that if we add this to all informers, then all clients of informers need to opt in explicitly to uninitialized resources (otherwise we will break them unexpectedly), vs opting out, which is a signature change to informers. |
Controllers should not retry immediately if CREATE fails due to initialization timeout. This part is not controversial. I'll send a PR. Regarding if we should update existing controllers list/watch uninitialized objects, I think
We'll need to do complex changes to sharedInformers to let it hide uninitialized controller but expose uninitialized controllees. Or we can configure the sharedInformers to expose all uninitialized objects to the controllers, and rely on individual controllers to stay away from uninitialized controller objects. Either way we'll need to change the signature of sharedInformers, as @smarterclayton pointed out. |
From a refactor perspective, a consumer of shared informer has to opt in.
We've talked about having a way to add filters more easily, so it's
possible that we want to add a utility that makes it even easier. I'll
take a quick look at that on monday.
…On Thu, Jul 27, 2017 at 10:35 PM, Chao Xu ***@***.***> wrote:
Controllers should not retry immediately if CREATE fails due to
initialization timeout. This part is not controversial. I'll send a PR.
Regarding if we should update existing controllers list/watch
uninitialized objects, I think
- We should hide uninitialized controller objects from controllers,
e.g., hiding uninitialized replicasets from the replicaset controller.
- We have to expose uninitialized controllees. Because
1. Otherwise controllers will create an excessive number of
uninitialized controllees if an initializer fails for a long time.
2. In case that controllees stuck in the uninitialized state, the
controller should update the Status to expose the fact, e.g., "5
pods uninitialized", rather than showing "0 ready pods". Otherwise it will
be hard for users to see what's going on.
We'll need to do complex changes to sharedInformers to let it hide
uninitialized controller but expose uninitialized controllees. Or we can
configure the sharedInformers to expose all uninitialized objects to the
controllers, and rely on individual controllers to stay away from
uninitialized controller objects. Either way we'll need to change the
signature of sharedInformers, as @smarterclayton
<https://github.com/smarterclayton> pointed out.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48893 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p6wxQxh0hHG-CErsUR64rgMRBulzks5sSUjdgaJpZM4OXYEd>
.
|
@smarterclayton is there anything i can do for the the shared informer refactoring? Is there an on-going discussion or shall I start my own investigation? |
cc @erictune |
If a controller binary watches uninitialized objects, does it see more Watch traffic? |
Yes, it sees the initialization traffic. If the initializers are wriitten in an efficient way, then N initializers add N events. For a pod, N more events is negligible in its lifetime. |
…ize-timeout Automatic merge from submit-queue (batch tested with PRs 49855, 49915) Let controllers ignore initialization timeout when creating pods Partially address #48893 (comment). This only updates the controllers that create pods with `GenerateName`. The controllers ignore the timeout error when creating the pods, depending on how the initialization progress: * If the initialization is successful in less than 5 mins, the controller will observe the creation via the informer. All is good. * If the initialization fails, server will delete the pod, but the controller won't receive any event. The controller will not create new pod until the Creation expectation expires in 5 min. * If the initialization takes too long (> 5 mins), the Creation expectation expires and the controller will create extra pods. I'll send follow-up PRs to fix the latter two cases, e.g., by refactoring the sharedInformer.
I'm prototyping a refactor of the shared informer. |
Automatic merge from submit-queue Let the quota evaluator handle mutating specs of pod & pvc ### Background The final goal is to address #47837, which aims to allow more mutation for uninitialized objects. To do that, we [decided](#47837 (comment)) to let the admission controllers to handle mutation of uninitialized objects. ### Issue #50399 attempted to fix all admission controllers so that can handle mutating uninitialized objects. It was incomplete. I didn't realize although the resourcequota admission plugin handles the update operation, the underlying evaluator didn't. This PR updated the evaluators to handle updates of uninitialized pods/pvc. ### TODO We still miss another piece. The [quota replenish controller](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourcequota/replenishment_controller.go) uses the sharedinformer, which doesn't observe the deletion of uninitialized pods at the moment. So there is a quota leak if a pod is deleted before it's initialized. It will be addressed with #48893.
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836) Add a persistent volume label controller to the cloud-controller-manager Part of kubernetes/enhancements#88 Outstanding concerns needing input: - [x] Why 5 threads for controller processing? - [x] Remove direct linkage to aws/gce cloud providers [#51629] - [x] Modify shared informers to allow added event handlers ability to include uninitialized objects/using unshared informer #48893 - [x] Use cache.MetaNamespaceKeyFunc in event handler? I'm willing to work on addressing the removal of the direct linkage to aws/gce after this PR gets in.
[MILESTONENOTIFIER] Milestone Removed @caesarxuchao @kubernetes/sig-api-machinery-bugs Important: This issue was missing the |
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
alpha initializers feature removed in 1.14 |
In 1.7 we introduced the concept of initialization. By default, clients won't see uninitialized objects.
Controllers (replicaset, deployment, job, etc.) have to observe the uninitialized controllees they create, otherwise the controllers continue to create more controllees (thanks to @kelseyhightower for reporting this).
One possible solution is converting all controllers to list/watch uninitialized controllees.
P.S. replicaset and replication controllers might be protected by their expectation systems, but other controllers are not.
cc @lavalamp @smarterclayton
The text was updated successfully, but these errors were encountered: