
Controllers keeps creating extra controllees if initialization is slow #48893

Closed
caesarxuchao opened this issue Jul 13, 2017 · 31 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. milestone/removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@caesarxuchao
Member

caesarxuchao commented Jul 13, 2017

In 1.7 we introduced the concept of initialization. By default, clients won't see uninitialized objects.

Controllers (replicaset, deployment, job, etc.) have to observe the uninitialized controllees they create, otherwise the controllers continue to create more controllees (thanks to @kelseyhightower for reporting this).

One possible solution is converting all controllers to list/watch uninitialized controllees.

P.S. replicaset and replication controllers might be protected by their expectation systems, but other controllers are not.
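
For context, here is a minimal sketch of what the opt-in looks like for a plain list. It assumes a client-go from the era when initializers were alpha, where metav1.ListOptions carried an IncludeUninitialized field; the helper name and arguments are hypothetical. By default the field is false, so controllers simply don't see controllees that are still waiting on an initializer.

```go
package example

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listIncludingUninitialized lists pods matching a selector and opts in to
// uninitialized objects; without IncludeUninitialized the uninitialized
// controllees are filtered out by the apiserver.
func listIncludingUninitialized(client kubernetes.Interface, namespace, selector string) error {
	pods, err := client.CoreV1().Pods(namespace).List(metav1.ListOptions{
		LabelSelector:        selector,
		IncludeUninitialized: true, // also return controllees still being initialized
	})
	if err != nil {
		return err
	}
	fmt.Printf("saw %d pods (initialized or not)\n", len(pods.Items))
	return nil
}
```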

cc @lavalamp @smarterclayton

@caesarxuchao caesarxuchao added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 13, 2017
@caesarxuchao caesarxuchao added this to the v1.8 milestone Jul 13, 2017
@caesarxuchao caesarxuchao self-assigned this Jul 13, 2017
@smarterclayton
Contributor

I'm hesitant to jump into this without a discussion of the tradeoffs. The clients should wait in many cases, and if they aren't waiting, that's a bug.

@smarterclayton
Contributor

Observing uninitialized entries is a hammer, I want to make sure this is a nail first.

Can I get an example of the problem?

@smarterclayton
Contributor

Also, are these controllers tolerant of watch delays? If not, it's a problem with the controllers (in some cases).

@smarterclayton
Contributor

GenerateName is definitely part of the problem - controllers that rely heavily on it for creation aren't tolerant of any mismatch between actual and observed state. If we end up watching uninitialized objects there, we also need to verify they are not outracing their caches.

@caesarxuchao
Member Author

Observing uninitialized entries is a hammer, I want to make sure this is a nail first.

That's why I opened an issue first ;)

As I mentioned in the initial comment, rc and rs are protected by the expectation system. I missed the daemonset and job controllers in my earlier grep; they also use the expectation system, so they should be fine. The cronjob controller only creates Jobs instead of creating pods directly, so it should be OK. That said, I haven't checked whether these controllers behave correctly if the "create expectations" aren't met in a timely manner.

IIRC the statefulset controller always creates pods one by one, so it should be OK.

The deployment controller names the rs with the hash of the pod template, so it won't succeed in creating a duplicate rs, though I don't know whether that will lead to other problems. @janetkuo do you know?
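
For reference, a rough sketch of the expectation pattern those controllers share. The ControllerExpectations type and its methods come from k8s.io/kubernetes/pkg/controller; the key and counts below are illustrative.

```go
package main

import (
	"fmt"

	"k8s.io/kubernetes/pkg/controller"
)

func main() {
	// One expectation store per controller, keyed by the controller object.
	exp := controller.NewControllerExpectations()
	key := "default/my-rs" // hypothetical namespace/name key

	// Before creating pods, record how many creates we expect to observe.
	_ = exp.ExpectCreations(key, 3)

	// While expectations are unsatisfied, the sync loop does not create more pods,
	// which is what protects rc/rs/daemonset/job from a slow (< 5 min) initialization.
	fmt.Println(exp.SatisfiedExpectations(key)) // false

	// Each create observed via the informer (or each failed create) decrements the count.
	exp.CreationObserved(key)
	exp.CreationObserved(key)
	exp.CreationObserved(key)

	fmt.Println(exp.SatisfiedExpectations(key)) // true
}
```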

@caesarxuchao
Member Author

@kelseyhightower with which controller did you see the problem?

@caesarxuchao
Member Author

Related: #22061

@smarterclayton
Contributor

smarterclayton commented Jul 17, 2017 via email

@kelseyhightower
Contributor

@caesarxuchao I'm seeing this with deployments.

@ahmetb
Member

ahmetb commented Jul 17, 2017

Also seeing this with Deployment/RS. The controller keeps creating pods because it doesn't see the uninitialized ones; I ended up with 20 pods for a 1-replica Deployment.

Also, deleting the Deployment (via kubectl delete) does not delete the Pods; they are just leaked, probably for the same reason.

@caesarxuchao
Member Author

The garbage collector controller doesn't list/watch uninitialized objects. If the owner is deleted before the dependents are initialized, the dependents will only be deleted by the GC after they are initialized, no matter which PropagationPolicy the owner is deleted with.

This is not a big problem for "Orphan": users don't see uninitialized dependents, so they wouldn't expect them to be orphaned.

If the policy is "Foreground GC", the garbage collector will delete the owner before deleting the uninitialized dependents. That could be a problem if the user expects "foreground GC" to release all the resources (name, quota) before the owner is deleted.
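
To make the two deletion modes concrete, here is a small sketch using the standard DeleteOptions propagation policies. The helper, client, and names are hypothetical, and the typed-client signature matches the pre-1.17 client-go of this issue's era.

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteOwner deletes a Deployment with either orphan or foreground propagation.
func deleteOwner(client kubernetes.Interface, ns, name string, foreground bool) error {
	// Orphan: delete the owner and leave the dependents behind.
	policy := metav1.DeletePropagationOrphan
	if foreground {
		// Foreground: the owner lingers (with a deletionTimestamp) until its visible
		// dependents are gone; dependents the GC can't see because they are still
		// uninitialized are only cleaned up once they become initialized.
		policy = metav1.DeletePropagationForeground
	}
	return client.AppsV1beta1().Deployments(ns).Delete(name, &metav1.DeleteOptions{
		PropagationPolicy: &policy,
	})
}
```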

@caesarxuchao
Member Author

In the case reported by @ahmetb, no initializer controller was present, so the pods were never initialized. @ahmetb reported that one pod was created every 30 seconds. I'll check whether it's a problem with the "expectation" framework.

@caesarxuchao
Member Author

ExpectationsTimeout is 5 * time.Minute, so it doesn't explain why a new pod is created every 30s. I need to dig deeper.

@caesarxuchao
Member Author

It's because the creations timed out and were considered failed, so the expectation was dropped: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replicaset/replica_set.go#L470-L475

```go
err = rsc.podControl.CreatePodsWithControllerRef(rs.Namespace, &rs.Spec.Template, rs, controllerRef)
if err != nil {
	// Decrement the expected number of creates because the informer won't observe this pod
	glog.V(2).Infof("Failed creation, decrementing expectations for replica set %q/%q", rs.Namespace, rs.Name)
	rsc.expectations.CreationObserved(rsKey)
	errCh <- err
}
```
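
One way a controller could tolerate this (a sketch of the idea, not necessarily the exact change that was merged) is to treat a timeout on the create as a soft failure and keep the expectation, since the pod may still be observed once it is initialized. Here apierrors is assumed to be k8s.io/apimachinery/pkg/api/errors.

```go
err = rsc.podControl.CreatePodsWithControllerRef(rs.Namespace, &rs.Spec.Template, rs, controllerRef)
if err != nil {
	if apierrors.IsTimeout(err) {
		// The pod was created but its initialization timed out. Keep the
		// expectation: if initialization eventually succeeds, the informer will
		// observe the pod; if it fails, the server deletes the pod and the
		// expectation expires after ExpectationsTimeout.
		return
	}
	// Decrement the expected number of creates because the informer won't observe this pod
	glog.V(2).Infof("Failed creation, decrementing expectations for replica set %q/%q", rs.Namespace, rs.Name)
	rsc.expectations.CreationObserved(rsKey)
	errCh <- err
}
```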

@smarterclayton
Contributor

smarterclayton commented Jul 19, 2017 via email

@caesarxuchao
Member Author

Agreed that the controller should tolerate the timeout on create. With the current settings, if the pod is not initialized within the next 5 minutes, the controller will forget the "create expectation" and go on to create another pod. In the case of a failing initializer controller, it could take a long time to get it back online, so many pods will still be created. @smarterclayton is that acceptable? And what are the concerns if we let the controllers list/watch uninitialized objects?

@smarterclayton
Contributor

The biggest concern is that if we add this to all informers, then all clients of informers need to explicitly opt in to uninitialized resources (otherwise we will break them unexpectedly) rather than opt out, and that is a signature change to informers.

@caesarxuchao
Member Author

Controllers should not retry immediately if CREATE fails due to initialization timeout. This part is not controversial. I'll send a PR.

Regarding whether we should update existing controllers to list/watch uninitialized objects, I think:

  • We should hide uninitialized controller objects from controllers, e.g., hiding uninitialized replicasets from the replicaset controller.
  • We have to expose uninitialized controllees, because:
    1. Otherwise, controllers will create an excessive number of uninitialized controllees if an initializer fails for a long time.
    2. If controllees are stuck in the uninitialized state, the controller should update the Status to expose that fact, e.g., "5 pods uninitialized", rather than showing "0 ready pods". Otherwise it will be hard for users to see what's going on.

We'll need to make complex changes to the sharedInformers to let them hide uninitialized controller objects but expose uninitialized controllees. Or we can configure the sharedInformers to expose all uninitialized objects to the controllers and rely on individual controllers to stay away from uninitialized controller objects. Either way we'll need to change the signature of the sharedInformers, as @smarterclayton pointed out.
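
For concreteness, a minimal sketch of what the second option could look like for a single informer (not the eventual sharedInformer refactor): the list and watch calls opt in to uninitialized objects, and the controller itself must still ignore uninitialized controller objects. It assumes the era's client-go, where metav1.ListOptions still carried IncludeUninitialized.

```go
package example

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newPodInformerWithUninitialized is a hypothetical constructor for a pod informer
// whose list and watch requests include uninitialized objects.
func newPodInformerWithUninitialized(client kubernetes.Interface) cache.SharedIndexInformer {
	lw := &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			options.IncludeUninitialized = true // expose uninitialized controllees
			return client.CoreV1().Pods(metav1.NamespaceAll).List(options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.IncludeUninitialized = true
			return client.CoreV1().Pods(metav1.NamespaceAll).Watch(options)
		},
	}
	return cache.NewSharedIndexInformer(lw, &v1.Pod{}, 0, cache.Indexers{})
}
```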

@smarterclayton
Contributor

smarterclayton commented Jul 28, 2017 via email

@caesarxuchao
Member Author

caesarxuchao commented Jul 28, 2017

@smarterclayton is there anything I can do for the shared informer refactoring? Is there an ongoing discussion, or shall I start my own investigation?

@caesarxuchao
Member Author

cc @erictune

@erictune
Member

erictune commented Aug 2, 2017

If a controller binary watches uninitialized objects, does it see more Watch traffic?
If a cluster has N initializers, does each initializer's update show up in the watch stream?
Or is there some batching?

@caesarxuchao
Member Author

If a cluster has N initializers, does each initializer's update show up in the watch stream?

Yes. No batching.

If a controller binary watches uninitialized objects, does it see more Watch traffic?

Yes, it sees the initialization traffic. If the initializers are written in an efficient way, then N initializers add N events. For a pod, N more events are negligible over its lifetime.

k8s-github-robot pushed a commit that referenced this issue Aug 6, 2017
…ize-timeout

Automatic merge from submit-queue (batch tested with PRs 49855, 49915)

Let controllers ignore initialization timeout when creating pods

Partially address #48893 (comment).

This only updates the controllers that create pods with `GenerateName`.

The controllers ignore the timeout error when creating the pods. Depending on how the initialization progresses:
* If the initialization succeeds in less than 5 minutes, the controller will observe the creation via the informer. All is good.
* If the initialization fails, the server will delete the pod, but the controller won't receive any event. The controller will not create a new pod until the creation expectation expires after 5 minutes.
* If the initialization takes too long (> 5 minutes), the creation expectation expires and the controller will create extra pods.

I'll send follow-up PRs to fix the latter two cases, e.g., by refactoring the sharedInformer.
@caesarxuchao
Member Author

I'm prototyping a refactor of the shared informer.

k8s-github-robot pushed a commit that referenced this issue Aug 26, 2017
Automatic merge from submit-queue

Let the quota evaluator handle mutating specs of pod & pvc

### Background
The final goal is to address #47837, which aims to allow more mutation for uninitialized objects.

To do that, we [decided](#47837 (comment)) to let the admission controllers handle mutation of uninitialized objects.

### Issue
#50399 attempted to fix all admission controllers so that they can handle mutating uninitialized objects. It was incomplete: I didn't realize that although the resourcequota admission plugin handles the update operation, the underlying evaluator didn't. This PR updates the evaluators to handle updates of uninitialized pods/PVCs.

### TODO
We are still missing another piece. The [quota replenishment controller](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourcequota/replenishment_controller.go) uses the sharedinformer, which doesn't observe the deletion of uninitialized pods at the moment, so there is a quota leak if a pod is deleted before it's initialized. It will be addressed with #48893.
k8s-github-robot pushed a commit that referenced this issue Sep 1, 2017
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)

Add a persistent volume label controller to the cloud-controller-manager

Part of kubernetes/enhancements#88

Outstanding concerns needing input:
- [x] Why 5 threads for controller processing?
- [x] Remove direct linkage to aws/gce cloud providers [#51629]
- [x] Modify shared informers to give added event handlers the ability to include uninitialized objects / use an unshared informer #48893
- [x] Use cache.MetaNamespaceKeyFunc in event handler?

I'm willing to work on addressing the removal of the direct linkage to aws/gce after this PR gets in.
@caesarxuchao caesarxuchao modified the milestones: v1.9, v1.8 Sep 5, 2017
@caesarxuchao caesarxuchao added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 5, 2017
@caesarxuchao caesarxuchao changed the title Controllers should list/watch uninitialized controllees Controllers keeps creating extra controllees if initialization is slow Sep 5, 2017
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 5, 2017
@k8s-github-robot

[MILESTONENOTIFIER] Milestone Removed

@caesarxuchao @kubernetes/sig-api-machinery-bugs

Important: This issue was missing the status/approved-for-milestone label for more than 7 days.


@k8s-github-robot k8s-github-robot removed this from the v1.9 milestone Oct 13, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 11, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@liggitt
Member

liggitt commented Apr 15, 2022

alpha initializers feature removed in 1.14

@liggitt liggitt removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Apr 15, 2022