
Controllers keeps creating extra controllees if initialization is slow #48893

Closed
caesarxuchao opened this issue Jul 13, 2017 · 31 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. milestone/removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@caesarxuchao
Member

caesarxuchao commented Jul 13, 2017

In 1.7 we introduced the concept of initialization. By default, clients won't see uninitialized objects.

Controllers (replicaset, deployment, job, etc.) have to observe the uninitialized controllees they create, otherwise the controllers continue to create more controllees (thanks to @kelseyhightower for reporting this).

One possible solution is converting all controllers to list/watch uninitialized controllees.

P.S. replicaset and replication controllers might be protected by their expectation systems, but other controllers are not.
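
For context, here is a minimal sketch of what the opt-in looks like for a plain list. It assumes a client-go from the era when initializers were alpha, where metav1.ListOptions carried an IncludeUninitialized field; the helper name and arguments are hypothetical. By default the field is false, so controllers simply don't see controllees that are still waiting on an initializer.

```go
package example

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listIncludingUninitialized lists pods matching a selector and opts in to
// uninitialized objects; without IncludeUninitialized the uninitialized
// controllees are filtered out by the apiserver.
func listIncludingUninitialized(client kubernetes.Interface, namespace, selector string) error {
	pods, err := client.CoreV1().Pods(namespace).List(metav1.ListOptions{
		LabelSelector:        selector,
		IncludeUninitialized: true, // also return controllees still being initialized
	})
	if err != nil {
		return err
	}
	fmt.Printf("saw %d pods (initialized or not)\n", len(pods.Items))
	return nil
}
```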

cc @lavalamp @smarterclayton

@caesarxuchao caesarxuchao added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 13, 2017
@caesarxuchao caesarxuchao added this to the v1.8 milestone Jul 13, 2017
@caesarxuchao caesarxuchao self-assigned this Jul 13, 2017
@smarterclayton
Contributor

I'm hesitant to jump into this without a discussion of the tradeoffs. The clients should wait in many cases, and if they aren't waiting, that's a bug.

@smarterclayton
Contributor

Observing uninitialized entries is a hammer, I want to make sure this is a nail first.

Can I get an example of the problem?

@smarterclayton
Contributor

Also, are these controllers tolerant of watch delays? If not, it's a problem with the controllers (in some cases).

@smarterclayton
Contributor

GenerateName is definitely part of the problem - controllers that rely heavily on it for creation aren't tolerant of any mismatch between actual and observed state. If we end up watching uninitialized objects there, we also need to verify they are not outracing their caches.

@caesarxuchao
Member Author

Observing uninitialized entries is a hammer, I want to make sure this is a nail first.

That's why I opened an issue first ;)

As I mentioned in the initial comment, rc and rs are protected by the expectation system. I missed the daemonset and job controllers in my earlier grep; they also use the expectation system, so they should be fine. The cronjob controller only creates Jobs instead of creating pods directly, so it should be OK. That said, I haven't checked whether these controllers behave correctly if the "create expectations" aren't met in a timely manner.

IIRC the statefulset controller always creates pods one by one, so it should be OK.

The deployment controller names the rs with the hash of the pod template, so it won't succeed in creating a duplicate rs, though I don't know whether that will lead to other problems. @janetkuo do you know?
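
For reference, a rough sketch of the expectation pattern those controllers share. The ControllerExpectations type and its methods come from k8s.io/kubernetes/pkg/controller; the key and counts below are illustrative.

```go
package main

import (
	"fmt"

	"k8s.io/kubernetes/pkg/controller"
)

func main() {
	// One expectation store per controller, keyed by the controller object.
	exp := controller.NewControllerExpectations()
	key := "default/my-rs" // hypothetical namespace/name key

	// Before creating pods, record how many creates we expect to observe.
	_ = exp.ExpectCreations(key, 3)

	// While expectations are unsatisfied, the sync loop does not create more pods,
	// which is what protects rc/rs/daemonset/job from a slow (< 5 min) initialization.
	fmt.Println(exp.SatisfiedExpectations(key)) // false

	// Each create observed via the informer (or each failed create) decrements the count.
	exp.CreationObserved(key)
	exp.CreationObserved(key)
	exp.CreationObserved(key)

	fmt.Println(exp.SatisfiedExpectations(key)) // true
}
```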

@caesarxuchao
Member Author

@kelseyhightower with which controller did you see the problem?

@caesarxuchao
Member Author

Related: #22061

@smarterclayton
Contributor

smarterclayton commented Jul 17, 2017 via email

@kelseyhightower
Contributor

@caesarxuchao I'm seeing this with deployments.

@ahmetb
Member

ahmetb commented Jul 17, 2017

Also seeing this with Deployment/RS. The controller keeps creating pods because it doesn't see the uninitialized ones; I ended up with 20 pods for a 1-replica Deployment.

Also, deleting the Deployment (via kubectl delete) does not delete the Pods; they are just leaked, probably for the same reason.

@caesarxuchao
Member Author

The garbage collector controller doesn't list/watch uninitialized objects. If the owner is deleted before the dependents are initialized, the dependents will only be deleted by the GC after they are initialized, no matter which PropagationPolicy the owner is deleted with.

This is not a big problem for "Orphan": users don't see uninitialized dependents, so they wouldn't expect them to be orphaned.

If the policy is "Foreground GC", the garbage collector will delete the owner before deleting the uninitialized dependents. That could be a problem if the user expects "foreground GC" to release all the resources (name, quota) before the owner is deleted.
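
To make the two deletion modes concrete, here is a small sketch using the standard DeleteOptions propagation policies. The helper, client, and names are hypothetical, and the typed-client signature matches the pre-1.17 client-go of this issue's era.

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteOwner deletes a Deployment with either orphan or foreground propagation.
func deleteOwner(client kubernetes.Interface, ns, name string, foreground bool) error {
	// Orphan: delete the owner and leave the dependents behind.
	policy := metav1.DeletePropagationOrphan
	if foreground {
		// Foreground: the owner lingers (with a deletionTimestamp) until its visible
		// dependents are gone; dependents the GC can't see because they are still
		// uninitialized are only cleaned up once they become initialized.
		policy = metav1.DeletePropagationForeground
	}
	return client.AppsV1beta1().Deployments(ns).Delete(name, &metav1.DeleteOptions{
		PropagationPolicy: &policy,
	})
}
```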

@caesarxuchao
Member Author

In the case reported by @ahmetb, no initializer controller was present, so the pods were never initialized. @ahmetb reported that one pod was created every 30 seconds. I'll check whether it's a problem with the "expectation" framework.

@caesarxuchao
Member Author

ExpectationsTimeout is 5 * time.Minute, so it doesn't explain why a new pod is created every 30s. I need to dig deeper.

@caesarxuchao
Member Author

It's because the creations timed out and were considered failed, so the expectation was dropped: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replicaset/replica_set.go#L470-L475

```go
err = rsc.podControl.CreatePodsWithControllerRef(rs.Namespace, &rs.Spec.Template, rs, controllerRef)
if err != nil {
	// Decrement the expected number of creates because the informer won't observe this pod
	glog.V(2).Infof("Failed creation, decrementing expectations for replica set %q/%q", rs.Namespace, rs.Name)
	rsc.expectations.CreationObserved(rsKey)
	errCh <- err
}
```
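
One way a controller could tolerate this (a sketch of the idea, not necessarily the exact change that was merged) is to treat a timeout on the create as a soft failure and keep the expectation, since the pod may still be observed once it is initialized. Here apierrors is assumed to be k8s.io/apimachinery/pkg/api/errors.

```go
err = rsc.podControl.CreatePodsWithControllerRef(rs.Namespace, &rs.Spec.Template, rs, controllerRef)
if err != nil {
	if apierrors.IsTimeout(err) {
		// The pod was created but its initialization timed out. Keep the
		// expectation: if initialization eventually succeeds, the informer will
		// observe the pod; if it fails, the server deletes the pod and the
		// expectation expires after ExpectationsTimeout.
		return
	}
	// Decrement the expected number of creates because the informer won't observe this pod
	glog.V(2).Infof("Failed creation, decrementing expectations for replica set %q/%q", rs.Namespace, rs.Name)
	rsc.expectations.CreationObserved(rsKey)
	errCh <- err
}
```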

@smarterclayton
Contributor

smarterclayton commented Jul 19, 2017 via email

@caesarxuchao
Member Author

Agreed that the controller should tolerate the timeout on create. With the current settings, if the pod is not initialized within the next 5 minutes, the controller will forget the "create expectation" and go on to create another pod. In the case of a failing initializer controller, it could take a long time to get it back online, so many pods will still be created. @smarterclayton is that acceptable? And what are the concerns if we let the controllers list/watch uninitialized objects?

@smarterclayton
Contributor

The biggest concern is that if we add this to all informers, then all clients of informers need to explicitly opt in to uninitialized resources (otherwise we will break them unexpectedly) rather than opt out, and that is a signature change to informers.

@caesarxuchao
Member Author

Controllers should not retry immediately if CREATE fails due to initialization timeout. This part is not controversial. I'll send a PR.

Regarding whether we should update existing controllers to list/watch uninitialized objects, I think:

  • We should hide uninitialized controller objects from controllers, e.g., hiding uninitialized replicasets from the replicaset controller.
  • We have to expose uninitialized controllees, because:
    1. Otherwise, controllers will create an excessive number of uninitialized controllees if an initializer fails for a long time.
    2. If controllees are stuck in the uninitialized state, the controller should update the Status to expose that fact, e.g., "5 pods uninitialized", rather than showing "0 ready pods". Otherwise it will be hard for users to see what's going on.

We'll need to make complex changes to the sharedInformers to let them hide uninitialized controller objects but expose uninitialized controllees. Or we can configure the sharedInformers to expose all uninitialized objects to the controllers and rely on individual controllers to stay away from uninitialized controller objects. Either way we'll need to change the signature of the sharedInformers, as @smarterclayton pointed out.
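
For concreteness, a minimal sketch of what the second option could look like for a single informer (not the eventual sharedInformer refactor): the list and watch calls opt in to uninitialized objects, and the controller itself must still ignore uninitialized controller objects. It assumes the era's client-go, where metav1.ListOptions still carried IncludeUninitialized.

```go
package example

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newPodInformerWithUninitialized is a hypothetical constructor for a pod informer
// whose list and watch requests include uninitialized objects.
func newPodInformerWithUninitialized(client kubernetes.Interface) cache.SharedIndexInformer {
	lw := &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			options.IncludeUninitialized = true // expose uninitialized controllees
			return client.CoreV1().Pods(metav1.NamespaceAll).List(options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.IncludeUninitialized = true
			return client.CoreV1().Pods(metav1.NamespaceAll).Watch(options)
		},
	}
	return cache.NewSharedIndexInformer(lw, &v1.Pod{}, 0, cache.Indexers{})
}
```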

@smarterclayton
Contributor

smarterclayton commented Jul 28, 2017 via email

@caesarxuchao
Member Author

caesarxuchao commented Jul 28, 2017

@smarterclayton is there anything I can do for the shared informer refactoring? Is there an ongoing discussion, or shall I start my own investigation?

@caesarxuchao
Member Author

cc @erictune

@erictune
Member

erictune commented Aug 2, 2017

If a controller binary watches uninitialized objects, does it see more Watch traffic?
If a cluster has N initializers, does each initializer's update show up in the watch stream?
Or is there some batching?

@caesarxuchao
Member Author

If a cluster has N initializers, does each initializer's update show up in the watch stream?

Yes. No batching.

If a controller binary watches uninitialized objects, does it see more Watch traffic?

Yes, it sees the initialization traffic. If the initializers are written in an efficient way, then N initializers add N events. For a pod, N more events are negligible over its lifetime.

k8s-github-robot pushed a commit that referenced this issue Aug 6, 2017
…ize-timeout

Automatic merge from submit-queue (batch tested with PRs 49855, 49915)

Let controllers ignore initialization timeout when creating pods

Partially address #48893 (comment).

This only updates the controllers that create pods with `GenerateName`.

The controllers ignore the timeout error when creating the pods. Depending on how the initialization progresses:
* If the initialization succeeds in less than 5 minutes, the controller will observe the creation via the informer. All is good.
* If the initialization fails, the server will delete the pod, but the controller won't receive any event. The controller will not create a new pod until the creation expectation expires after 5 minutes.
* If the initialization takes too long (> 5 minutes), the creation expectation expires and the controller will create extra pods.

I'll send follow-up PRs to fix the latter two cases, e.g., by refactoring the sharedInformer.
@caesarxuchao
Member Author

I'm prototyping a refactor of the shared informer.

k8s-github-robot pushed a commit that referenced this issue Aug 26, 2017
Automatic merge from submit-queue

Let the quota evaluator handle mutating specs of pod & pvc

### Background
The final goal is to address #47837, which aims to allow more mutation for uninitialized objects.

To do that, we [decided](#47837 (comment)) to let the admission controllers handle mutation of uninitialized objects.

### Issue
#50399 attempted to fix all admission controllers so that they can handle mutating uninitialized objects. It was incomplete: I didn't realize that although the resourcequota admission plugin handles the update operation, the underlying evaluator didn't. This PR updates the evaluators to handle updates of uninitialized pods/PVCs.

### TODO
We are still missing another piece. The [quota replenishment controller](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourcequota/replenishment_controller.go) uses the sharedinformer, which doesn't observe the deletion of uninitialized pods at the moment, so there is a quota leak if a pod is deleted before it's initialized. It will be addressed with #48893.
k8s-github-robot pushed a commit that referenced this issue Sep 1, 2017
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)

Add a persistent volume label controller to the cloud-controller-manager

Part of kubernetes/enhancements#88

Outstanding concerns needing input:
- [x] Why 5 threads for controller processing?
- [x] Remove direct linkage to aws/gce cloud providers [#51629]
- [x] Modify shared informers to give added event handlers the ability to include uninitialized objects / use an unshared informer #48893
- [x] Use cache.MetaNamespaceKeyFunc in event handler?

I'm willing to work on addressing the removal of the direct linkage to aws/gce after this PR gets in.
@caesarxuchao caesarxuchao modified the milestones: v1.9, v1.8 Sep 5, 2017
@caesarxuchao caesarxuchao added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 5, 2017
@caesarxuchao caesarxuchao changed the title Controllers should list/watch uninitialized controllees Controllers keeps creating extra controllees if initialization is slow Sep 5, 2017
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 5, 2017
@k8s-github-robot

[MILESTONENOTIFIER] Milestone Removed

@caesarxuchao @kubernetes/sig-api-machinery-bugs

Important: This issue was missing the status/approved-for-milestone label for more than 7 days.


@k8s-github-robot k8s-github-robot removed this from the v1.9 milestone Oct 13, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 11, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@liggitt
Member

liggitt commented Apr 15, 2022

alpha initializers feature removed in 1.14

@liggitt liggitt removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Apr 15, 2022