Heapster keeps restarting every minute in newly deployed k8s cluster #275
Comments
Same issue with acs-engine 0.24.2 and Kubernetes 1.10.8. Is there any way we could get better regression testing before changes are made to the system?
@mboersma, what was the reason for removing "EnsureExists" from the Heapster deployment in the 1.12 PR?
My reasoning was that (as you pointed out elsewhere) we found that addons weren't being updated during upgrades unless the addon-manager mode label was … Heapster had both the … In any case, this was a purposeful change to fix …
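The trade-off in this comment follows from the two addon-manager modes. According to the Kubernetes addon-manager documentation, `Reconcile` periodically brings the live object back to what the manifest says. `EnsureExists` only checks that the object exists: it creates the object if it is missing and never modifies it afterwards. The Python sketch below models that difference for a single object. `addon_manager_action` and the dict representation are hypothetical simplifications, not the addon manager's code, and other modes are out of scope.

```python
from typing import Optional


def addon_manager_action(mode: str, desired: dict, live: Optional[dict]) -> str:
    """Return what one sync pass would do to a single addon object.

    Simplified model of the documented mode semantics, not the addon
    manager's real implementation: objects are compared as whole dicts.
    """
    if mode == "Reconcile":
        # Live state is driven back to the manifest on every pass, so manifest
        # changes shipped by an upgrade are rolled out, and edits made inside
        # the cluster are overwritten.
        if live is None:
            return "create"
        return "none" if live == desired else "apply"
    if mode == "EnsureExists":
        # Only existence is checked: a missing object is (re)created, but an
        # existing one is never modified, even if the manifest has changed.
        return "create" if live is None else "none"
    raise ValueError(f"mode not covered by this sketch: {mode!r}")


if __name__ == "__main__":
    manifest = {"image": "example/addon:v2"}  # hypothetical upgraded manifest
    running = {"image": "example/addon:v1"}   # hypothetical live object
    print(addon_manager_action("Reconcile", manifest, running))     # apply
    print(addon_manager_action("EnsureExists", manifest, running))  # none
```

In this model, `EnsureExists` shows why addons were not updated during upgrades. `Reconcile` shows how any in-cluster change to an object gets undone on every pass. This thread does not establish whether that is what drives the Heapster restarts.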
We were also facing this issue with Kubernetes 1.11.4 bootstrapped with acs-engine 0.25.3. Elaborating a bit on @chreichert's solution, which did the trick for us as well: SSH into each of the masters and modify …
Is this a request for help?:
yes
Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE
What version of acs-engine?:
0.24.1
Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.10.9
What happened:
In a newly deployed k8s cluster, the Heapster component is being redeployed every minute or so.
What you expected to happen:
Heapster running without problems.
How to reproduce it (as minimally and precisely as possible):
Initial setup of the cluster with acs-engine 0.24.1:
Anything else we need to know:
I found a question on Stack Overflow (https://stackoverflow.com/questions/46110856/kubernetes-monitoring-service-heapster-keeps-restarting) that describes very similar behaviour. The fix suggested there was to modify the heapster deployment so that it carries the label "addonmanager.kubernetes.io/mode: EnsureExists".
I edited the Heapster deployment manifest on all master nodes and changed that label from "Reconcile" to "EnsureExists", which fixed the problem for me.
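The manual fix described here can be scripted. The Python sketch below is a hypothetical helper and is not part of acs-engine. `set_ensure_exists` changes `addonmanager.kubernetes.io/mode: Reconcile` to `EnsureExists`, but only inside YAML documents whose top-level `kind` is `Deployment`. `patch_file` applies that change to one file in place. This thread does not give the manifest path on the masters, so pass whichever file holds the Heapster deployment on your cluster. The script is a line-based text edit that assumes the label sits on its own line. It does not parse YAML.

```python
import re
import sys
from typing import Tuple

# Splits a multi-document YAML file on "---" lines; the capturing group keeps
# the separators so the documents can be joined back unchanged.
_DOC_SEP = re.compile(r"(^---[ \t]*$)", re.MULTILINE)

# Identifies a document whose top-level kind is Deployment.
_KIND_DEPLOYMENT = re.compile(r"^kind:[ \t]*Deployment[ \t]*$", re.MULTILINE)

# Matches a single-line "addonmanager.kubernetes.io/mode: Reconcile" label,
# keeping indentation and optional quotes. Lines with trailing comments are
# deliberately left alone.
_MODE_LABEL = re.compile(
    r"^(?P<prefix>[ \t]*addonmanager\.kubernetes\.io/mode:[ \t]*)"
    r"(?P<q>[\"']?)Reconcile(?P=q)[ \t]*$",
    re.MULTILINE,
)


def set_ensure_exists(manifest_text: str) -> Tuple[str, int]:
    """Switch Reconcile mode labels to EnsureExists in Deployment documents.

    Returns the new text and the number of labels changed.
    """
    parts = _DOC_SEP.split(manifest_text)
    changed = 0
    for i, part in enumerate(parts):
        if _KIND_DEPLOYMENT.search(part):
            parts[i], n = _MODE_LABEL.subn(
                r"\g<prefix>\g<q>EnsureExists\g<q>", part
            )
            changed += n
    return "".join(parts), changed


def patch_file(path: str) -> int:
    """Rewrite one manifest file in place; returns the number of labels changed."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    patched, count = set_ensure_exists(original)
    if count:
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
    return count


if __name__ == "__main__":
    # Usage: python set_ensure_exists.py <manifest.yaml> [...]
    for manifest_path in sys.argv[1:]:
        print(f"{manifest_path}: {patch_file(manifest_path)} label(s) changed")
```

Run it on every master, for example `python set_ensure_exists.py <path-to-heapster-manifest>` (the script name is hypothetical). Keep in mind the trade-off raised in the comments: with `EnsureExists`, the addon manager will not modify the existing Heapster deployment, so later upgrades will not update it.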
While researching, I found PR Azure/acs-engine#3918, which changed that label from "EnsureExists" to "Reconcile". Could you take another look at that PR? See also PR Azure/acs-engine#1133, which introduced the label and was referenced in the fix recommended above.