This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

Heapster keeps restarting every minute in newly deployed k8s cluster #275

Closed
chreichert opened this issue Oct 28, 2018 · 4 comments · Fixed by #295

Comments

@chreichert

Is this a request for help?:
yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
0.24.1

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.10.9

What happened:
In a newly deployed k8s cluster, the heapster component is being redeployed every minute or so.

What you expected to happen:
Heapster running without problems.

How to reproduce it (as minimally and precisely as possible):
Initial setup of Cluster with acs-engine 0.24.1:

  • Kubernetes 1.10.9
  • Private cluster
  • RBAC enabled
  • Calico network policy
  • Disabled addons:
    • blobfuse-flexvolume
    • smb-flexvolume
  • 1 Master
  • 3 Agentpools as vmss

Anything else we need to know:
I found a question on Stack Overflow (https://stackoverflow.com/questions/46110856/kubernetes-monitoring-service-heapster-keeps-restarting) which describes very similar behaviour. The suggested fix there was to modify the heapster deployment so that it carries the label "addonmanager.kubernetes.io/mode: EnsureExists".

I edited the deployment manifest for heapster on all master nodes and changed that label on the heapster deployment from "Reconcile" to "EnsureExists", which fixed the problem for me.
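For reference, here is a minimal sketch of where that label lives in the addon manifest. The label key and values are as discussed in this thread; the apiVersion, deployment name, and namespace are typical for the heapster addon but may differ in a given cluster, and all unrelated fields are elided:

```yaml
# Sketch of the change in the heapster addon Deployment manifest
# (field values other than the label are illustrative assumptions).
apiVersion: extensions/v1beta1   # may differ depending on Kubernetes version
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
  labels:
    k8s-app: heapster
    # was: addonmanager.kubernetes.io/mode: Reconcile
    addonmanager.kubernetes.io/mode: EnsureExists
```

With EnsureExists, the addon manager only creates the resource if it is missing and leaves it alone otherwise, rather than continuously reconciling it back to the manifest.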

Doing some research, I found the following PR Azure/acs-engine#3918, which changed that label from "EnsureExists" to "Reconcile", so you might check this PR again? See also PR Azure/acs-engine#1133, which introduced that label and was referenced in that above recommended fix.

@PacoGuy

PacoGuy commented Oct 28, 2018

Same issue with acs-engine 0.24.2 and Kubernetes 1.10.8

Is there any way that we can please get better regression testing before changes are made to the system?
It is becoming nearly impossible to deploy new clusters between this heapster issue, kube-dns failures (7 restarts upon deployment before it stabilizes), and random failures to deploy cse-agents.

@CecileRobertMichon
Contributor

@mboersma what was the reason for removing "EnsureExists" from the heapster deployment in the 1.12 PR?

@mboersma
Member

mboersma commented Nov 8, 2018

My reasoning was that (as you pointed out elsewhere) we found that addons weren't being updated during upgrades unless the addon-manager mode label was Reconcile.

Heapster had both the Reconcile and EnsureExists labels, although they represent mutually exclusive policies as I read it. (I assume having duplicate label keys means the latter wins, so EnsureExists was overriding Reconcile.)

In any case, this was a purposeful change to fix acs-engine upgrade, but we should retest that assumption and see if it's possible to change the policy.

@shaikatz

We were also facing this issue with Kubernetes 1.11.4 bootstrapped with acs-engine 0.25.3.

Elaborating a bit on @chreichert's solution, which did the trick for us as well:

SSH into each of the masters and modify /etc/kubernetes/addons/kube-heapster-deployment.yaml: in the Deployment section, change addonmanager.kubernetes.io/mode from Reconcile to EnsureExists.
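The steps above can be sketched as a short script to run on each master. This is a hedged sketch, not an official procedure: it assumes GNU sed, sudo access, and the manifest path given above, and it keeps a backup so the change can be reverted:

```shell
# Run on each master node (sketch; assumes GNU sed and sudo access).
MANIFEST=/etc/kubernetes/addons/kube-heapster-deployment.yaml

# Keep a backup of the original manifest so the change can be reverted.
sudo cp "$MANIFEST" "$MANIFEST.bak"

# Flip the addon-manager policy from Reconcile to EnsureExists so the
# addon manager stops forcing the heapster deployment back to the manifest.
sudo sed -i \
  's|addonmanager.kubernetes.io/mode: Reconcile|addonmanager.kubernetes.io/mode: EnsureExists|' \
  "$MANIFEST"
```

After the edit, the addon manager should stop restarting heapster; verify with `kubectl -n kube-system get pods` that the heapster pod's restart count stops climbing.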
