chaoskube always kills the same pod #197

Closed · HaveFun83 opened this issue Apr 15, 2020 · 5 comments · Labels: bug

@HaveFun83
Hi,

I'm currently upgrading from v0.15.1 to v0.19.0:

NAME                              READY   STATUS    RESTARTS   AGE
chaoskube-demo-7f5ffd44db-djjcc   1/1     Running   0          10m
redis-demo-master-0               1/1     Running   0          8m29s
redis-demo-slave-0                1/1     Running   0          26s
redis-demo-slave-1                1/1     Running   0          7m39s
redis-demo-slave-2                1/1     Running   0          7m14s

Now redis-demo-slave-0 is always killed, never a random pod:

time="2020-04-15T15:42:03Z" level=info msg="starting up" dryRun=false interval=2m0s version=v0.19.0
W0415 15:42:03.745799       6 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-04-15T15:42:03Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.17.2
time="2020-04-15T15:42:03Z" level=info msg="setting pod filter" annotations= excludedPodNames="chaoskube-demo|redis-demo-master" includedPodNames="<nil>" labels= maxKill=1 minimumAge=0s namespaceLabels= namespaces=demo
time="2020-04-15T15:42:03Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[16:00-07:00]" weekdays="[Friday Saturday Sunday]"
time="2020-04-15T15:42:03Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-04-15T15:46:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-15T15:48:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-15T15:50:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-15T15:52:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo

Before the upgrade, chaoskube killed one of the redis-demo-slave pods at random.

Any ideas?

TIA

@linki (Owner) commented Apr 17, 2020

@HaveFun83 Interesting 🤔

Please run it with --debug, which gives us more information, e.g. whether it even has more candidates to pick from. It'll also show us the values of all flags.

@HaveFun83 (Author) commented Apr 17, 2020

Log from the working v0.15.1:

time="2020-04-17T10:54:07Z" level=debug msg="reading config" annotations= debug=true dryRun=false excludedDaysOfYear= excludedPodNames="chaoskube-demo|redis-demo-master" excludedTimesOfDay="16:00-07:00" excludedWeekdays="Sat,Sun" gracePeriod=-1s includedPodNames="<nil>" interval=2m0s kubeconfig= labels= logFormat=text master= metricsAddress=":8080" minimumAge=0s namespaceLabels= namespaces=demo timezone=UTC
time="2020-04-17T10:54:07Z" level=info msg="starting up" dryRun=false interval=2m0s version=v0.15.1
time="2020-04-17T10:54:07Z" level=debug msg="using cluster config" kubeconfig= master=
time="2020-04-17T10:54:07Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.17.2
time="2020-04-17T10:54:07Z" level=info msg="setting pod filter" annotations= excludedPodNames="chaoskube-demo|redis-demo-master" includedPodNames="<nil>" labels= minimumAge=0s namespaceLabels= namespaces=demo
time="2020-04-17T10:54:07Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[16:00-07:00]" weekdays="[Saturday Sunday]"
time="2020-04-17T10:54:07Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-04-17T10:54:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T10:54:08Z" level=info msg="terminating pod" name=redis-demo-slave-1 namespace=demo
time="2020-04-17T10:54:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-1 namespace=demo terminator=DeletePod
time="2020-04-17T10:54:08Z" level=debug msg=sleeping...
time="2020-04-17T10:56:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T10:56:08Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T10:56:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T10:56:08Z" level=debug msg=sleeping...
time="2020-04-17T10:58:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T10:58:08Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T10:58:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T10:58:08Z" level=debug msg=sleeping...
time="2020-04-17T11:00:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T11:00:08Z" level=info msg="terminating pod" name=redis-demo-slave-2 namespace=demo
time="2020-04-17T11:00:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-2 namespace=demo terminator=DeletePod
time="2020-04-17T11:00:08Z" level=debug msg=sleeping...

Log from the broken v0.19.0:

time="2020-04-17T11:01:08Z" level=debug msg="reading config" annotations= debug=true dryRun=false excludedDaysOfYear= excludedPodNames="chaoskube-demo|redis-demo-master" excludedTimesOfDay="16:00-07:00" excludedWeekdays="Sat,Sun" gracePeriod=-1s includedPodNames="<nil>" interval=2m0s kubeconfig= labels= logFormat=text master= maxKill=1 metricsAddress=":8080" minimumAge=0s namespaceLabels= namespaces=demo slackWebhook= timezone=UTC
time="2020-04-17T11:01:08Z" level=info msg="starting up" dryRun=false interval=2m0s version=v0.19.0
time="2020-04-17T11:01:08Z" level=debug msg="using cluster config" kubeconfig= master=
W0417 11:01:08.166400       7 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-04-17T11:01:08Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.17.2
time="2020-04-17T11:01:08Z" level=info msg="setting pod filter" annotations= excludedPodNames="chaoskube-demo|redis-demo-master" includedPodNames="<nil>" labels= maxKill=1 minimumAge=0s namespaceLabels= namespaces=demo
time="2020-04-17T11:01:08Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[16:00-07:00]" weekdays="[Saturday Sunday]"
time="2020-04-17T11:01:08Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-04-17T11:01:09Z" level=debug msg="found victims" count=1
time="2020-04-17T11:01:09Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T11:01:09Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T11:01:09Z" level=debug msg=sleeping...
time="2020-04-17T11:03:08Z" level=debug msg="found victims" count=1
time="2020-04-17T11:03:08Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T11:03:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T11:03:08Z" level=debug msg=sleeping...

@linki (Owner) commented Apr 17, 2020

I think the reason is this:

This was done to prevent killing multiple pods from the same replication group, such as a Deployment or StatefulSet. It's only really needed when --max-kill is greater than 1.

However, looking at the implementation, it seems to always pick the first pod of each group as the target.
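For illustration, here is a minimal Go sketch of that suspected behavior (hypothetical code, not the actual chaoskube source): if the deduplication step always keeps the first pod it sees per owner, and the API server returns pods in a stable, name-sorted order, the supposedly random victim never changes.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	Name  string
	Owner string // identifier of the owning Deployment/StatefulSet/etc.
}

// onePerOwner keeps at most one pod per replication group. Because it
// always keeps the *first* pod it sees for each owner, its output is
// deterministic whenever the input order is stable.
func onePerOwner(pods []Pod) []Pod {
	seen := map[string]bool{}
	var out []Pod
	for _, p := range pods {
		if seen[p.Owner] {
			continue
		}
		seen[p.Owner] = true
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "redis-demo-slave-0", Owner: "redis-demo-slave"},
		{Name: "redis-demo-slave-1", Owner: "redis-demo-slave"},
		{Name: "redis-demo-slave-2", Owner: "redis-demo-slave"},
	}
	// Prints redis-demo-slave-0 on every run: the group is collapsed to
	// its first member before any random choice can happen.
	fmt.Println(onePerOwner(pods)[0].Name)
}
```

This would also explain why the v0.19.0 log above reports found victims count=1 while v0.15.1 reported found candidates count=3: the candidate set is collapsed to one pod per group before the random pick.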

linki added the bug label on Apr 23, 2020
@linki (Owner) commented May 2, 2020

Fixed in #203.

Thanks @HaveFun83 for reporting this.
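For the curious, one plausible shape for such a fix is to pick a random pod from each owner group instead of the first one. This is a hedged sketch only, reusing the Pod type from the sketch above; the actual change is in #203 and may differ:

```go
import "math/rand"

// randomPerOwner keeps one pod per replication group, but chooses a
// uniformly random member of each group, so repeated runs spread kills
// across redis-demo-slave-0/1/2 again.
func randomPerOwner(pods []Pod, rng *rand.Rand) []Pod {
	groups := map[string][]Pod{}
	for _, p := range pods {
		groups[p.Owner] = append(groups[p.Owner], p)
	}
	out := make([]Pod, 0, len(groups))
	for _, group := range groups {
		out = append(out, group[rng.Intn(len(group))])
	}
	return out
}
```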

@linki (Owner) commented Jul 3, 2020

This is fixed in version v0.20.0.

linki closed this as completed on Jul 3, 2020