This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

[ETCD] auto-compaction on kube-aws 0.9.7? #1061

Closed
billyteves opened this issue Dec 6, 2017 · 16 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@billyteves

I'd like to ask: is there history compaction on etcd, or any related etcd maintenance?
https://github.com/coreos/etcd/blob/master/Documentation/op-guide/maintenance.md
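For reference, that maintenance guide describes enabling periodic auto-compaction with a server flag. A minimal sketch (assuming etcd v3.x and that the etcd units can be given extra flags or environment variables; the retention value here is just an example):

# keep roughly 1 hour of key history and compact older revisions automatically
etcd --auto-compaction-retention=1
# or, equivalently, as an environment variable on the etcd systemd unit
ETCD_AUTO_COMPACTION_RETENTION=1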

@billyteves
Author

After a month of running Kubernetes, I tried to create a namespace and the API returned: Error from server: etcdserver: mvcc: database space exceeded. Not sure if this is related to history compaction?

@mumoshu
Contributor

mumoshu commented Dec 6, 2017

I've never experienced it myself, but you probably need to run a compaction manually?
etcd-io/etcd#7986 (comment)

As far as I remember, we have not added the flag to enable the auto-compaction feature.
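For reference, the manual cleanup for a "database space exceeded" situation roughly follows the etcd maintenance guide linked above: compact old revisions, defragment, then clear the space alarm. A rough sketch, assuming etcdctl v3 and a single endpoint (adjust --endpoints for a real cluster):

# find the current revision
rev=$(ETCDCTL_API=3 ./etcdctl endpoint status --write-out=json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]+')
# compact away all history older than the current revision
ETCDCTL_API=3 ./etcdctl compact "$rev"
# defragment to reclaim the freed space on disk
ETCDCTL_API=3 ./etcdctl defrag
# clear the NOSPACE alarm so writes are accepted again
ETCDCTL_API=3 ./etcdctl alarm disarm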

@billyteves
Author

billyteves commented Dec 6, 2017

Noted @mumoshu. I was surprised to find I have 2.5GB of etcd data; it really needs to be compacted. Will this be added in future releases of kube-aws?

@whereisaaron
Contributor

Looking at the journald logs of the etcd cluster nodes I just created with kube-aws 0.9.9, it looks like scheduled auto-compaction is occurring every 5 minutes.

"message": "2017-12-18 23:13:08.569589 I | mvcc: finished scheduled compaction at 22254 (took 1.200345ms)",
"message": "2017-12-18 23:18:08.596613 I | mvcc: finished scheduled compaction at 22918 (took 1.340149ms)",
"message": "2017-12-18 23:23:08.626787 I | mvcc: finished scheduled compaction at 23427 (took 1.197704ms)",

@billyteves
Author

@whereisaaron I see, is there an auto-defrag? By defragmenting etcd I was able to reduce and optimize it, going from 2.5GB down to just 400+MB. I'm still not sure why etcd grows so large and then shrinks.

@whereisaaron
Contributor

The documentation you mentioned indicates that defragmentation blocks data access and so effectively takes the etcd node offline. For a large defragmentation it sounds like you would need to carefully do it one node at a time, lest you lose quorum and the whole etcd cluster.

Couple of ideas:

  1. If you are using etcd-lock reboots for automatic CoreOS updates, then perhaps it is something you could do at start-up (maybe just before etcd starts running, using the offline method in the documentation).

  2. Maybe a cron job could take the etcd-lock, defrag etcd, then wait until the local etcd is healthy again, then release the lock (a rough sketch follows below).

@mumoshu would either of those strategies make sense for an auto-defrag?
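A rough sketch of idea 2, run from cron on each etcd node. This is an assumption rather than anything kube-aws does today: it uses locksmithctl's lock/unlock subcommands for the reboot lock and assumes the local client endpoint is https://127.0.0.1:2379.

#!/bin/bash
set -e
locksmithctl lock                 # take the reboot lock so only one node defrags at a time
trap 'locksmithctl unlock' EXIT   # always give the lock back, even if defrag fails
ETCDCTL_API=3 ./etcdctl defrag --endpoints=https://127.0.0.1:2379
# wait until the local member reports healthy again before releasing the lock
until ETCDCTL_API=3 ./etcdctl endpoint health --endpoints=https://127.0.0.1:2379; do
  sleep 5
done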

@cknowles
Contributor

cknowles commented Feb 8, 2018

I think etcdctl may deal with that concern, although the docs are not particularly clear. Based on running it, the defrag does appear to occur on one node at a time. I'm passing the full list in like:
ETCDCTL_API=3 ./etcdctl defrag --endpoints=$ETCD_ENDPOINTS

I've also run it on single-node etcd dev clusters and that was fine too, although I've not done that very many times.

@billyteves
Author

Right, I've been defragmenting each node one at a time. It is very important to defragment etcd, as fragmentation can cause issues not only for etcd itself but also for the backup to S3. Kindly correct me if I'm wrong here, but I think kube-aws has an etcdadm-save: I believe it backs up the etcd data, saves the backup somewhere on the root volume, then pushes it to S3. It sometimes fails for me because the location where it dumps the data has little space.

@whereisaaron
Contributor

whereisaaron commented Feb 10, 2018

Yeah, it does seem to do them in turn. It is not done automatically because it creates potentially significant latency, but there is no indication whether etcdctl checks that the cluster is healthy before hitting the next endpoint.

If we were to schedule it, I'd be tempted to be a little more conservative and check that all is good before dealing with each node:

set -e
for n in node-a node-b node-c; do
  # only move on to this node if the cluster is currently healthy
  ./etcdctl cluster-health && ETCDCTL_API=3 ./etcdctl defrag --endpoints=$n
  sleep 20
done

Or, e.g., maybe use the etcd-lock method and have each node defragment on its own schedule. Maybe a different day of the week (etcd-0=Sunday, ...); that way we can defragment clusters of up to 7 nodes on a weekly basis 😄
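The staggering could be as simple as a per-node crontab entry wrapping whatever defrag script we settle on (the script path here is hypothetical):

0 3 * * 0 /opt/bin/etcd-defrag.sh   # etcd-0: Sunday 03:00
0 3 * * 1 /opt/bin/etcd-defrag.sh   # etcd-1: Monday 03:00
0 3 * * 2 /opt/bin/etcd-defrag.sh   # etcd-2: Tuesday 03:00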

@whereisaaron
Contributor

Some recent relevant discussion about the 'stop the world' behaviour of defrag: etcd-io/etcd#9222

I was going to ask how we detect how fragmented the database is, but apparently there is no current mechanism (other than defragging and seeing what happens).

Also, it appears that even defragmenting one node at a time can result in errors for k8s operations. You might need to actually remove an etcd node, delete its database, then re-add it, to get an error-free defragmentation according to this:

I tested with etcd v3.1.14 servers (3-node cluster) and a test client app implemented with clientv3 from master (64713e5). The test client app simply spawns 4 goroutines that issue Put() requests every 10 milliseconds.
I triggered defrag on a node and it took a while, like 8 seconds (I'm using an HDD for this env), then some requests simply returned context deadline exceeded errors. So it's not hidden by the retry mechanism.
User apps most likely detect those errors and alert or start failure state procedures. So I assume it would be great if we can improve the retry mechanism for deadline exceeded error from a node.
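For completeness, the remove/re-add approach from that thread would look roughly like this. This is only a sketch: the member ID, name, URLs and data directory are placeholders, and the node must rejoin with --initial-cluster-state=existing.

ETCDCTL_API=3 ./etcdctl member list                # find the ID of the member to cycle
ETCDCTL_API=3 ./etcdctl member remove <MEMBER_ID>
# on that node: stop etcd and delete its data directory, e.g. /var/lib/etcd2
ETCDCTL_API=3 ./etcdctl member add etcd-1 --peer-urls=https://etcd-1.internal:2380
# restart etcd on the node with --initial-cluster-state=existing so it rejoins and resyncs from the leader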

@billyteves
Author

Usually, based on my experience, I just check whether the size of my etcd goes above 1GB, because running around 4,000 containers in a single cluster takes about 500~700MB. Based on what I have read, etcd should not grow beyond about 2GB. But yeah, maybe we could have that kind of automation using etcd-lock. I'm anxious about always having to monitor etcd on my end; I've already experienced a chain reaction where an issue on etcd caused some of my pods to stop working properly.
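A crude size check along those lines, assuming etcdctl v3 and jq are available on the node (the 1GB threshold is just the one mentioned above):

ETCDCTL_API=3 ./etcdctl endpoint status --endpoints=$ETCD_ENDPOINTS --write-out=json \
  | jq '.[] | {endpoint: .Endpoint, dbSizeBytes: .Status.dbSize}'
# alert (or trigger a defrag) once dbSizeBytes climbs past ~1GB (1073741824 bytes)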

@wwyiwzhang
Contributor

Our team experienced a similar problem recently. I created this PR: #1427 to address some of the issues.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 25, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
