"etcdctl auth enable" command breaks Kubernetes cluster #8458

Closed
struz opened this issue Aug 28, 2017 · 19 comments

@struz

struz commented Aug 28, 2017

Problem

Enabling auth on etcd v3 when it is used by a Kubernetes cluster will force the cluster into read-only mode within a matter of days.

The database slowly (or rapidly, on a high-load cluster) consumes more and more space as extra revisions accumulate, until it hits its quota and goes into read-only mode.

This is because the Kubernetes API server no longer has permission to perform compaction every five minutes (see here for the source). Enabling etcd auth restricts all administration functions to users granted the "root" role.
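
When this happens, etcd raises a NOSPACE alarm and rejects further writes. A quick way to confirm the state (a sketch; the endpoint and root password are placeholders):

$ ETCDCTL_API=3 etcdctl --endpoints=https://etcd-0.<domain>:2379 --user root:<password> alarm list
memberID:<member_id> alarm:NOSPACE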

Reproduction

System info

etcd version:

etcd-0 ~ # etcd --version
etcd Version: 3.2.1
Git SHA: 61fc123
Go Version: go1.8.3
Go OS/Arch: linux/amd64

Kubernetes version: 1.7.2

Reproduction steps

  1. Create a certificate etcd-server where CN=etcd-server. This is used to allow Kubernetes to authenticate with the etcd cluster.
  2. Run etcd with the following parameters
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://etcd-0.<domain>:2379"
Environment="ETCD_CERT_FILE=/etc/ssl/etcd/etcd-server.pem"
Environment="ETCD_CLIENT_CERT_AUTH=true"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://etcd-0.<url>:2380"
Environment="ETCD_INITIAL_CLUSTER=etcd-0.<domain>=https://etcd-0.<domain>:2380,etcd-1.<domain>=https://etcd-1.<domain>:2380,etcd-2.<domain>=https://etcd-2.<domain>:2380"
Environment="ETCD_KEY_FILE=/etc/ssl/etcd/etcd-server-key.pem"
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=https://<ip_address>:2380"
Environment="ETCD_NAME=etcd-0.<domain>"
# Environment="ETCD_DATA_DIR=/media/data"
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/etcd/etcd-peer.pem"
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/etcd/etcd-peer-key.pem"
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ca.pem"
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ca.pem"
Environment="ETCD_METRICS=extensive"
  3. Enable auth on the cluster with the following commands
$ etcdctl role add apiserver
$ etcdctl role grant-permission apiserver --prefix=true readwrite /registry/
$ etcdctl user add etcd-client:password
$ etcdctl user grant-role etcd-client apiserver
$ etcdctl auth enable
  4. Set up Kubernetes (we used v1.7.2) using the etcd cluster as a backend
  5. Kubernetes can now read and write everything in its keyspace, but it will not be able to run compaction commands. The error in the kube-apiserver logs is:
kube-system/<hostname>[kube-apiserver]: E0822 03:53:13.000640       1 compact.go:123] etcd: endpoint ([https://etcd-0.<domain>:2379 https://etcd-1.<domain>:2379 https://etcd-2.<domain>:2379]) compact failed: etcdserver: permission denied

No error message is logged on the etcd server side when this access is denied.
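
The denial can be reproduced directly with etcdctl as the restricted user (a sketch; the revision number is a placeholder):

$ ETCDCTL_API=3 etcdctl --endpoints=https://etcd-0.<domain>:2379 --user etcd-client:password compaction 1000
Error: etcdserver: permission denied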

Resolution

It seems to us that auth is fundamentally broken right now when used in conjunction with Kubernetes.

Not being able to delegate the less impactful administration commands makes auth all-or-nothing: either you have the root account/role and can modify the entire auth system, or you don't and can't even compact the database. Handing the root cert to anything but the etcd nodes themselves would obviously be unwise from a security perspective, since any holder could then reconfigure your authn/authz setup.

The best fix for this, in my opinion, is to allow modular access privileges to be granted for different administration commands. Specifically, defragmentation and compaction should be delegatable to other roles in the auth system.

@struz struz changed the title etcdctl auth enable command breaks Kubernetes cluster "etcdctl auth enable" command breaks Kubernetes cluster Aug 28, 2017
@mitake
Contributor

mitake commented Aug 29, 2017

Hi @struz, thanks for reporting the problem.

The best fix for this, in my opinion, is to allow modular access privileges to be granted to different administration commands. Specifically defrag and compaction should be able to be delegated to other roles in the auth system.

This opinion seems reasonable to me. Adding a special role for maintenance purposes would be useful for avoiding multiple user accounts or certs on the client side. What do you think? @xiang90 @heyitsanthony If it is compatible with the other etcd design policies, I'll work on it.

@heyitsanthony
Contributor

@mitake special roles probably won't be flexible enough. Could roles encode the RPC permissions? Something like:

message ServicePermission {
  string service = 1;   // e.g. "KV"
  string procedure = 2; // e.g. "Compact"
}

message Role {
  bytes name = 1;

  repeated Permission keyPermission = 2;            // existing key-based permissions
  repeated ServicePermission servicePermission = 3; // per-RPC permissions
}

The service permission for Compact would be {Service: "KV", Procedure: "Compact"}

@xiang90
Contributor

xiang90 commented Aug 29, 2017

It seems to us that auth is fundamentally broken right now when used in conjunction with Kubernetes.

When you want application-driven compaction, you are effectively assuming that the application owns etcd. That is true for Kubernetes. We suggest that people view k8s as the owner of the etcd cluster.

@xiang90
Contributor

xiang90 commented Aug 29, 2017

/cc @jpbetz

@jpbetz
Contributor

jpbetz commented Aug 29, 2017

I believe @xiang90 is correct. For additional details on the kubernetes side, please direct questions to sig-auth: https://github.com/kubernetes/community/tree/master/sig-auth

@mitake
Contributor

mitake commented Aug 30, 2017

@heyitsanthony @xiang90 @jpbetz thanks for your comments. It seems that more consideration is required before working on it.

@struz could you share more detail about your use case? Do you have to store information in your k8s' etcd? Is having multiple etcd clusters not acceptable?

@struz
Author

struz commented Aug 31, 2017

@jpbetz I'm not sure why directing questions to sig-auth would be relevant here. This is an inability of the app to do something: it either can or can't be done with the auth configuration inside etcd. Kubernetes itself has nothing to do with it, and was merely affected by it.


@mitake sure thing.
Our k8s cluster has to run Calico with the etcd v2 backend to get the level of network access control we need for our environments. Some of our clusters run untrusted code in Pods, so we need the ability to limit what those Pods can talk to at the network level.

We could run a separate etcd cluster for the Calico data, but the overhead of monitoring and maintaining a separate etcd cluster is not something we want as a default option.

Also, because we run some untrusted code, we would prefer that our k8s nodes not hold a certificate with root access to etcd, to mitigate the risk in the rare case of a container breakout.

To play devil's advocate to myself: no pods should ever be running on the nodes that hold this certificate anyway (the k8s apiserver nodes), but not being able to secure an etcd cluster like this seems to undermine the overall point of auth. Given that we can lock the apiserver down to writing only '/registry/*' keys, the fact that doing so also breaks the cluster by breaking auto-compaction is not ideal.


Overall, it feels wrong to me to require a client of a database to run the compaction itself. This is like asking a Java app to run its own garbage collection, or a postgres user to vacuum its own database. These might happen in specific circumstances to cope with specific load patterns, but they should not be the norm, and in both cases there are ways to use automatic cleanup. The automatic compaction built into etcd is insufficient because (a) it can only run hourly, and (b) compaction requests from a non-root source are not possible under auth.
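
For reference, the built-in mechanism here is the server's retention flag, whose finest granularity in 3.2 is one hour (a minimal sketch):

$ etcd --auto-compaction-retention=1   # retain one hour of history, compacting hourly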

To draw another parallel with postgres: this would be like needing to give all users of your database administrator access so that they can save you, the administrator, space and prevent the cluster from eventually falling over. Of course, I am not an expert on the etcd architecture and these comparisons may be wrong. If so, I am keen to hear why things were done this way for etcd.

At the very least, the recommendation to give the root certificate to an owning application, e.g. Kubernetes, should be clearly documented somewhere. This could be in the etcd docs, the k8s docs, or both. I never encountered anything that mentioned this, and we only discovered that auto-compaction was even a thing in k8s after the cluster broke.

To fix this problem, I think that either auto-compaction needs to become much more granular and configurable, or auth to the cluster needs to be more granular, or both.

@mitake
Contributor

mitake commented Sep 1, 2017

@struz thanks for sharing the details. I understand the motivation. How about disabling compaction requests from the apiserver and issuing them from external management tooling (e.g. a cron job running etcdctl with root privileges)? I created an experimental change to k8s for disabling the compaction here: kubernetes/kubernetes#51765
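
For illustration, such an external job could look like the following sketch (the endpoint, root password handling, and use of jq are assumptions, not a tested recipe):

#!/bin/sh
# Compact etcd up to the current revision using root privileges.
export ETCDCTL_API=3
EP=https://etcd-0.<domain>:2379
AUTH="--user root:<password>"
# Read the current revision from the endpoint status JSON output.
REV=$(etcdctl --endpoints=$EP $AUTH endpoint status -w json | jq '.[0].Status.header.revision')
etcdctl --endpoints=$EP $AUTH compaction "$REV"

Running this from cron every five minutes would mirror what the apiserver does today.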

@struz
Author

struz commented Sep 4, 2017

@mitake yes, we ended up making a cron job (although making sure the job only runs once in an HA system is a bit harder than we would like), and it works just fine. Thanks for raising that PR.

@mitake
Contributor

mitake commented Sep 15, 2017

@struz I added a new option, --etcd-compaction-interval, on the kube-apiserver side: kubernetes/kubernetes#51765. If you pass 0 to the option, the apiserver won't issue compaction requests. The PR is already approved, so it will be merged in the near future. Could you try it?
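
For example (an invocation sketch; all other flags omitted):

$ kube-apiserver --etcd-servers=https://etcd-0.<domain>:2379 --etcd-compaction-interval=0   # 0 disables apiserver-driven compaction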

@youngnick

youngnick commented Sep 17, 2017

I think there's some confusion here about what this ticket is for. (For context, I work in the same team as @struz.) We understand that the Kubernetes apiserver manages the compaction; that is not the problem. The workaround we're using lets us perform the same compaction from the etcd node itself, so that we don't have to grant root access to our single persistent datastore across the network.

I'd like to hear from @xiang90 and @jpbetz: is the decision to require root access, and root access alone, for common database maintenance operations an architectural decision or a resourcing one? If it's architectural, I disagree, but it's your call to make. If it's a resourcing decision, that's fine, but right now, reading this ticket, the only answer we seem to have is "Kubernetes should manage the compaction", which doesn't answer the question we're actually asking.

To be clear - here is what I would like to know:

  • Is the idea of creating an assignable role that can trigger a compaction (and only a compaction) a non-starter? As in, something that the project is architecturally opposed to? If so, why?

I understand that the consumer of the database (in this case Kubernetes) is the one that should manage the compaction. But I'd like the tools to be able to do it securely.

Regardless, I think updates to the authentication and maintenance docs would be helpful to highlight that compaction is currently only possible with the root user/role. Having the process be "I'm going to enable authentication for etcd. Hmm, why is my Kubernetes cluster now dying?" is not good for anyone; we need to make this information more discoverable. I'm looking through the documentation now and will prepare a PR.

@heyitsanthony
Contributor

@youngnick authorizing RPCs like compaction with non-root roles would be fine to have, but it's not urgent.

@xiang90
Contributor

xiang90 commented Sep 18, 2017

@youngnick

What I said is that Kubernetes, as it is today, assumes it owns etcd. And I wanted to understand why you want to enable auth when using etcd with Kubernetes. If there is a valid reason, then we can prioritize this issue. Neither @struz nor you provided the answer. If you want a feature, it is better to tell us about the general use case and why it is important instead of telling us x is fundamentally broken for you or your company.

@youngnick

Thanks @xiang90 - that wasn't clear from your answer before.

The proximal reason that we enabled auth right now is that we wanted to have only one etcd cluster to store both the main Kubernetes bits and the Calico Network Policy bits (which need to be accessible from all Nodes, not just the ones running the apiserver and controller-manager). I understand this is possibly inadvisable, but we were willing to try it out. (Calico uses the v2 backend, Kubernetes the v3, so it kind of works.)

However, the secondary and more long-term reason for enabling auth is that I believe auth should be enabled everywhere. I think that granting a specific set of privileges rather than root is better security practice and leads to fewer reliability problems in the long term.

I agree that Kubernetes does assume that it runs with root on etcd right now. As I just said, though, I don't think that's desirable in the longer term. I had hoped that, by starting the process of teasing out the bits that require root right now, we could begin making Kubernetes not require root, and document the permissions it does require.

I have no objection whatsoever to you saying "this is a problem, but we cannot prioritise it right now". That's completely acceptable; I acknowledge that we are running an unusual configuration. However, getting this onto your backlog now will mean that it can be done sooner rather than later, when you have the resources available. I'm not in any way trying to throw stones at you about it. I just want some acknowledgement that this is an issue that will need to be looked at at some point.

Alternatively, if you believe that remote consumers of the database will always be the owner and require root access for maintenance tasks, that's your call. I just need to know if that is the case so I can prioritise downstream resources for us.

@xiang90
Contributor

xiang90 commented Sep 18, 2017

@youngnick

Thanks for the explanation. I agree with what you said above.

From my understanding this is not super urgent for you either. @mitake and @heyitsanthony already have this on their list. We will prioritize it accordingly.

@youngnick

Yes, this is not urgent for us right now; we have a workaround.

I'll put this on our 'tickets to watch' list and we can all move on until the work can be prioritised. Thanks!

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Oct 3, 2017
Automatic merge from submit-queue (batch tested with PRs 51765, 53053, 52771, 52860, 53284). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add an option for turning on/off compaction from apiserver in etcd3 mode


**What this PR does / why we need it**:

This commit adds an option for controlling compaction requests to
etcd3 from the apiserver. There are situations where the apiserver cannot
fully own its etcd cluster (e.g. when sharing it with Canal). In such a
case, the apiserver should have limited access in terms of etcd's auth
functionality, so it doesn't have the privilege to issue compaction
requests. That means the compaction requests should be issued by another
component, and the apiserver's own compaction requests are needless.

For such use cases, this commit adds a new flag,
storagebackend.Config.DoCompaction. If the flag is true (the default),
the apiserver issues compaction requests as it does today. If it is
false, the apiserver doesn't issue the requests.

**Related issue (etcd)**
etcd-io/etcd#8458
/cc @xiang90 @struz

**Release note:**
```release-note
Add --etcd-compaction-interval to apiserver for controlling request of compaction to etcd3 from apiserver.
```
sttts pushed a commit to sttts/apiserver that referenced this issue Oct 4, 2017
sttts pushed a commit to sttts/apiserver that referenced this issue Oct 13, 2017
sttts pushed a commit to sttts/apiserver that referenced this issue Oct 14, 2017
sttts pushed a commit to sttts/apiserver that referenced this issue Oct 16, 2017
@youngnick

Hey @gyuho or @xiang90, it looks like the original cause of this ticket is even less urgent now: with the addition of revision-based auto-compaction in 3.3 (from #8098, it seems), once Kubernetes supports 3.3 we'll just need the Kubernetes docs updated to say: if you turn on etcd auth, turn on these etcd options.

That is, our original request to have a role available that can trigger compaction across the network is no longer required, as we (or anyone else) can now have etcd itself trigger compactions on a desired interval.
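
For example, something like this on the etcd side (values are illustrative):

$ etcd --auto-compaction-mode=revision --auto-compaction-retention=1000   # keep only the latest 1000 revisions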

I'll leave it to you to determine whether you want to do anything further here; more finely-grained control of the permissions is probably still good to have, but it is no longer required for our use case.

@Quentin-M
Contributor

We just ran into this issue.

invidian added a commit to invidian/libflexkube that referenced this issue Apr 18, 2020
This commit adds new etcd client certificates to be generated in testing
environments, as with etcd RBAC enabled, each certificate's CN will
represent a separate user. The intention is to have the following users:
- root - fully-privileged user for administrative actions
- kube-apiserver - user dedicated for kube-apiserver, also fully
  privileged for the time being because of
  etcd-io/etcd#8458.
- prometheus - user for Prometheus to scrape etcd metrics

For local testing, we also add rendering of a few scripts, which can be
used for testing with etcdctl and for manually enabling RBAC on the etcd
cluster.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
surajssd added a commit to kinvolk/lokomotive that referenced this issue May 29, 2020
Change the metrics port of etcd from `https` to `http` because:

- When you keep the metrics port on https you need a certificate to scrape
that endpoint. You can't simply skip the TLS check and expect to get the
data; a client cert is needed.

- Providing the apiserver client cert to the Prometheus operator is
counterproductive for security, because this cert has root permissions on
the etcd cluster, so it is not a very viable option.

- We could create another user that has permission to scrape only the
metrics endpoint, but that is not trivial. See the upstream issue, which
explains how cert-based access to etcd is all or nothing.
Issue: etcd-io/etcd#8458.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
@stale

stale bot commented Sep 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 21, 2022
@stale stale bot closed this as completed Oct 16, 2022