
KEP-2568: Run control-plane as non-root in kubeadm. #2569

Merged — 1 commit, Apr 12, 2021

Conversation

@vinayakankugoyal (Contributor) commented Mar 13, 2021

This KEP proposes that the control-plane in kubeadm be run as non-root. If containers run as root, an escape from a container may result in escalation to root on the host. CVE-2019-5736 is an example of a container-escape vulnerability that can be mitigated by running containers/pods as non-root.

xref #2568

@k8s-ci-robot (Contributor)

Welcome @vinayakankugoyal!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 13, 2021
@k8s-ci-robot (Contributor)

Hi @vinayakankugoyal. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/security Categorizes an issue or PR as relevant to SIG Security. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 13, 2021
@neolit123 (Member)

the KEP should be under sig-cluster-lifecycle/kubeadm

@vinayakankugoyal vinayakankugoyal force-pushed the kep branch 2 times, most recently from 22b931b to 24d44b1 Compare March 14, 2021 01:12
@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Mar 14, 2021
@vinayakankugoyal (Contributor, Author)

/cc @BenTheElder

@vinayakankugoyal vinayakankugoyal marked this pull request as draft March 17, 2021 18:59
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2021
@pacoxu (Member) commented Mar 18, 2021

/cc

@k8s-ci-robot k8s-ci-robot requested a review from pacoxu March 18, 2021 02:38
@vinayakankugoyal (Contributor, Author) commented Mar 19, 2021

@neolit123 and @BenTheElder - I would love to hear your thoughts on the draft.

@vinayakankugoyal vinayakankugoyal force-pushed the kep branch 2 times, most recently from 26bffbb to 2cca427 Compare March 22, 2021 04:47
@vinayakankugoyal vinayakankugoyal marked this pull request as ready for review March 22, 2021 04:47
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2021

**Stuff that we care about for this KEP:**
| file/directory | Components | File permission |
| -------------------------------------------------|------------|-----------------|
Member:

Is there a reason to limit the file permissions to owner read/write and group read-only, i.e. 640? Would it break anything if we further limited them to 600?

Contributor (Author):

I don't think this will break anything. You are right, this should be 600. We should go with the strictest permissions.

| file/directory | Components | File permission |
| -------------------------------------------------|------------|-----------------|
| /etc/kubernetes/pki/etcd/server.crt | etcd | 640 etcd etcd |
| /etc/kubernetes/pki/etcd/server.key | etcd | 640 etcd etcd |
Member:

If I understand correctly, this table implies that all the components, i.e. etcd, kube-apiserver, kube-controller-manager, etc., will have their own unique UIDs and GIDs, named after the component itself?

e.g. component: etcd, user = etcd, group=etcd

If this is correct, it might be worth highlighting this important caveat, that all UIDs and GIDs for all components will be different and that they share the name with their component.

@neolit123 (Member) — Apr 7, 2021:

maybe we should prefix the names with kubeadm- too.

Contributor (Author):

Yup, your understanding is correct. I'll call it out here to make it clearer. I do call it out in the first point of the Proposal section.

In `kubeadm` the control-plane components are run as [static-pods](https://kubernetes.io/docs/concepts/workloads/pods/#static-pods), i.e. pods directly managed by kubelet. We can use the `runAsUser` and `runAsGroup` fields in [SecurityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#securitycontext-v1-core) to run the containers in these Pods with a unique `UID` and `GID` and the `capabilities` field to drop all capabilities other than the ones required.
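As a minimal sketch of how those fields fit into a static-pod manifest (the UID/GID values and image tag below are placeholders for illustration, not values chosen by this KEP):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - name: kube-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.21.0   # placeholder tag
    command:
    - kube-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    securityContext:
      runAsUser: 2001    # placeholder UID; how real values are assigned is discussed below
      runAsGroup: 2001   # placeholder GID
      capabilities:
        drop: ["ALL"]    # kube-scheduler needs no extra capabilities
```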

### Assigning UID and GID
There are 3 options for setting the `UID`/`GID` of the control-plane components:-
Member:

Another option, assuming we name the UID and GID after the component they own, is to generate them such that UID == GID == process ID when these components are first run as non-root. This is not a perfect solution and has the following pros and cons:

Pros:

  1. Allows mapping of UID and GID to components via process ID, which might be helpful and intuitive during production troubleshooting.
  2. Ensures uniqueness at a point in time, as no two processes at any given moment will have the same process ID.
  3. Will most likely never yield UID = 0, since a component getting process ID 0 is highly unlikely — something (e.g. the swapper or scheduler) is typically already running with process ID 0.

Cons:

  1. Might need some statefulness to pick up the same UID and GID the next time any of these components is started or restarted, as the process ID could differ.
  2. Might overlap with existing UIDs or usernames, e.g. the process ID of the api-server is 1001, and a user with UID 1001 exists and is running a monitoring service on the same node as the api-server. This could lead to some interesting privilege-escalation attacks.

Contributor (Author):

We could also ensure uniqueness if we used adduser and groupadd.
This would add complications on restarts: restarts, which are completely managed by the kubelet today, would now need to be managed by kubeadm somehow to update not only the manifest with the new UID but also all the file permissions; this would be fairly complicated.

Member:

That makes sense. If it is not too much trouble, could you add this as a 4th option in the KEP markdown and share why it was not chosen? I worry that these comments will get lost after the KEP is merged and we may end up revisiting this in the future. Putting this in the KEP will save some future time for all of us :) Happy to make the edits to the KEP myself, but I am afraid I do not have the permissions to do so.

@vinayakankugoyal (Contributor, Author) — Apr 9, 2021:

Happy to add it as an option. Can you suggest it as an edit? That way it's worded the way you want.

@randomvariable (Member) — Apr 9, 2021:

We should say that the reasoning for following PAM SYS_UID_MIN and SYS_UID_MAX is to fall in line with distro standards, which will definitely avoid conflicts with actual user accounts. I linked the relevant Debian and Fedora packaging docs in a previous comment.

Contributor (Author):

Good call, done!


2. **Use constant values for uid/gid:** The `UID` and `GID` for each of the control-plane components would be set to some predetermined value in https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/constants/constants.go.

3. **Create system users and let users override the defaults:** Use `adduser --system` or equivalent to create `UID`s in the SYS_UID_MIN - SYS_UID_MAX range and `groupadd --system` or equivalent to create `GID`s in the SYS_GID_MIN - SYS_GID_MAX range. Additionally, if users want to specify their own `UID`s or `GID`s, we will support that through the `kubeadm` v1beta3 API.
Member:

We could offset the UID and GID used here by using some really high value as SYS_UID_MIN, to prevent chances of overlap. However, for some flavors of Kubernetes this may become a problem, as they have their own ways of assigning unique user IDs for running pods, e.g. OpenShift: https://www.openshift.com/blog/a-guide-to-openshift-and-uids

Contributor (Author):

My 2 cents would be to keep this logic simple, as fitting everyone's requirements by default would be extremely challenging. Giving users the ability to pick their own IDs through the API would solve the problem for these flavors.

Member:

You are right. I am definitely in favor of simple and secure.

2. Drop all capabilities from the `kube-controller-manager`, `kube-scheduler`, and `etcd` Pods. Drop all but `CAP_NET_BIND_SERVICE` from the `kube-apiserver` Pod; this allows us to run as non-root while still being able to bind to ports below 1024. This can be achieved by using the `capabilities` field in the [securityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#securitycontext-v1-core) of the containers in the Pods.

3. Set the seccomp profile to `runtime/default`; this can be done by setting the `seccompProfile` field in the `securityContext`.
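Points 2 and 3 combined can be sketched as the following container `securityContext` for `kube-apiserver` (the UID/GID here are placeholders, not values from the KEP):

```yaml
securityContext:
  runAsUser: 2000    # placeholder UID
  runAsGroup: 2000   # placeholder GID
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]   # only kube-apiserver keeps this, to bind ports < 1024
  seccompProfile:
    type: RuntimeDefault        # the "runtime/default" profile
```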

Member:

It would be extremely valuable to set a default SELinux or AppArmor profile here as well. Some conformance tests, or disclaimers on whether this works in SELinux- and AppArmor-enforced environments, would be helpful before admins enable this feature.

@neolit123 (Member) — Apr 7, 2021:

we found handling SELinux / AppArmor defaults from the side of kubeadm difficult and distro-specific in prior discussions with users that are somewhat experienced with these security solutions.
for some users something worked, while for others it didn't, thus at some point we decided that kubeadm should not try to manage these.

unless there are sane things that we can setup that work for everyone in a distro agnostic way (kubeadm should be distro agnostic), maybe we shouldn't.

in today's SIG Auth breakout meeting about the PSP replacement, the same problem was highlighted.

Member:

Having worked extensively on SELinux and containers, happy to connect offline on this @neolit123 to know more.

Regardless, would it make sense to either add a non-goal or, under testing, clarify that this feature works (or does not work) with SELinux or AppArmor, or that it has not been tested in SELinux or AppArmor environments?

Member:

maybe we can enable something by default for the beta.

my vote would be to add it as a non-goal for the time being.
also, advanced users can always customize the default manifest files; kubeadm has a patches mechanism for that.
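For illustration, that patches mechanism works by dropping files into a directory that kubeadm reads (via its patches flag/config); a hypothetical strategic-merge patch overriding the scheduler's securityContext might look like the sketch below (file name follows the `target+patchtype` convention; the contents are an assumed example, not part of this KEP):

```yaml
# kube-scheduler+strategic.yaml — hypothetical patch file in kubeadm's patches directory
spec:
  containers:
  - name: kube-scheduler
    securityContext:
      seccompProfile:
        type: RuntimeDefault
```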

Contributor (Author):

I'll add this as a non-goal for now, but let's reconsider this for beta.

Member:

This might also relate to allowing declarative patching to the pod spec @neolit123 , which might be a way to enable this but in a way that's out of scope of this proposal.

Member:

we spoke about enabling inline patches as part of v1beta3 but @fabriziopandini had some reservations.
overall i'm +1 to allow patches to be both inline in the config or read from files.

Contributor (Author):

As long as there is a way for customers to configure it to meet their needs I am +1, it doesn't necessarily have to be through the API.

Contributor (Author):

Ok, I have updated option 3 to remove the mention of the API in favor of letting users patch these in.

- Run `etcd` as non-root in `kubeadm`.

### Non-Goals

Member:

Perhaps compatibility with user namespace enabled environments is a non-goal here? If yes, would be good to call this out.

Contributor (Author):

Agreed, I'll call that out.

| /etc/kubernetes/pki/apiserver-kubelet-client.key | kas | 640 kas kas |
| /etc/kubernetes/pki/front-proxy-client.crt | kas | 640 kas kas |
| /etc/kubernetes/pki/front-proxy-client.key | kas | 640 kas kas |
| /etc/kubernetes/pki/front-proxy-ca.crt | kas, kcm | 640 root front_proxy_ca_crt_readers |
Member:

Pardon me if I missed this, but could you clarify what the front_proxy_ca_crt_readers, ca_crt_readers, and sa_key_readers groups are? Are these groups shared between components?

Contributor (Author):

These are groups that the components will specify as supplemental groups so that they can read the file/cert/credential.
For example, /etc/kubernetes/pki/front-proxy-ca.crt is read by both kube-apiserver and kube-controller-manager, so we will create a group called front_proxy_ca_crt_readers, and both kube-apiserver and kube-controller-manager will specify this group's ID as a supplemental group. That way they can both read the file.
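A sketch of how that could be wired up in the pod-level securityContext of one of the readers (all IDs here are placeholders for illustration):

```yaml
# Pod-level securityContext for kube-controller-manager (sketch; all IDs are placeholders)
securityContext:
  runAsUser: 2002        # placeholder kcm UID
  runAsGroup: 2002       # placeholder kcm GID
  supplementalGroups:
  - 2100                 # placeholder GID of the shared front_proxy_ca_crt_readers group
```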

- Run `kube-controller-manager` as non-root in `kubeadm`.
- Run `kube-scheduler` as non-root in `kubeadm`.
- Run `etcd` as non-root in `kubeadm`.

Member:

Is kubelet out of scope for this KEP?

Member:

for Linux (with the exception of OpenRC on Alpine) kubeadm manages the kubelet as a systemd service that requires root to stop/start.
all files written under /var/lib/kubelet are owned by root too.

i don't think there is much we can do for the kubelet case unless we stop using systemd.

Contributor (Author):

Agreed, I'll explicitly call that out.

Member:

Makes sense to me. Thanks for the input @neolit123

@tallclair (Member)

Are there log files that also need their permissions updated?

@neolit123 (Member) commented Apr 8, 2021

no, i don't think kubeadm writes log files.

i just remembered that we do write a temporary static-pod manifest / etcd snapshot backup under /etc/kubernetes in case a control-plane node upgrade fails (kubeadm upgrade), but these should be copies of existing files.
i think we just have to ensure that the migration from root to non-root during a mutable kubeadm upgrade runs "chown" after a CP node is considered healthy.
EDIT: NVM, the new non-root manifest files will just override the existing ones that require root.

@vinayakankugoyal vinayakankugoyal force-pushed the kep branch 2 times, most recently from c085a2a to 5940618 Compare April 8, 2021 19:36
readOnly: true
... # omitted to save space.
```
<br/>
Member:

nit: i noticed these BR tags in a few places. can we omit them?

Contributor (Author):

Done.

@vinayakankugoyal (Contributor, Author)

@neolit123 - I was thinking that we merge this PR as provisional and then iterate on it? That way others will also get an opportunity to contribute. It is also the recommended way of operating, as highlighted in the KEP template comments: "Merge early and iterate." WDYT?

- /usr/share/ca-certificates
- /etc/ssl/certs
- /etc/ca-certificates

Member:

Suggested change
### Multi-OS support
A Windows control plane is out of scope for this proposal for the time being.
OS specific implementations for Linux, would be carefully abstracted behind helper
utilities in kubeadm to not block the support for a Windows control plane in the future.

^ we should add this.

we had a discussion at SIG Windows about the future of a Windows control plane.
a couple of companies contributing to k8s have customers that want this today, but the demand has been relatively low.

i can see kubeadm supporting a Windows control plane in the future; therefore, i would like us to include a note in the KEP about OS differentiation.

one example that i mentioned on the POC PR is that `Chown` from Go's standard library just errors out on Windows.

Contributor (Author):

Good call! Done.

@neolit123 (Member) commented Apr 9, 2021

> @neolit123 - I was thinking that we merge this PR as provisional and then iterate on it? That way others will also get an opportunity to contribute to it. It is also the recommended way of operation as highlighted in the comments as Merge early and iterate. WDYT?

SGTM, please add the above "windows" change before that.

we should note that we can start with PRs and their reviews only after the KEP is marked as implementable though.

@vinayakankugoyal (Contributor, Author)

> we should note that we can start with PRs and their reviews only after the KEP is marked as implementable though.

Yeah I understand that. I just want to merge what we have and then iterate on the KEP some more, by focusing on the crucial aspects.

@vinayakankugoyal (Contributor, Author)

@neolit123 - If the changes look good to you, please approve so that I can merge this PR with the KEP in provisional state. Thanks!

@neolit123 (Member) left a comment:

/lgtm
/approve

thanks @vinayakankugoyal
merging this as provisional.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 12, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, vinayakankugoyal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 12, 2021
@neolit123 (Member)

we also have to assign someone to do the PRR (production readiness review) at some point.
the state of things around PRR is a bit fuzzy to me ATM.

@k8s-ci-robot k8s-ci-robot merged commit d4cb03d into kubernetes:master Apr 12, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 12, 2021
@vinayakankugoyal (Contributor, Author)

> we also have to assign someone to do the PRR (production readiness review) at some point.
> the state of things around PRR are a bit fuzzy to me, ATM.

I know someone who recently went through the PRR process for another KEP. I'll talk to them and get more details on the process.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/security Categorizes an issue or PR as relevant to SIG Security. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
9 participants