KEP-2568: Run control-plane as non-root in kubeadm. #2569
Welcome @vinayakankugoyal!
Hi @vinayakankugoyal. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
the KEP should be under sig-cluster-lifecycle/kubeadm
@vinayakankugoyal force-pushed from 22b931b to 24d44b1.
/cc @BenTheElder
/cc
@neolit123 and @BenTheElder - I would love to hear your thoughts on the draft.
@vinayakankugoyal force-pushed from 26bffbb to 2cca427.
**Stuff that we care about for this KEP:**

| file/directory | Components | File permission |
| --- | --- | --- |
Is there a reason to limit the file permissions to owner read/write and group read-only, i.e. 640? Would it break anything if we further limit it to 600?
I don't think this will break anything. You are right, this should be 600. We should go with the strictest permissions.
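For reference, tightening a file from 640 to 600 would be a one-line change wherever kubeadm writes it; a minimal sketch in Go, using the etcd key path from the table above:

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Restrict the etcd server key to owner read/write only (0600),
	// per the discussion above; the path comes from the KEP's table.
	if err := os.Chmod("/etc/kubernetes/pki/etcd/server.key", 0o600); err != nil {
		log.Fatal(err)
	}
}
```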
| file/directory | Components | File permission |
| --- | --- | --- |
| /etc/kubernetes/pki/etcd/server.crt | etcd | 640 etcd etcd |
| /etc/kubernetes/pki/etcd/server.key | etcd | 640 etcd etcd |
If I understand it correctly, this table implies that all the components, i.e. etcd, kube-apiserver, kube-controller-manager, etc., will each have their own unique UID and GID, named after the component itself?
e.g. component: etcd, user = etcd, group = etcd
If this is correct, it might be worth highlighting this important caveat: the UIDs and GIDs for all components will be different, and each shares the name of its component.
maybe we should prefix the names with kubeadm- too.
Yup, your understanding is correct. I'll call it out here to make it clearer. I do call it out in the 1st point of the Proposal section.
In `kubeadm` the control-plane components are run as [static-pods](https://kubernetes.io/docs/concepts/workloads/pods/#static-pods), i.e. pods directly managed by the kubelet. We can use the `runAsUser` and `runAsGroup` fields in [SecurityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#securitycontext-v1-core) to run the containers in these Pods with a unique `UID` and `GID`, and the `capabilities` field to drop all capabilities other than the ones required.
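For illustration, a minimal sketch in Go against the Kubernetes API types of setting those fields on a container; the UID/GID values are hypothetical placeholders, not values chosen by this KEP:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// nonRootSecurityContext builds a container securityContext that runs
// the component under the given UID/GID and refuses to start as root.
func nonRootSecurityContext(uid, gid int64) *corev1.SecurityContext {
	runAsNonRoot := true
	return &corev1.SecurityContext{
		RunAsUser:    &uid,
		RunAsGroup:   &gid,
		RunAsNonRoot: &runAsNonRoot,
	}
}

func main() {
	// Hypothetical UID/GID pair for, say, kube-scheduler.
	fmt.Printf("%+v\n", nonRootSecurityContext(2002, 2002))
}
```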
### Assigning UID and GID

There are 3 options for setting the `UID`/`GID` of the control-plane components:
Another option here, assuming we name the UID and GID after the component that owns them, is to generate them such that UID == GID == process ID when these components first run as non-root. This is not a perfect solution and has the following pros and cons:
Pros:
- Allows mapping of UID and GID to components via process ID, which might be helpful and intuitive during production troubleshooting.
- Ensures uniqueness at a point in time, since no two processes at any given moment have the same process ID.
- Will most likely never produce UID = 0, since a component having process ID 0 is highly unlikely (something like the swapper or scheduler already holds process ID 0).
Cons:
- Might need some statefulness to pick up the same UID and GID the next time any of these components is started or restarted, since the process ID could differ.
- Might overlap with an existing UID or username, e.g. the process ID of the api-server is 1001, and a user with UID 1001 exists and runs a monitoring service on the same node as the api-server. This could lead to some interesting privilege escalation attacks.
We could also ensure uniqueness if we used adduser and groupadd.
This would add complications on restarts: restarts, which are completely managed by the kubelet today, would now need to be managed by kubeadm somehow, updating not only the manifest with the new UID but also all file permissions. This would be fairly complicated.
That makes sense. If it is not too much trouble, could you add this as a 4th option in the KEP markdown and explain why it was not chosen? I worry that these comments will get lost after the KEP is merged and we may end up revisiting this in the future. Putting this in the KEP will save some future time for all of us :) Happy to make edits to the KEP myself but I am afraid I do not have permissions to do so.
Happy to add it as an option. Can you suggest it as an edit? That way it's worded the way you would want it.
We should say the reasoning for following PAM SYS_UID_MIN & SYS_UID_MAX is to fall in line with distro standards, which will definitely avoid conflicts with actual user accounts. I linked the relevant Debian and Fedora packaging docs in a previous comment.
Good call, done!
2. **Use constant values for uid/gid:** The `UID` and `GID` for each of the control-plane components would be set to some predetermined value in https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/constants/constants.go.
3. **Create system users and let users override the defaults:** Use `adduser --system` or equivalent to create `UID`s in the SYS_UID_MIN - SYS_UID_MAX range and `groupadd --system` or equivalent to create `GID`s in the SYS_GID_MIN - SYS_GID_MAX range. Additionally, if users want to specify their own `UID`s or `GID`s, we will support that through the `kubeadm` v1beta3 API.
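For concreteness, minimal sketches of options 2 and 3 in Go; the constant names, values, the kubeadm-etcd account name, and the helper are all hypothetical, and the adduser invocation is Debian-flavored (other distros use e.g. useradd -r):

```go
package main

import (
	"fmt"
	"os/exec"
	"os/user"
)

// Option 2: hypothetical predetermined values, as they might appear in
// cmd/kubeadm/app/constants/constants.go.
const (
	EtcdUserID          int64 = 2000
	KubeAPIServerUserID int64 = 2001
)

// Option 3: ensureSystemUser creates a system account whose UID the
// distro allocates from the SYS_UID_MIN - SYS_UID_MAX range (see
// /etc/login.defs), reusing the account if it already exists.
func ensureSystemUser(name string) (string, error) {
	if u, err := user.Lookup(name); err == nil {
		return u.Uid, nil // already present; keep the same UID across restarts
	}
	// --system picks the next free UID in the system range and
	// --no-create-home skips the home directory.
	out, err := exec.Command("adduser", "--system", "--no-create-home", name).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("adduser %s failed: %v: %s", name, err, out)
	}
	u, err := user.Lookup(name)
	if err != nil {
		return "", err
	}
	return u.Uid, nil
}

func main() {
	// Account name follows the kubeadm- prefix idea from the thread above.
	uid, err := ensureSystemUser("kubeadm-etcd")
	fmt.Println(uid, err)
}
```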
We could offset the UID and GID used here with some really high value as SYS_UID_MIN to prevent chances of overlap. However, for some flavors of Kubernetes this may become a problem, as they have their own way of assigning unique user IDs for running pods, e.g. OpenShift: https://www.openshift.com/blog/a-guide-to-openshift-and-uids
My 2 cents would be to keep this logic simple, as fitting everyone's requirements by default would be extremely challenging. Giving users the ability to pick their own IDs through the API would solve the problem for these flavors.
You are right. I am definitely in favor of simple and secure.
2. Drop all capabilities from the `kube-controller-manager`, `kube-scheduler`, and `etcd` Pods. Drop all but cap_net_bind_service from the `kube-apiserver` Pod; this will allow it to run as non-root while still being able to bind to ports < 1024. This can be achieved by using the `capabilities` field in the [securityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#securitycontext-v1-core) of the containers in the Pods.
3. Set the seccomp profile to `runtime/default`; this can be done by setting the `seccompProfile` field in the `securityContext`.
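A sketch combining items 2 and 3 above, again in Go against the Kubernetes API types (an illustration, not the KEP's literal manifest):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	sc := &corev1.SecurityContext{
		Capabilities: &corev1.Capabilities{
			// kube-controller-manager, kube-scheduler and etcd would
			// drop everything and add nothing back; only kube-apiserver
			// would add NET_BIND_SERVICE so it can bind to ports < 1024.
			Drop: []corev1.Capability{"ALL"},
			Add:  []corev1.Capability{"NET_BIND_SERVICE"},
		},
		// Corresponds to the runtime/default seccomp profile from item 3.
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault,
		},
	}
	fmt.Printf("%+v\n", sc)
}
```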
It would be extremely valuable to set a default SELinux or AppArmor profile here as well. Some conformance tests or disclaimers on whether this works in SELinux- and AppArmor-enforced environments would be helpful before admins enable this feature.
we found handling SELinux / AppArmor defaults from the side of kubeadm difficult and distro-footprint specific in prior discussions with users that are somewhat experienced with these security solutions.
for some users something worked, while something didn't, thus at some point we decided that kubeadm should not try to manage these.
unless there are sane things that we can set up that work for everyone in a distro-agnostic way (kubeadm should be distro agnostic), maybe we shouldn't.
in today's SIG Auth breakout meeting about the PSP replacement, the same problem was highlighted.
Having worked extensively on SELinux and containers, happy to connect offline on this @neolit123 to discuss more.
Regardless, would it make sense to either add a non-goal, or clarify under testing, that this feature works or does not work with SELinux or AppArmor, or that it has not been tested in SELinux or AppArmor environments?
maybe we can enable something by default for the beta.
my vote would be to add it as a non-goal for the time being.
also, advanced users can always customize the default manifest files. kubeadm has a patches mechanism for that.
I'll add this as a non-goal for now, but let's reconsider this for beta.
This might also relate to allowing declarative patching of the pod spec @neolit123, which might be a way to enable this, but in a way that's out of scope of this proposal.
we spoke about enabling inline patches as part of v1beta3 but @fabriziopandini had some reservations.
overall i'm +1 to allowing patches to be either inline in the config or read from files.
As long as there is a way for customers to configure it to meet their needs, I am +1; it doesn't necessarily have to be through the API.
Ok, I have updated option 3 to remove the mention of the API in favor of the ability of users to patch these in.
- Run `etcd` as non-root in `kubeadm`.
### Non-Goals
Perhaps compatibility with user-namespace-enabled environments is a non-goal here? If so, it would be good to call this out.
Agreed, I'll call that out.
| /etc/kubernetes/pki/apiserver-kubelet-client.key | kas | 640 kas kas |
| /etc/kubernetes/pki/front-proxy-client.crt | kas | 640 kas kas |
| /etc/kubernetes/pki/front-proxy-client.key | kas | 640 kas kas |
| /etc/kubernetes/pki/front-proxy-ca.crt | kas, kcm | 640 root front_proxy_ca_crt_readers |
Pardon me if I missed this, but could you clarify what the front_proxy_ca_crt_readers, ca_crt_readers, and sa_key_readers groups are? Are these groups shared between components?
These are groups that the components will specify as supplemental groups so that they can read the file/cert/credential.
For example: /etc/kubernetes/pki/front-proxy-ca.crt is read by both kube-apiserver and kube-controller-manager, so we will create a group called front_proxy_ca_crt_readers, and both kube-apiserver and kube-controller-manager will specify this group's ID as a supplemental group. This way they can both read the file.
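A short sketch of that wiring in Go; the GID value for front_proxy_ca_crt_readers is a hypothetical placeholder:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Hypothetical GID for the front_proxy_ca_crt_readers group. Both
	// the kube-apiserver and kube-controller-manager pods would list it
	// in supplementalGroups, so either component can read the
	// 640 root:front_proxy_ca_crt_readers file.
	const frontProxyCAReadersGID int64 = 2100
	podSC := &corev1.PodSecurityContext{
		SupplementalGroups: []int64{frontProxyCAReadersGID},
	}
	fmt.Printf("%+v\n", podSC)
}
```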
- Run `kube-controller-manager` as non-root in `kubeadm`.
- Run `kube-scheduler` as non-root in `kubeadm`.
- Run `etcd` as non-root in `kubeadm`.
Is kubelet out of scope for this KEP?
for Linux (with the exception of OpenRC on Alpine) kubeadm manages the kubelet as a systemd service that requires root to stop/start.
all files written under /var/lib/kubelet are owned by root too.
i don't think there is much we can do for the kubelet case unless we stop using systemd.
Agreed, I'll explicitly call that out.
Makes sense to me. Thanks for the input @neolit123
Are there log files that also need their permissions updated?
no, i don't think kubeadm writes log files. i just remembered that we do write a temporary static-pod manifest / etcd snapshot backup under /etc/kubernetes in case a control-plane node upgrade fails.
@vinayakankugoyal force-pushed from c085a2a to 5940618.
readOnly: true
... # omitted to save space.
```
<br/>
nit: i noticed these BR tags in a few places. can we omit them?
Done.
@neolit123 - I was thinking that we merge this PR as provisional and then iterate on it? That way others will also get an opportunity to contribute to it. It is also the recommended way of operating, as highlighted in the comments.
- /usr/share/ca-certificates
- /etc/ssl/certs
- /etc/ca-certificates
### Multi-OS support
A Windows control plane is out of scope for this proposal for the time being.
OS-specific implementations for Linux would be carefully abstracted behind helper
utilities in kubeadm so as not to block support for a Windows control plane in the future.
^ we should add this.
we had a discussion at SIG Windows about the future of a Windows control-plane.
a couple of companies contributing to k8s have customers that want this today, but the demand has been relatively low.
i can see kubeadm supporting a Windows control plane in the future, therefore, i would like us to include a note in the KEP about OS differentiation.
one example that i mentioned on the POC PR is that Chown from go's standard library just errors out on Windows.
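For reference, a sketch of the kind of per-OS helper being suggested; the function is hypothetical, not kubeadm's actual code:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// chownIfSupported guards os.Chown behind an OS check, since os.Chown
// always returns an error on Windows.
func chownIfSupported(path string, uid, gid int) error {
	if runtime.GOOS == "windows" {
		// A Windows control plane would need its own ownership handling.
		return nil
	}
	return os.Chown(path, uid, gid)
}

func main() {
	fmt.Println(chownIfSupported("/etc/kubernetes/pki/etcd/server.key", 2000, 2000))
}
```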
Good call! Done.
SGTM, please add the above "windows" change before that. we should note that we can start with PRs and their reviews only after the KEP is marked as implementable, though.
Yeah, I understand that. I just want to merge what we have and then iterate on the KEP some more, focusing on the crucial aspects.
@neolit123 - If the changes look good to you, please approve so that I can merge this PR with the KEP in provisional state. Thanks!
/lgtm
/approve
thanks @vinayakankugoyal
merging this as provisional.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: neolit123, vinayakankugoyal. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
we also have to assign someone to do the PRR (production readiness review) at some point.
I know someone who recently went through the PRR process for another KEP. I'll talk to them and get more details on the process.
This KEP proposes that the control plane in kubeadm be run as non-root. If containers are running as root, an escape from a container may result in escalation to root on the host. CVE-2019-5736 is an example of a container-escape vulnerability that can be mitigated by running containers/pods as non-root.
xref #2568