
kep-127: address some review comments
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe committed Feb 7, 2024
1 parent d45390b commit 2d8be39
Showing 1 changed file with 82 additions and 22 deletions: keps/sig-node/127-user-namespaces/README.md
@@ -288,6 +288,43 @@ message Mount {
}
```

The CRI runtime reports which runtime handlers support user
namespaces through the `StatusResponse` message, which gains a new
field, `runtime_handlers`:

```
message StatusResponse {
// Status of the Runtime.
RuntimeStatus status = 1;
// Info is extra information of the Runtime. The key could be arbitrary string, and
// value should be in json format. The information could include anything useful for
// debug, e.g. plugins used by the container runtime.
// It should only be returned non-empty when Verbose is true.
map<string, string> info = 2;
// Runtime handlers.
repeated RuntimeHandler runtime_handlers = 3;
}
```

Where `RuntimeHandler` is defined as follows:

```
message RuntimeHandlerFeatures {
// supports_user_namespaces is set to true if the runtime handler supports
// user namespaces.
bool supports_user_namespaces = 1;
}
message RuntimeHandler {
// Name must be unique in StatusResponse.
// An empty string denotes the default handler.
string name = 1;
// Supported features.
RuntimeHandlerFeatures features = 2;
}
```
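For illustration only, here is a minimal Go sketch of how a CRI client could consume `runtime_handlers` to decide whether a given handler supports user namespaces. The types are hand-written stand-ins for the generated CRI bindings, not actual kubelet code:

```
package main

import "fmt"

// Hand-written stand-ins for the generated CRI types shown above.
type RuntimeHandlerFeatures struct {
	SupportsUserNamespaces bool
}

type RuntimeHandler struct {
	Name     string // "" denotes the default handler
	Features *RuntimeHandlerFeatures
}

// handlerSupportsUserns reports whether the named handler advertises user
// namespace support. An old runtime returns no handlers at all, which is
// correctly treated as "not supported".
func handlerSupportsUserns(handlers []RuntimeHandler, name string) bool {
	for _, h := range handlers {
		if h.Name == name {
			return h.Features != nil && h.Features.SupportsUserNamespaces
		}
	}
	return false
}

func main() {
	handlers := []RuntimeHandler{
		{Name: "", Features: &RuntimeHandlerFeatures{SupportsUserNamespaces: true}},
		{Name: "kata", Features: &RuntimeHandlerFeatures{}},
	}
	fmt.Println(handlerSupportsUserns(handlers, ""))     // true
	fmt.Println(handlerSupportsUserns(handlers, "kata")) // false
}
```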

### Support for pods

Make pods work with user namespaces. This is activated via the
`pod.spec.hostUsers` field.
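As a sketch of what this looks like from the typed Go client side (assuming the field is surfaced as `HostUsers *bool` on `PodSpec` in `k8s.io/api`, the usual way the Go API exposes optional booleans):

```
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	hostUsers := false // false means: run the pod in a user namespace
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "userns-pod"},
		Spec: corev1.PodSpec{
			HostUsers: &hostUsers,
			Containers: []corev1.Container{
				{Name: "app", Image: "registry.k8s.io/pause:3.9"},
			},
		},
	}
	fmt.Println(*pod.Spec.HostUsers) // false
}
```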
@@ -593,6 +630,7 @@ use container runtime versions that have the needed changes.

- Gather and address feedback from the community
- Be able to configure UID/GID ranges to use for pods
- This feature is not supported on Windows.
- Get review from VM container runtimes maintainers (not blocker, as VM runtimes should just ignore
the field, but nice to have)

@@ -603,6 +641,20 @@ use container runtime versions that have the needed changes.

### Upgrade / Downgrade Strategy

Existing pods will keep working as intended, as the new field is
absent from their spec.

Upgrade will not change any current behaviors.

If the new functionality wasn't used, downgrade is not affected.

On downgrade, if the functionality was used, running pods created with
user namespaces will continue to run with user namespaces. Such pods
need to be re-created to stop using the user namespace.

Versions of Kubernetes that don't implement this feature will ignore
the new field `pod.spec.hostUsers`.

### Version Skew Strategy

<!--
@@ -635,11 +687,13 @@ doesn't create them. The runtime can detect this situation as the `user` field
in the `NamespaceOption` will be seen as nil, [thanks to
protobuf][proto3-defaults]. We already tested this with real code.
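A minimal sketch of that runtime-side fallback, again with hand-written stand-ins for the generated CRI types (the real check lives in the runtime's CRI server):

```
package main

import "fmt"

// UserNamespace is an illustrative stand-in for the generated CRI message
// carrying the UID/GID mappings.
type UserNamespace struct{}

type NamespaceOption struct {
	// User is nil when an old kubelet never set it: proto3 leaves
	// unset message fields at their default (nil) value.
	User *UserNamespace
}

// useHostUserns mirrors the fallback described above: a nil `user` field
// means the kubelet predates this feature, so the runtime keeps the pod
// in the host user namespace.
func useHostUserns(opts *NamespaceOption) bool {
	return opts == nil || opts.User == nil
}

func main() {
	fmt.Println(useHostUserns(&NamespaceOption{}))                       // true: old kubelet
	fmt.Println(useHostUserns(&NamespaceOption{User: &UserNamespace{}})) // false
}
```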

Old runtime and new kubelet: the runtime won't report that it supports
user namespaces through the `StatusResponse` message, so the kubelet
will detect it and return an error if a pod with user namespaces is
created.

We added unit tests with the feature gate disabled, and integration
tests with the feature gate both enabled and disabled.

[proto3-defaults]: https://developers.google.com/protocol-buffers/docs/proto3#default

@@ -763,15 +817,14 @@ This section must be completed when targeting beta to a release.

The rollout is just a feature flag on the kubelet and the kube-apiserver.

If one API server is upgraded while others aren't, and you are talking to a non-upgraded one, the
pod will be accepted (if that API server is >= 1.25). If it is scheduled to a node where the
kubelet has the feature flag activated and the node meets the requirements to use user namespaces,
then the pod will be created with the namespace. If it is scheduled to a node that has the feature
disabled, it will be created without the user namespace.

On a rollback, pods created while the feature was active (created with user namespaces) will have to
be re-created without user namespaces.

<!--
Try to be as paranoid as possible - e.g., what if some components will restart
@@ -788,24 +841,31 @@ will rollout across nodes.
On the Kubernetes side, the kubelet should start correctly.

On the node runtime side, a pod created with pod.spec.hostUsers=false should be in RUNNING state if
all node requirements are met. If the CRI runtime or the handler does not support the feature, the
kubelet returns an error.

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Yes, we tested it locally using `./hack/local-up-cluster.sh`.

We tested this by enabling the feature flag, creating a deployment with pod.spec.hostUsers=false,
and then disabling the feature flag and restarting the kubelet and kube-apiserver.

After that, we deleted the deployment's pods (not the deployment object) and the pods were
re-created without user namespaces just fine, with no modification needed to the deployment YAML.

We then enabled the feature flag on the kubelet and kube-apiserver, and deleted the deployment pod.
This caused the pod to be re-created, this time with user namespaces enabled again.

To validate this, exec into a container in the pod and run `cat /proc/self/uid_map`. When running
in a user namespace, the output differs from `0 0 4294967295`, which is what is shown when running
without a user namespace.
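For convenience, a small Go sketch (ours, not part of the KEP) that automates this check from inside a container:

```
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read uid_map:", err)
		os.Exit(1)
	}
	// The initial (host) user namespace shows a single identity mapping
	// covering the full 32-bit range: "0 0 4294967295".
	if strings.Join(strings.Fields(string(data)), " ") == "0 0 4294967295" {
		fmt.Println("running in the host user namespace")
	} else {
		fmt.Println("running in a user namespace")
	}
}
```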

<!--
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
@@ -839,10 +899,9 @@ logs or events for this purpose.

###### How can someone using this feature know that it is working for their instance?

If the runtime doesn't support user namespaces, an error is returned by the kubelet.

There are step-by-step examples in the Kubernetes documentation too: https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/

<!--
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
@@ -860,7 +919,8 @@ Recall that end users cannot usually observe component logs or access metrics.
- Other field:
- [x] Other (treat as last resort)
- Details: check pods with pod.spec.hostUsers field set to false, and see if they are in RUNNING
  state. Exec into a container and run `cat /proc/self/uid_map` to verify that the mappings
  differ from the mappings on the host.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -1373,9 +1433,9 @@ instead of cluster-wide/cluster-scoped.

There are no use cases for longer mappings that we know of. The 16-bit range (0-65535) is what
is assumed by all POSIX tools that we are aware of. If the need arises, longer mappings can be
considered in a future KEP.

### Allow runtimes to pick the mapping

