diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index ab738d21a71ae..5f9272a888923 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -156,6 +156,7 @@ different Kubernetes components.
 | `RotateKubeletServerCertificate` | `false` | Alpha | 1.7 | 1.11 |
 | `RotateKubeletServerCertificate` | `true` | Beta | 1.12 | |
 | `RunAsGroup` | `true` | Beta | 1.14 | |
+| `SeccompDefault` | `false` | Alpha | 1.22 | |
 | `ServiceInternalTrafficPolicy` | `false` | Alpha | 1.21 | |
 | `ServiceLBNodePortControl` | `false` | Alpha | 1.20 | |
 | `ServiceLoadBalancerClass` | `false` | Alpha | 1.21 | |
@@ -783,6 +784,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
   instead of the DaemonSet controller.
 - `SCTPSupport`: Enables the _SCTP_ `protocol` value in Pod, Service,
   Endpoints, EndpointSlice, and NetworkPolicy definitions.
+- `SeccompDefault`: Enables the use of `RuntimeDefault` as the default seccomp profile for all workloads.
+  The seccomp profile is specified in the `securityContext` of a Pod and/or a Container.
-- `ServerSideApply`: Enables the [Sever Side Apply (SSA)](/docs/reference/using-api/server-side-apply/)
+- `ServerSideApply`: Enables the [Server Side Apply (SSA)](/docs/reference/using-api/server-side-apply/)
   feature on the API Server.
 - `ServiceAccountIssuerDiscovery`: Enable OIDC discovery endpoints (issuer and
diff --git a/content/en/docs/reference/command-line-tools-reference/kubelet.md b/content/en/docs/reference/command-line-tools-reference/kubelet.md
index 66eb5785de302..e6f26a3209872 100644
--- a/content/en/docs/reference/command-line-tools-reference/kubelet.md
+++ b/content/en/docs/reference/command-line-tools-reference/kubelet.md
@@ -514,6 +514,7 @@ RemoveSelfLink=true|false (BETA - default=true)
RootCAConfigMap=true|false (BETA - default=true)
RotateKubeletServerCertificate=true|false (BETA - default=true)
RunAsGroup=true|false (BETA - default=true)
+SeccompDefault=true|false (ALPHA - default=false)
ServerSideApply=true|false (BETA - default=true)
ServiceAccountIssuerDiscovery=true|false (BETA - default=true)
ServiceLBNodePortControl=true|false (ALPHA - default=false)
@@ -1073,6 +1074,13 @@ WindowsEndpointSliceProxying=true|false (ALPHA - default=false)
 Timeout of all runtime requests except long running request - `pull`, `logs`, `exec` and `attach`. When timeout exceeded, kubelet will cancel the request, throw out an error and retry later. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's `--config` flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)
+
+--seccomp-default RuntimeDefault     Default: `false`
+
+
+<Warning: Alpha feature> Enable the use of RuntimeDefault as the default seccomp profile for all workloads. This flag requires the SeccompDefault feature gate, which is disabled by default.
+
+
 --seccomp-profile-root string     Default: `/var/lib/kubelet/seccomp`
diff --git a/content/en/docs/tutorials/clusters/seccomp.md b/content/en/docs/tutorials/clusters/seccomp.md
index 971618cf554d8..c510f4c70768d 100644
--- a/content/en/docs/tutorials/clusters/seccomp.md
+++ b/content/en/docs/tutorials/clusters/seccomp.md
@@ -3,9 +3,10 @@ reviewers:
 - hasheddan
 - pjbgf
 - saschagrunert
-title: Restrict a Container's Syscalls with Seccomp
+title: Restrict a Container's Syscalls with seccomp
 content_type: tutorial
 weight: 20
+min-kubernetes-server-version: v1.22
 ---


@@ -13,7 +14,7 @@ weight: 20
 {{< feature-state for_k8s_version="v1.19" state="stable" >}}

 Seccomp stands for secure computing mode and has been a feature of the Linux
-kernel since version 2.6.12. It can be used to sandbox the privileges of a
+kernel since version 2.6.12. It can be used to sandbox the privileges of a
 process, restricting the calls it is able to make from userspace into the
 kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a
 Node to your Pods and containers.
@@ -35,16 +36,66 @@ profiles that give only the necessary privileges to your container processes.
 ## {{% heading "prerequisites" %}}

+{{< version-check >}}
+
 In order to complete all steps in this tutorial, you must install
 [kind](https://kind.sigs.k8s.io/docs/user/quick-start/) and
 [kubectl](/docs/tasks/tools/). This tutorial will show examples
-with both alpha (pre-v1.19) and generally available seccomp functionality, so
+with both alpha (new in v1.22) and generally available seccomp functionality. You
 should make sure that your cluster is
 [configured correctly](https://kind.sigs.k8s.io/docs/user/quick-start/#setting-kubernetes-version)
 for the version you are using.

+## Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads
+
+{{< feature-state state="alpha" for_k8s_version="v1.22" >}}
+
+`SeccompDefault` is an optional kubelet
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates) as
+well as the corresponding `--seccomp-default`
+[command line flag](/docs/reference/command-line-tools-reference/kubelet).
+Both have to be enabled simultaneously to use the feature.
+
+If enabled, the kubelet will use the `RuntimeDefault` seccomp profile, which is
+defined by the container runtime, for every workload that does not specify its own
+seccomp profile, instead of using the `Unconfined` (seccomp disabled) mode.
+The default profiles aim to provide a strong set
+of security defaults while preserving the functionality of the workload. The
+default profiles can differ between container runtimes and their
+release versions, for example between those from CRI-O and containerd.
+
+Some workloads may require fewer syscall restrictions than others.
+Such workloads can fail at runtime when the `RuntimeDefault`
+profile is applied. To mitigate such a failure, you can:
+
+- Run the workload explicitly as `Unconfined`.
+- Disable the `SeccompDefault` feature on some nodes, and make sure that the
+  affected workloads get scheduled only on nodes where the feature is disabled.
+- Create a custom seccomp profile for the workload.
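+
+As an illustration of the first option, the following manifest is a minimal
+sketch of a Pod that opts out of the node-wide default by setting
+`seccompProfile.type` to `Unconfined` in its Pod-level `securityContext`. The
+Pod name, container name, image, and command are hypothetical placeholders:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: unconfined-pod # hypothetical name
+spec:
+  securityContext:
+    # A profile set explicitly here takes precedence over the kubelet default.
+    seccompProfile:
+      type: Unconfined
+  containers:
+  - name: app # hypothetical container name
+    image: busybox # hypothetical image
+    command: ["sleep", "3600"]
+```
+
+Setting the profile at Pod level applies it to every container in the Pod; a
+container can still override it with its own `securityContext.seccompProfile`.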
+
+If you are introducing this feature into a production-like cluster, the Kubernetes project
+recommends that you enable this feature gate on a subset of your nodes and then
+test workload execution before rolling the change out cluster-wide.
+
+More detailed information about a possible upgrade and downgrade strategy can be
+found in the [related Kubernetes Enhancement Proposal (KEP)](https://github.com/kubernetes/enhancements/tree/a70cc18/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy).
+
+Because the feature is in alpha state, it is disabled by default. To enable it,
+pass the flags `--feature-gates=SeccompDefault=true --seccomp-default` to the
+`kubelet` CLI or enable it via the [kubelet configuration
+file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the
+feature in [kind](https://kind.sigs.k8s.io), ensure that `kind` provides
+the minimum required Kubernetes version, enables the `SeccompDefault` feature gate
+[in the kind configuration](https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster),
+and passes `--seccomp-default` to the kubelet, for example through a kubeadm
+`InitConfiguration` patch, which applies to the single node that `kind` creates by default:
+
+```yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+featureGates:
+  SeccompDefault: true
+kubeadmConfigPatches:
+- |
+  kind: InitConfiguration
+  nodeRegistration:
+    kubeletExtraArgs:
+      seccomp-default: "true"
+```
+
 ## Create Seccomp Profiles

 The contents of these profiles will be explored later on, but for now go ahead
@@ -108,7 +159,7 @@ docker exec -it 6a96207fed4b ls /var/lib/kubelet/seccomp/profiles
 audit.json  fine-grained.json  violation.json
 ```

-## Create a Pod with a Seccomp profile for syscall auditing
+## Create a Pod with a seccomp profile for syscall auditing

 To start off, apply the `audit.json` profile, which will log all syscalls of the
 process, to a new Pod.
@@ -208,7 +259,7 @@ kubectl delete pod/audit-pod
 kubectl delete svc/audit-pod
 ```

-## Create Pod with Seccomp Profile that Causes Violation
+## Create Pod with seccomp Profile that Causes Violation

 For demonstration, apply a profile to the Pod that does not allow for any
 syscalls.
@@ -255,7 +306,7 @@ kubectl delete pod/violation-pod
 kubectl delete svc/violation-pod
 ```

-## Create Pod with Seccomp Profile that Only Allows Necessary Syscalls
+## Create Pod with seccomp Profile that Only Allows Necessary Syscalls

 If you take a look at the `fine-pod.json`, you will notice some of the syscalls
 seen in the first example where the profile set `"defaultAction":
@@ -339,7 +390,7 @@ kubectl delete pod/fine-pod
 kubectl delete svc/fine-pod
 ```

-## Create Pod that uses the Container Runtime Default Seccomp Profile
+## Create Pod that uses the Container Runtime Default seccomp Profile

 Most container runtimes provide a sane set of default syscalls that are allowed
 or not. The defaults can easily be applied in Kubernetes by using the
@@ -364,5 +415,5 @@ The default seccomp profile should provide adequate access for most workloads.

 Additional resources:

-* [A Seccomp Overview](https://lwn.net/Articles/656307/)
+* [A seccomp Overview](https://lwn.net/Articles/656307/)
 * [Seccomp Security Profiles for Docker](https://docs.docker.com/engine/security/seccomp/)