From f7723e5328a97a4c84e7df0ad5c98becb3ddaeea Mon Sep 17 00:00:00 2001
From: Lee Bernick
Date: Mon, 18 Apr 2022 09:40:54 -0400
Subject: [PATCH] Revamp compute resources documentation

This commit updates Tekton documentation on compute resources. It
separates Kubernetes background into a separate section, describes how
resource requests differ in Tekton and in Kubernetes, adds examples for
LimitRange, and adds details for ResourceQuotas.
---
 docs/compute-resources.md | 236 ++++++++++++++++++++++++++++++++++++++
 docs/limitrange.md        | 105 -----------------
 docs/pipelineruns.md      |   2 +-
 docs/taskruns.md          |   2 +-
 docs/tasks.md             |   5 +-
 5 files changed, 242 insertions(+), 108 deletions(-)
 create mode 100644 docs/compute-resources.md
 delete mode 100644 docs/limitrange.md

diff --git a/docs/compute-resources.md b/docs/compute-resources.md
new file mode 100644
index 00000000000..87559ac930c
--- /dev/null
+++ b/docs/compute-resources.md
@@ -0,0 +1,236 @@
+
+# Compute Resources in Tekton
+
+## Background: Resource Requirements in Kubernetes
+
+Kubernetes allows users to specify CPU, memory, and ephemeral storage constraints
+for [containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/).
+Resource requests determine the resources reserved for a pod when it's scheduled,
+and affect the likelihood of pod eviction. Resource limits constrain the maximum amount of
+a resource a container can use. A container that exceeds its memory limits will be killed,
+and a container that exceeds its CPU limits will be throttled.
+
+A pod's [effective resource requests and limits](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources)
+are the higher of:
+- the sum of all app containers' requests/limits for a resource
+- the effective init container request/limit for a resource
+
+This formula exists because Kubernetes runs init containers sequentially and app containers
+in parallel. (There is no distinction made between app containers and sidecar containers
+in Kubernetes; a sidecar is used in the following example to illustrate this.)
+
+For example, consider a pod with the following containers:
+
+| Container           | CPU request | CPU limit |
+| ------------------- | ----------- | --------- |
+| init container 1    | 1           | 2         |
+| init container 2    | 2           | 3         |
+| app container 1     | 1           | 2         |
+| app container 2     | 2           | 3         |
+| sidecar container 1 | 3           | no limit  |
+
+The sum of all app container CPU requests is 6 (including the sidecar container), which is
+greater than the maximum init container CPU request (2). Therefore, the pod's effective CPU
+request will be 6.
+
+Since the sidecar container has no CPU limit, this is treated as the highest CPU limit.
+Therefore, the pod will have no effective CPU limit.
+
+## Task Resource Requirements
+
+Tekton allows users to specify resource requirements of [`Steps`](./tasks.md#defining-steps),
+which run sequentially. However, the pod's effective resource requirements are still the
+sum of its containers' resource requirements. This means that when specifying resource
+requirements for `Step` containers, they must be treated as if they are running in parallel.
+
+Tekton adjusts `Step` resource requirements to comply with [LimitRanges](#limitrange-support).
+[ResourceQuotas](#resourcequota-support) are not currently supported.
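+
+To illustrate, consider the following `Task`, which sets CPU requests on its `Steps`
+(a minimal sketch; the `Task` name, images, scripts, and values are illustrative):
+
+```
+apiVersion: tekton.dev/v1beta1
+kind: Task
+metadata:
+  name: example-task # illustrative name
+spec:
+  steps:
+  - name: step-1
+    image: ubuntu
+    script: echo "step 1"
+    resources:
+      requests:
+        cpu: 500m
+  - name: step-2
+    image: ubuntu
+    script: echo "step 2"
+    resources:
+      requests:
+        cpu: 500m
+```
+
+Even though only one `Step` runs at a time, the resulting pod's effective CPU request is the
+sum of the `Step` requests (1 CPU), so `Step` requests should be sized with that sum in mind.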
+
+## LimitRange Support
+
+Kubernetes allows users to configure [LimitRanges](https://kubernetes.io/docs/concepts/policy/limit-range/),
+which constrain compute resources of pods, containers, or PVCs running in the same namespace.
+
+LimitRanges can:
+- Enforce minimum and maximum compute resources usage per Pod or Container in a namespace.
+- Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace.
+- Enforce a ratio between request and limit for a resource in a namespace.
+- Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime.
+
+Tekton applies the resource requirements specified by users directly to the containers
+in a `Task's` pod, unless there is a LimitRange present in the namespace.
+(Tekton doesn't allow users to configure init containers for a `Task`.)
+Tekton supports LimitRange minimum, maximum, and default resource requirements for containers,
+but does not support LimitRange ratios between requests and limits ([#4230](https://github.com/tektoncd/pipeline/issues/4230)).
+LimitRange types other than "Container" are not supported.
+
+### Requests
+
+If a `Step` does not have requests defined, the resulting container's requests are the larger of:
+- the LimitRange minimum resource requests
+- the LimitRange default resource requests, divided among the app containers
+
+If a `Step` has requests defined, the resulting container's requests are the larger of:
+- the `Step's` requests
+- the LimitRange minimum resource requests
+
+### Limits
+
+If a `Step` does not have limits defined, the resulting container's limits are the smaller of:
+- the LimitRange maximum resource limits
+- the LimitRange default resource limits
+
+If a `Step` has limits defined, the resulting container's limits are the smaller of:
+- the `Step's` limits
+- the LimitRange maximum resource limits
+
+### Examples
+
+Consider the following LimitRange:
+
+```
+apiVersion: v1
+kind: LimitRange
+metadata:
+  name: limitrange-example
+spec:
+  limits:
+  - default: # The default limits
+      cpu: 2
+    defaultRequest: # The default requests
+      cpu: 1
+    max: # The maximum limits
+      cpu: 3
+    min: # The minimum requests
+      cpu: 300m
+    type: Container
+```
+
+A `Task` with 2 `Steps` and no resources specified would result in a pod with the following containers:
+
+| Container    | CPU request | CPU limit |
+| ------------ | ----------- | --------- |
+| container 1  | 500m        | 2         |
+| container 2  | 500m        | 2         |
+
+Here, the default CPU request was divided among the app containers, and this value was used since it was greater
+than the minimum request specified by the LimitRange.
+The CPU limits are 2 for each container, as this is the default limit specified in the LimitRange.
+
+Now, consider a `Task` with the following `Steps`:
+
+| Step   | CPU request | CPU limit |
+| ------ | ----------- | --------- |
+| step 1 | 200m        | 2         |
+| step 2 | 1           | 4         |
+
+The resulting pod would have the following containers:
+
+| Container    | CPU request | CPU limit |
+| ------------ | ----------- | --------- |
+| container 1  | 300m        | 2         |
+| container 2  | 1           | 3         |
+
+Here, the first `Step's` request was less than the LimitRange minimum, so the output request is the minimum (300m).
+The second `Step's` request is unchanged. The first `Step's` limit is less than the maximum, so it is unchanged,
+while the second `Step's` limit is greater than the maximum, so the maximum (3) is used.
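+
+Declared on a `Task`, the `Step` values from the second example might look like the following
+(a minimal sketch; the `Task` name, images, and scripts are illustrative):
+
+```
+apiVersion: tekton.dev/v1beta1
+kind: Task
+metadata:
+  name: limitrange-example-task # illustrative name
+spec:
+  steps:
+  - name: step-1
+    image: ubuntu
+    script: echo "step 1"
+    resources:
+      requests:
+        cpu: 200m # below the LimitRange minimum; adjusted up to 300m
+      limits:
+        cpu: 2 # within the LimitRange maximum; left unchanged
+  - name: step-2
+    image: ubuntu
+    script: echo "step 2"
+    resources:
+      requests:
+        cpu: 1 # at or above the LimitRange minimum; left unchanged
+      limits:
+        cpu: 4 # above the LimitRange maximum; adjusted down to 3
+```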
+
+### Support for multiple LimitRanges
+
+Tekton supports running `TaskRuns` in namespaces with multiple LimitRanges.
+For a given resource, the minimum used will be the largest of the LimitRanges' minimum values,
+and the maximum used will be the smallest of the LimitRanges' maximum values.
+The default request and default limit used will be the smallest of the LimitRanges' default requests
+and default limits, respectively.
+If a resulting default value is less than the resulting minimum value, the minimum value is used as the default.
+
+It's possible to define multiple LimitRanges that are incompatible with each other, preventing pods from being scheduled.
+
+#### Example
+
+Consider a namespace with the following LimitRanges defined:
+
+```
+apiVersion: v1
+kind: LimitRange
+metadata:
+  name: limitrange-1
+spec:
+  limits:
+  - default: # The default limits
+      cpu: 2
+    defaultRequest: # The default requests
+      cpu: 750m
+    max: # The maximum limits
+      cpu: 3
+    min: # The minimum requests
+      cpu: 500m
+    type: Container
+```
+
+```
+apiVersion: v1
+kind: LimitRange
+metadata:
+  name: limitrange-2
+spec:
+  limits:
+  - default: # The default limits
+      cpu: 1.5
+    defaultRequest: # The default requests
+      cpu: 1
+    max: # The maximum limits
+      cpu: 2.5
+    min: # The minimum requests
+      cpu: 300m
+    type: Container
+```
+
+A namespace with limitrange-1 and limitrange-2 would be treated as if it contained only the following LimitRange:
+
+```
+apiVersion: v1
+kind: LimitRange
+metadata:
+  name: aggregate-limitrange
+spec:
+  limits:
+  - default: # The default limits
+      cpu: 1.5
+    defaultRequest: # The default requests
+      cpu: 750m
+    max: # The maximum limits
+      cpu: 2.5
+    min: # The minimum requests
+      cpu: 500m
+    type: Container
+```
+
+Here, the minimum of the "max" values is the output "max" value, and likewise for "default" and "defaultRequest".
+The maximum of the "min" values is the output "min" value.
+
+## ResourceQuota Support
+
+Kubernetes allows users to define [ResourceQuotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/),
+which restrict the total resource requests and limits of all pods running in a namespace.
+`TaskRuns` can't currently be created in a namespace with ResourceQuotas
+([#2933](https://github.com/tektoncd/pipeline/issues/2933)).
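+
+For reference, a ResourceQuota of the kind described above might look like the following
+(an illustrative sketch; the name and values are arbitrary):
+
+```
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: example-resourcequota # illustrative name
+spec:
+  hard:
+    requests.cpu: "4"
+    requests.memory: 4Gi
+    limits.cpu: "8"
+    limits.memory: 8Gi
+```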
+ +# References + +- [LimitRange in k8s docs](https://kubernetes.io/docs/concepts/policy/limit-range/) +- [Configure default memory requests and limits for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/) +- [Configure default CPU requests and limits for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/) +- [Configure Minimum and Maximum CPU constraints for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-constraint-namespace/) +- [Configure Minimum and Maximum Memory constraints for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-constraint-namespace/) +- [Managing Resources for Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) +- [Kubernetes best practices: Resource requests and limits](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits) +- [Restrict resource consumption with limit ranges](https://docs.openshift.com/container-platform/4.8/nodes/clusters/nodes-cluster-limit-ranges.html) diff --git a/docs/limitrange.md b/docs/limitrange.md deleted file mode 100644 index a9bcfab5e9a..00000000000 --- a/docs/limitrange.md +++ /dev/null @@ -1,105 +0,0 @@ - - -# `LimitRange` support in Pipeline - -## `LimitRange`s, `Requests` and `Limits` - -Taken from the [LimitRange in kubernetes docs](https://kubernetes.io/docs/concepts/policy/limit-range/). - -By default, containers run with unbounded [compute resources](/docs/concepts/configuration/manage-resources-containers/) on a Kubernetes cluster. -With resource quotas, cluster administrators can restrict resource consumption and creation on a `namespace` basis. -Within a namespace, a Pod or Container can consume as much CPU and memory as defined by the namespace's resource quota. There is a concern that one Pod or Container could monopolize all available resources. A LimitRange is a policy to constrain resource allocations (to Pods or Containers) in a namespace. - -A _LimitRange_ provides constraints that can: - -- Enforce minimum and maximum compute resources usage per Pod or Container in a namespace. -- Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace. -- Enforce a ratio between request and limit for a resource in a namespace. -- Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime. - -`LimitRange` are validating and mutating `Requests` and `Limits`. Let's look, *in a nutshell*, on how those work in Kubernetes. - -- **Requests** are not enforced. If the node has more resource available than the request, the container can use it. -- **Limits** on the other hand, are a hard stop. A container going over the limit, will be killed. - -Resource types for both are: -- CPU -- Memory -- Ephemeral storage - -The next question is : how pods with resource and limits are run/scheduled ? -The scheduler *computes* the amount of CPU and memory requests (using **Requests**) and tries to find a node to schedule it. 
- -A pod's [effective resource requests and limits](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources) are the higher of: -- the sum of all app containers request/limit for a resource -- the effective init request/limit for a resource - -For example, if you have the following requests: -- initContainer1 : 1 CPU, 100Mi memory -- initContainer2 : 2 CPU, 200Mi memory -- container1 : 1 CPU, 50Mi memory -- container2 : 2 CPU, 250Mi memory -- container3 : 3 CPU, 500Mi memory - -The computation will be: -- CPU : the max init container CPU is 2, and the sum of the container CPUs is 6. The resulting pod will use 6 CPUs, the max of the init container and container CPU values. -- Memory: the max init container memory is 200Mi, and the sum of the container memory requests is 800Mi. - The resulting pod will use 800Mi, the max of the init container and container memory values. - -## Tekton support - -The way Limits and Requests works in Kubernetes is because it is assumed that all containers run in parallel, and init container run before, each one after the others. - -That assumption — containers running in parallel — is not true in Tekton. They do all start together (because there is no way around this) **but** the *[entrypoint](https://github.com/tektoncd/pipeline/tree/main/cmd/entrypoint#entrypoint) hack* is making sure they actually run in sequence and thus there is always only one container that is actually consuming some resource at the same time. - -This means, we need to handle limits, request and LimitRanges in a *non-standard* way. Since the existing LimitRange mutating webhook won't take Tekton's requirements into account, Tekton needs to fully control all the values the LimitRange webhook might set. Let's try to define that. Tekton needs to take into account all the aspect of the LimitRange : the min/max as well as the default. If there is no default, but there is min/max, Tekton need then to **set** a default value that is between the min/max. If we set the value too low, the Pod won't be able to be created, similar if we set the value too high. **But** those values are set on **containers**, so we **have to** do our own computation to know what request to put on each containers. - - -## A LimitRange is in the namespace - -We need to get the default (limits), default requests, min and max values (if they are here). - -Here are the rules for container's resources (requests and limits) computations: -- **init containers:** they won't be summed, so the rules are simple - - a container needs to have request and limits at least at the `min` and set to the `default` if any. - - *use the default requests and the default limits (coming from the defaultLimit, or the min, …)* -- **containers:** those will be summed at the end, so it gets a bit complex - - a container needs to have request and limits at least at the `min` - - the sum of the container request/limits **should be** as small as possible. This should be - ensured by using the "smallest" possible request on it. - -One thing to note is that, in the case of a LimitRange being present, we need to **not rely** on the pod mutation webhook that takes the default into account ; what this means is, we need to specify all request and limits ourselves so that the mutation webhook doesn't have any work to do. - -- **No default value:** if there is no default value, we need to treat the min as the - default. I think that's also what k8s does, at least in our computation. 
-- **Default value:** we need to "try" to respect that as much as possible. - - `defaultLimit` but no `defaultRequest`, then we set `defaultRequest` to be same as `min` (if present). - - `defaultRequest` but no `defaultlimit`, then we use the `max` limit as the `defaultLimit` - - no `defaultLimit`, no `defaultRequest`, then we use the `min` as `defaultRequest` and the `max` as `defaultLimit`. - -## Multiple LimitRange are in the namespace - -Similar to on LimitRange, except we need to act as if it was one LimitRange (virtual) with -the correct value from each of them. - -- Take the maximum of the min values -- Take the minimum of the max values -- Take the default request that fits into the previous 2 min/max - -Once we have this "virtual" LimitRange, we can act as there was one `LimitRange`. Note that it is possible to define multiple `LimitRange` that would go conflict with each other and block any `Pod` scheduling. Tekton Pipeline will not do anything to try to go around this as it is a behaviour of Kubernetes itself. - -# References - -- [LimitRange in k8s docs](https://kubernetes.io/docs/concepts/policy/limit-range/) -- [Configure default memory requests and limits for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/) -- [Configure default CPU requests and limits for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/) -- [Configure Minimum and Maximum CPU constraints for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-constraint-namespace/) -- [Configure Minimum and Maximum Memory constraints for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-constraint-namespace/) -- [Managing Resources for Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) -- [Kubernetes best practices: Resource requests and limits](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits) -- [Restrict resource consumption with limit ranges](https://docs.openshift.com/container-platform/4.8/nodes/clusters/nodes-cluster-limit-ranges.html) diff --git a/docs/pipelineruns.md b/docs/pipelineruns.md index 34865e51315..8c505f64bef 100644 --- a/docs/pipelineruns.md +++ b/docs/pipelineruns.md @@ -506,7 +506,7 @@ time from the invoked `Task`, Tekton will request the compute values for CPU, me storage for each `Step` based on the [`LimitRange`](https://kubernetes.io/docs/concepts/policy/limit-range/) object(s), if present. Any `Request` or `Limit` specified by the user (on `Task` for example) will be left unchanged. -For more information, see the [`LimitRange` support in Pipeline](./limitrange.md). +For more information, see the [`LimitRange` support in Pipeline](./compute-resources.md#limitrange-support). ### Configuring a failure timeout diff --git a/docs/taskruns.md b/docs/taskruns.md index e9200622597..401aad2055f 100644 --- a/docs/taskruns.md +++ b/docs/taskruns.md @@ -469,7 +469,7 @@ time from the invoked `Task`, Tekton will requests the compute values for CPU, m storage for each `Step` based on the [`LimitRange`](https://kubernetes.io/docs/concepts/policy/limit-range/) object(s), if present. Any `Request` or `Limit` specified by the user (on `Task` for example) will be left unchanged. -For more information, see the [`LimitRange` support in Pipeline](./limitrange.md). 
+For more information, see the [`LimitRange` support in Pipeline](./compute-resources.md#limitrange-support). ### Configuring the failure timeout diff --git a/docs/tasks.md b/docs/tasks.md index 1e9ee168e1e..6c8e0d62120 100644 --- a/docs/tasks.md +++ b/docs/tasks.md @@ -165,7 +165,10 @@ The following requirements apply to each container image referenced in a `steps` - Each container image runs to completion or until the first failure occurs. - The CPU, memory, and ephemeral storage resource requests set on `Step`s will be adjusted to comply with any [`LimitRange`](https://kubernetes.io/docs/concepts/policy/limit-range/)s - present in the `Namespace`. For more detail, see [LimitRange support in Pipeline](./limitrange.md). + present in the `Namespace`. In addition, Kubernetes determines a pod's effective resource + requests and limits by summing the requests and limits for all its containers, even + though Tekton runs `Steps` sequentially. + For more detail, see [Compute Resources in Tekton](./compute-resources.md). Below is an example of setting the resource requests and limits for a step:
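+
+For instance, a `Step` might declare its requests and limits as follows (a minimal sketch;
+the step name, image, and quantities are illustrative):
+
+```
+steps:
+- name: build
+  image: gcr.io/example/builder # illustrative image
+  resources:
+    requests:
+      memory: 1Gi
+      cpu: 500m
+    limits:
+      memory: 2Gi
+      cpu: 1
+```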