PodDisruptionBudget documentation Improvements (#4140)
* Changes from #3885

Title: Update PDB documentation to explain new field
Author: foxish

* Added Placeholder Disruptions Concept Guide

New file: docs/concepts/workloads/pods/disruptions.md
Intended contents: concept for Pod Disruption Budget,
 cross-reference to Eviction and Preemption docs.
Linked from: concepts > workloads > pods

* Added placeholder Configuring PDB Task

New file: docs/tasks/run-application/configure-pdb.md
Intended contents: task for writing a Pod Disruption Budget.
Linked from: tasks > configuring-applications > configure pdb.

* Add refs to the "drain a node" task.

* Refactor PDB docs.

Move the "Requesting an eviction" section from:
docs/tasks/administer-cluster/configure-pod-disruption-budget.md
-- which is going away -- to:
docs/tasks/administer-cluster/safely-drain-node.md

The move is verbatim, except for an introductory sentence.

Also added assignees.

* Refactor of PDB docs

Moved the section:
Specifying a PodDisruptionBudget
from:
docs/tasks/administer-cluster/configure-pod-disruption-budget.md
to:
docs/tasks/run-application/configure-pdb.md
because that former file is going away.
Move is verbatim.

* Explain how Eviction tools should handle failures

* Refactor PDB docs

Move text from:
docs/tasks/administer-cluster/configure-pod-disruption-budget.md
to:
docs/concepts/workloads/pods/disruptions.md

Delete the now empty:
docs/tasks/administer-cluster/configure-pod-disruption-budget.md

Added a redirects_from section to the new doc, containing the path
of the now-deleted doc, plus all the redirects from the deleted
doc.

* Expand PDB Concept guide

Building on a little content from the old task,
greatly expanded the Disruptions concept
guide, including an abstract example.

* Update creating a pdb Task.

* Address review comments.

* Fixed all of cody-clark's review comments

* Address review comments from mml

* Address review comments from maisem

* Fix missing backtick
erictune authored and Jessica Yao committed Sep 22, 2017
1 parent f7aaa27 commit af19df2
Showing 6 changed files with 578 additions and 172 deletions.
1 change: 1 addition & 0 deletions _data/concepts.yml
@@ -35,6 +35,7 @@ toc:
- docs/concepts/workloads/pods/pod.md
- docs/concepts/workloads/pods/pod-lifecycle.md
- docs/concepts/workloads/pods/init-containers.md
- docs/concepts/workloads/pods/disruptions.md
- title: Controllers
section:
- docs/concepts/workloads/controllers/replicaset.md
1 change: 1 addition & 0 deletions _data/tasks.yml
@@ -49,6 +49,7 @@ toc:
- docs/tasks/run-application/rolling-update-replication-controller.md
- docs/tasks/run-application/horizontal-pod-autoscale.md
- docs/tasks/run-application/horizontal-pod-autoscale-walkthrough.md
- docs/tasks/run-application/configure-pdb.md

- title: Run Jobs
section:
277 changes: 277 additions & 0 deletions docs/concepts/workloads/pods/disruptions.md
@@ -0,0 +1,277 @@
---
assignees:
- erictune
- foxish
- davidopp
title: Disruptions
redirect_from:
- "/docs/admin/disruptions/"
- "/docs/admin/disruptions.html"
- "/docs/tasks/configure-pod-container/configure-pod-disruption-budget/"
- "/docs/tasks/configure-pod-container/configure-pod-disruption-budget/"
- "/docs/tasks/administer-cluster/configure-pod-disruption-budget/"
---

{% capture overview %}
This guide is for application owners who want to build
highly available applications, and thus need to understand
what types of Disruptions can happen to Pods.

It is also for Cluster Administrators who want to perform automated
cluster actions, like upgrading and autoscaling clusters.

{% endcapture %}

{:toc}

{% capture body %}

## Voluntary and Involuntary Disruptions

Pods do not disappear until someone (a person or a controller) destroys them, or
there is an unavoidable hardware or system software error.

We call these unavoidable cases *involuntary disruptions* to
an application. Examples are:

- a hardware failure of the physical machine backing the node
- a cluster administrator deletes a VM (instance) by mistake
- a cloud provider or hypervisor failure makes the VM disappear
- a kernel panic
- the node disappears from the cluster due to a cluster network partition
- eviction of a pod due to the node being [out-of-resources](/docs/tasks/administer-cluster/out-of-resource.md).

Except for the out-of-resources condition, all these conditions
should be familiar to most users; they are not specific
to Kubernetes.

We call other cases *voluntary disruptions*. These include both
actions initiated by the application owner and those initiated by a Cluster
Administrator. Typical application owner actions include:

- deleting the deployment or other controller that manages the pod
- updating a deployment's pod template causing a restart
- directly deleting a pod (e.g. by accident)

Cluster Administrator actions include:

- [Draining a node](/docs/tasks/administer-cluster/safely-drain-node.md) for repair or upgrade.
- Draining a node from a cluster to scale the cluster down (learn about
[Cluster Autoscaling](/docs/tasks/administer-cluster/cluster-management/#cluster-autoscaler)
).
- Removing a pod from a node to permit something else to fit on that node.

These actions might be taken directly by the cluster administrator, or by automation
run by the cluster administrator, or by your cluster hosting provider.

Ask your cluster administrator or consult your cloud provider or distribution documentation
to determine if any sources of voluntary disruptions are enabled for your cluster.
If none are enabled, you can skip creating Pod Disruption Budgets.

## Dealing with Disruptions

Here are some ways to mitigate involuntary disruptions:

- Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-cpu-ram-container) it needs.
- Replicate your application if you need higher availability. (Learn about running replicated
[stateless](/docs/tasks/run-application/run-stateless-application-deployment.md)
and [stateful](/docs/tasks/run-application/run-replicated-stateful-application.md) applications.)
- For even higher availability when running replicated applications,
spread applications across racks (using
[anti-affinity](/docs/user-guide/node-selection/#inter-pod-affinity-and-anti-affinity-beta-feature))
or across zones (if using a
[multi-zone cluster](/docs/admin/multiple-zones)). A combined sketch follows this list.
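
To make the first and third items above concrete, here is a minimal sketch of a pod that requests the resources it needs and prefers to be scheduled in a different zone from other pods of the same application. The names, labels, image, and resource amounts are hypothetical, and the exact affinity fields available depend on your cluster version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend-example        # hypothetical name
  labels:
    app: web-frontend               # hypothetical label shared by the application's pods
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web-frontend     # avoid co-locating with other pods of this app
          topologyKey: failure-domain.beta.kubernetes.io/zone   # spread across zones
  containers:
  - name: frontend
    image: nginx                    # placeholder image
    resources:
      requests:
        cpu: 500m                   # request the CPU and memory the pod needs
        memory: 256Mi
```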

The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are
no voluntary disruptions at all. However, your cluster administrator or hosting provider
may run some additional services which cause voluntary disruptions. For example,
rolling out node software updates can cause voluntary disruptions. Also, some implementations
of cluster (node) autoscaling may cause voluntary disruptions to defragment and compact nodes.
Your cluster administrator or hosting provider should have documented what level of voluntary
disruptions, if any, to expect.

Kubernetes offers features to help run highly available applications at the same
time as frequent voluntary disruptions. We call this set of features
*Disruption Budgets*.


## How Disruption Budgets Work

An Application Owner can create a `PodDisruptionBudget` object (PDB) for each application.
A PDB limits the number of pods of a replicated application that are down simultaneously from
voluntary disruptions. For example, a quorum-based application would
like to ensure that the number of replicas running is never brought below the
number needed for a quorum. A web front end might want to
ensure that the number of replicas serving load never falls below a certain
percentage of the total.
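
For illustration, here is a minimal sketch of such an object for a hypothetical quorum-based application whose pods carry the label `app: zookeeper`. The name and label are placeholders, and `minAvailable` can be an absolute number or a percentage:

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb                # hypothetical name
spec:
  minAvailable: 2             # never let voluntary disruptions take the app below 2 available pods
  selector:
    matchLabels:
      app: zookeeper          # hypothetical label selecting the application's pods
```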

Cluster managers and hosting providers should use tools which
respect Pod Disruption Budgets by calling the [Eviction API](/docs/tasks/administer-cluster/safely-drain-node/#the-eviction-api)
instead of directly deleting pods. Examples are the `kubectl drain` command
and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).

When a cluster administrator wants to drain a node,
they use the `kubectl drain` command. That tool tries to evict all
the pods on the machine. The eviction request may be temporarily rejected,
and the tool periodically retries all failed requests until all pods
are terminated, or until a configurable timeout is reached.

A PDB specifies the number of replicas that an application can tolerate having, relative to how
many it is intended to have. For example, a Deployment which has a `spec.replicas: 5` is
supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time,
then the Eviction API will allow voluntary disruption of one, but not two pods, at a time.

The group of pods that comprise the application is specified using a label selector, the same
as the one used by the application's controller (deployment, stateful-set, etc).

The "intended" number of pods is computed from the `.spec.replicas` of the pods controller.
The controller is discovered from the pods using the `.metadata.ownerReferences` of the object.
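
As a sketch of how these pieces fit together, here is a hypothetical Deployment with `spec.replicas: 5` paired with a PDB whose selector matches the same labels. With `minAvailable: 4`, the Eviction API allows at most one of these pods to be voluntarily disrupted at a time. All names, labels, and the image are placeholders, and the Deployment `apiVersion` may differ depending on your cluster version:

```yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 5                 # the "intended" number of pods
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: frontend
        image: nginx          # placeholder image
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
spec:
  minAvailable: 4             # allows one pod at a time to be voluntarily disrupted
  selector:
    matchLabels:
      app: web-frontend       # same labels the Deployment's pods carry
```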

PDBs cannot prevent [involuntary disruptions](#voluntary-and-involuntary-disruptions) from
occurring, but they do count against the budget.

Pods which are deleted or unavailable due to a rolling upgrade to an application do count
against the disruption budget, but controllers (like deployment and stateful-set)
are not limited by PDBs when doing rolling upgrades -- the handling of failures
during application updates is configured in the controller spec.
(Learn about [updating a deployment](/docs/concepts/cluster-administration/manage-deployment/#updating-your-application-without-a-service-outage).)

When a pod is evicted using the eviction API, it is gracefully terminated (see
`terminationGracePeriodSeconds` in [PodSpec](/docs/resources-reference/v1.6/#podspec-v1-core).)
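
For example, a pod template can set this grace period explicitly; the sketch below uses hypothetical names and assumes the default of 30 seconds is not long enough:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-example             # hypothetical name
spec:
  terminationGracePeriodSeconds: 60  # an eviction gives this pod up to 60s to shut down cleanly
  containers:
  - name: app
    image: nginx                     # placeholder image
```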

## PDB Example

Consider a cluster with 3 nodes, `node-1` through `node-3`.
The cluster is running several applications. One of them has 3 replicas, initially called
`pod-a`, `pod-b`, and `pod-c`. Another, unrelated pod without a PDB, called `pod-x`, is also shown.
Initially, the pods are laid out as follows:

| node-1 | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| pod-a *available* | pod-b *available* | pod-c *available* |
| pod-x *available* | | |

All 3 pods are part of a deployment, and they collectively have a PDB which requires
at least 2 of the 3 pods to be available at all times.

For example, assume the cluster administrator wants to reboot into a new kernel version to fix a bug in the kernel.
The cluster administrator first tries to drain `node-1` using the `kubectl drain` command.
That tool tries to evict `pod-a` and `pod-x`. This succeeds immediately.
Both pods go into the `terminating` state at the same time.
This puts the cluster in this state:

| node-1 *draining* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| pod-a *terminating* | pod-b *available* | pod-c *available* |
| pod-x *terminating* | | |

The deployment notices that one of the pods is terminating, so it creates a replacement
called `pod-d`. Since `node-1` is cordoned, it lands on another node. Something has
also created `pod-y` as a replacement for `pod-x`.

(Note: for a StatefulSet, `pod-a`, which would be called something like `pod-1`, would need
to terminate completely before its replacement, which is also called `pod-1` but has a
different UID, could be created. Otherwise, the example applies to a StatefulSet as well.)

Now the cluster is in this state:

| node-1 *draining* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| pod-a *terminating* | pod-b *available* | pod-c *available* |
| pod-x *terminating* | pod-d *starting* | pod-y |

At some point, the pods terminate, and the cluster looks like this:

| node-1 *drained* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| | pod-b *available* | pod-c *available* |
| | pod-d *starting* | pod-y |

At this point, if an impatient cluster administrator tries to drain `node-2` or
`node-3`, the drain command will block, because there are only 2 available
pods for the deployment, and its PDB requires at least 2. After some time
passes, `pod-d` becomes available.

The cluster state now looks like this:

| node-1 *drained* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| | pod-b *available* | pod-c *available* |
| | pod-d *available* | pod-y |

Now, the cluster admin tries to drain `node-2`.
The drain command will try to evict the two pods in some order, say
`pod-b` first and then `pod-d`. It will succeed at evicting `pod-b`.
But, when it tries to evict `pod-d`, it will be refused because that would leave only
one pod available for the deployment.

The deployment creates a replacement for `pod-b` called `pod-e`.
However, there are not enough resources in the cluster to schedule
`pod-e`, so the drain will block. The cluster may end up in this
state:

| node-1 *drained* | node-2 | node-3 | *no node* |
|:--------------------:|:-------------------:|:------------------:|:------------------:|
| | pod-b *available* | pod-c *available* | pod-e *pending* |
| | pod-d *available* | pod-y | |

At this point, the cluster administrator needs to
add a node back to the cluster to proceed with the upgrade.

You can see how Kubernetes varies the rate at which disruptions
can happen, according to:

- how many replicas an application needs
- how long it takes to gracefully shut down an instance
- how long it takes a new instance to start up
- the type of controller
- the cluster's resource capacity

## Separating Cluster Owner and Application Owner Roles

Often, it is useful to think of the Cluster Manager
and Application Owner as separate roles with limited knowledge
of each other. This separation of responsibilities
may make sense in these scenarios:

- when there are many application teams sharing a Kubernetes cluster, and
there is natural specialization of roles
- when third-party tools or services are used to automate cluster management

Pod Disruption Budgets support this separation of roles by providing an
interface between the roles.

If you do not have such a separation of responsibilities in your organization,
you may not need to use Pod Disruption Budgets.

## How to Perform Disruptive Actions on your Cluster

If you are a Cluster Administrator, and you need to perform a disruptive action on all
the nodes in your cluster, such as a node or system software upgrade, here are some options:

- Accept downtime during the upgrade.
- Fail over to another complete replica cluster.
- No downtime, but may be costly both for the duplicated nodes,
and for human effort to orchestrate the switchover.
- Write disruption tolerant applications and use PDBs.
- No downtime.
- Minimal resource duplication.
- Allows more automation of cluster administration.
- Writing disruption-tolerant applications is tricky, but the work to tolerate voluntary
disruptions largely overlaps with work to support autoscaling and tolerating
involuntary disruptions.

{% endcapture %}


{% capture whatsnext %}

* Follow steps to protect your application by [configuring a Pod Disruption Budget](/docs/tasks/run-application/configure-pdb.md).

* Learn more about [draining nodes](/docs/tasks/administer-cluster/safely-drain-node.md).

{% endcapture %}


{% include templates/concept.md %}