Fixed for all cody-clark's review comments
erictune committed Jun 23, 2017
1 parent 512c2c8 commit 992a4a1
Showing 3 changed files with 103 additions and 87 deletions.
128 changes: 71 additions & 57 deletions docs/concepts/workloads/pods/disruptions.md
@@ -18,7 +18,7 @@ highly available applications, and thus need to understand
what types of Disruptions can happen to Pods.

It is also for Cluster Administrators who want to perform automated
cluster actions, like upgrading and autoscaling clusters.

{% endcapture %}

@@ -28,44 +28,56 @@ cluster actions, like upgrades and cluster autoscaling.

## Voluntary and Involuntary Disruptions

Pods do not disappear until someone (a person or a controller) destroys them, or
there is an unavoidable hardware or system software error.

We call these unavoidable cases *involuntary disruptions* to
an application. Examples are:

- a hardware failure of the physical machine backing the node
- a cluster administrator deletes the VM (instance) by mistake
- a cloud provider or hypervisor failure makes the VM disappear
- a kernel panic
- the node disappears from the cluster due to a cluster network partition
- eviction of a pod due to the node being [out-of-resources](/docs/tasks/administer-cluster/out-of-resource.md).

Except for the out-of-resources condition, all these conditions
should be familiar to most users; they are not specific
to Kubernetes.

We call other cases *voluntary disruptions*. These include both
actions initiated by the application owner and those initiated by a Cluster
Administrator. Typical application owner actions include:

- deleting the deployment or other controller that manages the pod
- updating a deployment's pod template causing a restart
- directly deleting a pod (e.g. by accident)

Cluster Administrator actions include:

- [Draining a node](/docs/tasks/administer-cluster/safely-drain-node.md) for repair or upgrade.
- Draining a node from a cluster to scale the cluster down (learn about
[Cluster Autoscaling](/docs/tasks/administer-cluster/cluster-management/#cluster-autoscaler)
).
- Removing a pod from a node to permit something else to fit on that node.

These actions might be taken directly by the cluster administrator, or by automation
run by the cluster administrator, or by your cluster hosting provider.

All sources of voluntary disruptions are optional with Kubernetes.
Ask your cluster administrator or consult your cloud provider or distribution documentation
to determine if any sources of voluntary disruptions are enabled for your cluster.
If none are enabled, you can skip creating Pod Disruption Budgets.

## Dealing with Disruptions

Involuntary disruptions are typically infrequent,
and so it is often sufficient to just accept them.
Here are some ways to mitigate involuntary disruptions:

- Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-cpu-ram-container) it needs.
- Replicate your application if you need higher availability. (Learn about running replicated
[stateless](/docs/tasks/run-application/run-stateless-application-deployment.md)
and [stateful](/docs/tasks/run-application/run-replicated-stateful-application.md) applications.)
- For even higher availability when running replicated applications,
spread applications across racks (using
[anti-affinity](/docs/user-guide/node-selection/#inter-pod-affinity-and-anti-affinity-beta-feature))
or across zones (if using a
[multi-zone cluster](/docs/admin/multiple-zones).)
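
As an illustration of the first two mitigations, here is a minimal sketch of a pod spec that requests resources and spreads replicas with pod anti-affinity. The names, labels, image, request sizes, and the zone topology key are illustrative assumptions, not values taken from this page:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                  # hypothetical pod name
  labels:
    app: web                 # hypothetical label, referenced by the anti-affinity rule below
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        cpu: 500m            # reserving CPU and memory up front helps the pod
        memory: 256Mi        # avoid eviction when a node runs out of resources
  affinity:
    podAntiAffinity:         # beta feature; see the anti-affinity link above
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web         # do not co-locate two pods carrying this label...
        topologyKey: failure-domain.beta.kubernetes.io/zone   # ...in the same zone
```

In practice a controller such as a Deployment, rather than a bare pod, would carry this template, so that replacements are created after involuntary disruptions.
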
@@ -78,7 +90,6 @@ of cluster (node) autoscaling may cause voluntary disruptions to defragment and
Your cluster administrator or hosting provider should have documented what level of voluntary
disruptions, if any, to expect.


Kubernetes offers features to help run highly available applications at the same
time as frequent voluntary disruptions. We call this set of features
*Disruption Budgets*.
@@ -87,19 +98,22 @@ time as frequent voluntary disruptions. We call this set of features
## How Disruption Budgets Work

An Application Owner can create a `PodDisruptionBudget` object (PDB) for each application.
A PDB limits the number of pods of a replicated application that are down simultaneously from
voluntary disruptions. For example, a quorum-based application would
like to ensure that the number of replicas running is never brought below the
number needed for a quorum. A web front end might want to
ensure that the number of replicas serving load never falls below a certain
percentage of the total.
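
For example, a budget for the web front end case might look like the following sketch, where the object name, the label, and the 90% threshold are illustrative assumptions:

```yaml
apiVersion: policy/v1beta1   # the PodDisruptionBudget API group/version as of Kubernetes 1.7
kind: PodDisruptionBudget
metadata:
  name: web-pdb              # hypothetical name
spec:
  minAvailable: "90%"        # keep at least 90% of the desired replicas serving load
  selector:
    matchLabels:
      app: web               # hypothetical label carried by the front-end pods
```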

Cluster managers and hosting providers should use tools which
respect Pod Disruption Budgets by calling the Eviction API instead
of directly deleting pods. An example is the `kubectl drain` command.
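
For reference, a tool requests an eviction by POSTing an `Eviction` object to the pod's `eviction` subresource instead of sending a `DELETE` to the pod itself. A minimal sketch, with an illustrative pod name and namespace (shown as YAML; the API equally accepts the equivalent JSON):

```yaml
apiVersion: policy/v1beta1
kind: Eviction
metadata:
  name: pod-a                # the pod to evict (illustrative)
  namespace: default         # the pod's namespace (illustrative)
```

The API server checks any matching Pod Disruption Budget before allowing the eviction to proceed.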

When a cluster administrator wants to drain a node,
they use the `kubectl drain` command. That tool tries to evict all
the pods on the machine. The eviction request may be temporarily rejected,
and the tool periodically retries all failed requests until all pods
are terminated, or until a configurable timeout is reached.

A PDB specifies the number of replicas that an application can tolerate having, relative to how
many it is intended to have. For example, a Deployment which has a `spec.replicas: 5` is
@@ -137,9 +151,9 @@ Initially, the pods are laid out as follows:
| pod-x *available* | | |

All 3 pods are part of a deployment, and they collectively have a PDB which requires
at least 2 of the 3 pods to be available at all times.
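
A budget with that effect might look like the following sketch; the object name and the `app: example` label are assumptions, and the three pods in this walkthrough would need to carry the matching label:

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb          # hypothetical name
spec:
  minAvailable: 2            # at least 2 of the 3 pods must remain available
  selector:
    matchLabels:
      app: example           # assumed label shared by pod-a, pod-b, and pod-c
```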

For example, assume the cluster administrator wants to reboot into a new kernel version to fix a bug in the kernel.
The cluster administrator first tries to drain `node-1` using the `kubectl drain` command.
That tool tries to evict `pod-a` and `pod-x`. This succeeds immediately.
Both pods go into the `terminating` state at the same time.
@@ -150,7 +164,7 @@ This puts the cluster in this state:
| pod-a *terminating* | pod-b *available* | pod-c *available* |
| pod-x *terminating* | | |

The deployment notices that one of the pods is terminating, so it creates a replacement
called `pod-d`. Since `node-1` is cordoned, it lands on another node. Something has
also created `pod-y` as a replacement for `pod-x`.

@@ -173,8 +187,10 @@ At some point, the pods terminate, and the cluster looks like this:
| | pod-d *starting* | pod-y |

At this point, if an impatient cluster administrator tries to drain `node-2` or
`node-3`, the drain command will block, because there are only 2 available
pods for the deployment, and its PDB requires at least 2. After some time
passes, `pod-d` becomes available.

The cluster state now looks like this:

@@ -202,8 +218,8 @@ state:
At this point, the cluster administrator needs to
add a node back to the cluster to proceed with the upgrade.

You can see how Kubernetes varies the rate at which disruptions
can happen, according to:

- how many replicas an application needs
- how long it takes to gracefully shutdown an instance
@@ -216,30 +232,28 @@ can happen, varying the rate as the operation progresses, according to:
Often, it is useful to think of the Cluster Manager
and Application Owner as separate roles with limited knowledge
of each other. This separation of responsibilities
may make sense in these scenarios:

- when there are many application teams sharing a Kubernetes cluster, and
there is natural specialization of roles
- when third-party tools or services are used to automate cluster management

Pod Disruption Budgets support this separation of roles by providing an
interface between the roles.

If you do not have such a separation of responsibilities in your organization,
you may not need to use Pod Disruption Budgets.

## How to perform Disruptive Actions on your Cluster

If you are a Cluster Administrator, and you need to perform a disruptive action on all
the nodes in your cluster, such as a node or system software upgrade, here are some options:

- Accept downtime during the upgrade.
- Fail over to another complete replica cluster.
- No downtime, but may be costly both for the duplicated nodes,
and for human effort to orchestrate the switchover.
- Write disruption tolerant applications and use PDBs.
- No downtime.
- Minimal resource duplication.
- Allows more automation of cluster administration.
34 changes: 17 additions & 17 deletions docs/tasks/administer-cluster/safely-drain-node.md
@@ -120,35 +120,35 @@ $ curl -v -H 'Content-type: application/json' http://127.0.0.1:8080/api/v1/names

The API can respond in one of three ways.

- If the eviction is granted, then the pod is deleted just as if you had sent
a `DELETE` request to the pod's URL and you get back `200 OK`.
- If the current state of affairs wouldn't allow an eviction by the rules set
forth in the budget, you get back `429 Too Many Requests`. This is
typically used for generic rate limiting of *any* requests, but here we mean
that this request isn't allowed *right now* but it may be allowed later.
Currently, callers do not get any `Retry-After` advice, but they may in
future versions.
- If there is some kind of misconfiguration, like multiple budgets pointing at
the same pod, you will get `500 Internal Server Error`.

For a given eviction request, there are two cases.

- There is no budget that matches this pod. In this case, the server always
returns `200 OK`.
- There is at least one budget. In this case, any of the three above responses may
apply.

In some cases, an application may reach a broken state where it will never return anything
other than 429 or 500. This can happen, for example, if the replacement pod created by the
application's controller does not become ready, or if the last pod evicted has a very long
termination grace period.

In this case, there are two potential solutions:

- Abort or pause the automated operation. Investigate the reason for the stuck application, and restart the automation.
- After a suitably long wait, `DELETE` the pod instead of using the eviction API.

Kubernetes does not specify what the behavior should be in this case; it is up to the
application owners and cluster owners to establish an agreement on behavior in these cases.

{% endcapture %}
28 changes: 15 additions & 13 deletions docs/tasks/run-application/configure-pdb.md
@@ -4,7 +4,7 @@ title: Specifying a Disruption Budget for your Application

{% capture overview %}

This page shows how to limit the number of concurrent disruptions
that your application experiences, allowing for higher availability
while permitting the cluster administrator to manage the cluster's
nodes.
@@ -43,10 +43,10 @@ PDBs can be used with the following types of application controllers:
- ReplicaSet
- StatefulSet

In the future, support may be extended to user-provided controllers ("operators").

Identify the pod selector your controller uses. The `PodDisruptionBudget` uses this to select
the pods protected by the PDB. These
selectors should match. If you specify selectors that overlap or do not
cover all your pods, then they may not be protected by a disruption budget at all.

@@ -61,8 +61,10 @@ due to a voluntary disruption.
- Single-instance Stateful Application
- Concern: do not terminate this application without talking to me.
- Possible Solution 1: Do not use a PDB and tolerate occasional downtime.
- Possible Solution 2: Set PDB with maxUnavailable=0. Have an understanding
(outside of Kubernetes) that the cluster operator needs to consult you before
termination. When the cluster operator contacts you, prepare for downtime,
and then delete the PDB to indicate readiness for disruption. Recreate afterwards.
- Multiple-instance Stateful application such as Consul, ZooKeeper, or etcd
- Concern: Do not reduce number of instances below quorum, otherwise writes fail.
- Possible Solution 1: set maxUnavailable to 1 (works with varying scale of application).
@@ -76,30 +78,30 @@ A `PodDisruptionBudget` has three fields:
A `PodDisruptionBudget` has three fields:

* A label selector `.spec.selector` to specify the set of
pods to which it applies. This field is required.
* `.spec.minAvailable` which is a description of the number of pods from that
set that must still be available after the eviction, even in the absence
of the evicted pod. `minAvailable` can be either an absolute number or a percentage.
* `.spec.maxUnavailable` (available in Kubernetes 1.7 and higher) which is a description
of the number of pods from that set that can be unavailable after the eviction.
It can be either an absolute number or a percentage.

You can specify only one of `maxUnavailable` and `minAvailable` in a single `PodDisruptionBudget`.
`maxUnavailable` can only be used to control the eviction of pods
that have an associated controller managing them. In the examples below, "desired replicas"
is the `scale` of the controller managing the pods being selected by the
`PodDisruptionBudget`.

Example 1: With a `minAvailable` of 5, evictions are allowed as long as they leave behind
5 or more healthy pods among those selected by the PodDisruptionBudget's `selector`.

Example 2: With a `minAvailable` of 30%, evictions are allowed as long as at least 30%
of the number of desired replicas are healthy.

Example 3: With a `maxUnavailable` of 5, evictions are allowed as long as there are at most 5
unhealthy replicas among the total number of desired replicas.

Example 4: With a `maxUnavailable` of 30%, evictions are allowed as long as no more than 30%
of the desired replicas are unhealthy.
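
As a sketch, Example 4 expressed as a manifest might look like this, with an illustrative name and label (recall from above that `maxUnavailable` requires Kubernetes 1.7 or higher):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-4-pdb        # hypothetical name
spec:
  maxUnavailable: "30%"      # at most 30% of the desired replicas may be unavailable
  selector:
    matchLabels:
      app: example           # assumed label on the pods managed by the controller
```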

In typical usage, a single budget would be used for a collection of pods managed by
