PodDisruptionBudget documentation Improvements #4140 (Merged)
Commits (15):

- `374f425` Changes from #3885 (erictune)
- `9f188fb` Added Placeholder Disruptions Concept Guide (erictune)
- `396fb3b` Added placeholder Configuring PDB Task (erictune)
- `b68f2c3` Add refs to the "drain a node" task. (erictune)
- `88bc1c3` Refactor PDB docs. (erictune)
- `fccbe23` Refactor of PDB docs (erictune)
- `71e8791` Explain how Eviction tools should handle failures (erictune)
- `0c077c8` Refactor PDB docs (erictune)
- `7665dcf` Expand PDB Concept guide (erictune)
- `e9bc8a2` Update creating a pdb Task. (erictune)
- `512c2c8` Address review comments. (erictune)
- `992a4a1` Fixed for all cody-clark's review comments (erictune)
- `ace3c26` Address review comments from mml (erictune)
- `2cf1662` Address review comments from maisem (erictune)
- `9b37a02` Fix missing backtick (erictune)
---
assignees:
- erictune
- foxish
- davidopp
title: Disruptions
redirect_from:
- "/docs/admin/disruptions/"
- "/docs/admin/disruptions.html"
- "/docs/tasks/configure-pod-container/configure-pod-disruption-budget/"
- "/docs/tasks/administer-cluster/configure-pod-disruption-budget/"
---

{% capture overview %}
This guide is for application owners who want to build
highly available applications, and thus need to understand
what types of disruptions can happen to Pods.

It is also for Cluster Administrators who want to perform automated
cluster actions, like upgrading and autoscaling clusters.
{% endcapture %}

{:toc}

{% capture body %}

## Voluntary and Involuntary Disruptions

Pods do not disappear until someone (a person or a controller) destroys them, or
there is an unavoidable hardware or system software error.

We call these unavoidable cases *involuntary disruptions* to
an application. Examples are:

- a hardware failure of the physical machine backing the node
- a cluster administrator deleting the VM (instance) by mistake
- a cloud provider or hypervisor failure making the VM disappear
- a kernel panic
- the node disappearing from the cluster due to a cluster network partition
- eviction of a pod due to the node being [out-of-resources](/docs/tasks/administer-cluster/out-of-resource.md)

Except for the out-of-resources condition, all these conditions
should be familiar to most users; they are not specific
to Kubernetes.

We call other cases *voluntary disruptions*. These include both
actions initiated by the application owner and those initiated by a Cluster
Administrator. Typical application owner actions include:

- deleting the deployment or other controller that manages the pod
- updating a deployment's pod template, causing a restart
- directly deleting a pod (e.g. by accident)

Cluster Administrator actions include:

- [Draining a node](/docs/tasks/administer-cluster/safely-drain-node.md) for repair or upgrade.
- Draining a node from a cluster to scale the cluster down (learn about
  [Cluster Autoscaling](/docs/tasks/administer-cluster/cluster-management/#cluster-autoscaler)).
- Removing a pod from a node to permit something else to fit on that node.

These actions might be taken directly by the cluster administrator, by automation
run by the cluster administrator, or by your cluster hosting provider.

Ask your cluster administrator, or consult your cloud provider or distribution documentation,
to determine whether any sources of voluntary disruptions are enabled for your cluster.
If none are enabled, you can skip creating Pod Disruption Budgets.

## Dealing with Disruptions

Here are some ways to mitigate involuntary disruptions:

- Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-cpu-ram-container) it needs.
- Replicate your application if you need higher availability. (Learn about running replicated
  [stateless](/docs/tasks/run-application/run-stateless-application-deployment.md)
  and [stateful](/docs/tasks/run-application/run-replicated-stateful-application.md) applications.)
- For even higher availability when running replicated applications,
  spread applications across racks (using
  [anti-affinity](/docs/user-guide/node-selection/#inter-pod-affinity-and-anti-affinity-beta-feature))
  or across zones (if using a
  [multi-zone cluster](/docs/admin/multiple-zones)).

The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are
no voluntary disruptions at all. However, your cluster administrator or hosting provider
may run some additional services which cause voluntary disruptions. For example,
rolling out node software updates can cause voluntary disruptions. Also, some implementations
of cluster (node) autoscaling may cause voluntary disruptions to defragment and compact nodes.
Your cluster administrator or hosting provider should have documented what level of voluntary
disruptions, if any, to expect.

Kubernetes offers features to help run highly available applications at the same
time as frequent voluntary disruptions. We call this set of features
*Disruption Budgets*.

## How Disruption Budgets Work

An Application Owner can create a `PodDisruptionBudget` object (PDB) for each application.
A PDB limits the number of pods of a replicated application that are down simultaneously from
voluntary disruptions. For example, a quorum-based application would
like to ensure that the number of replicas running is never brought below the
number needed for a quorum. A web front end might want to
ensure that the number of replicas serving load never falls below a certain
percentage of the total.

Cluster managers and hosting providers should use tools which
respect Pod Disruption Budgets by calling the [Eviction API](/docs/tasks/administer-cluster/safely-drain-node/#the-eviction-api)
instead of directly deleting pods. Examples are the `kubectl drain` command
and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).

When a cluster administrator wants to drain a node,
they use the `kubectl drain` command. That tool tries to evict all
the pods on the machine. The eviction request may be temporarily rejected,
and the tool periodically retries all failed requests until all pods
are terminated, or until a configurable timeout is reached.
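
For example, a drain with a bounded retry window might look like this (the node name and timeout value here are illustrative):

```shell
# Cordon node-1, then evict its pods via the Eviction API,
# retrying rejected evictions until they succeed or 5 minutes elapse.
kubectl drain node-1 --timeout=5m
```

If the timeout is reached before all evictions succeed, the command fails, and the administrator can retry it later.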

A PDB specifies the number of replicas that an application can tolerate having, relative to how
many it is intended to have. For example, a Deployment which has a `spec.replicas: 5` is
supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time,
then the Eviction API will allow voluntary disruption of one, but not two pods, at a time.
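
A minimal sketch of such a PDB, assuming the Deployment's pods carry a hypothetical `app: my-app` label:

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
spec:
  minAvailable: 4           # with spec.replicas: 5, allows one voluntary disruption at a time
  selector:
    matchLabels:
      app: my-app           # must match the labels on the application's pods
```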

The group of pods that comprise the application is specified using a label selector, the same
as the one used by the application's controller (deployment, stateful-set, etc.).

The "intended" number of pods is computed from the `.spec.replicas` of the pod's controller.
The controller is discovered from the pods using the `.metadata.ownerReferences` of the object.

PDBs cannot prevent [involuntary disruptions](#voluntary-and-involuntary-disruptions) from
occurring, but they do count against the budget.

Pods which are deleted or unavailable due to a rolling upgrade to an application do count
against the disruption budget, but controllers (like deployment and stateful-set)
are not limited by PDBs when doing rolling upgrades -- the handling of failures
during application updates is configured in the controller spec.
(Learn about [updating a deployment](/docs/concepts/cluster-administration/manage-deployment/#updating-your-application-without-a-service-outage).)

When a pod is evicted using the eviction API, it is gracefully terminated (see
`terminationGracePeriodSeconds` in [PodSpec](/docs/resources-reference/v1.6/#podspec-v1-core)).
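
The grace period is set in the pod spec; a sketch with a hypothetical pod (the default is 30 seconds when unset):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod                    # hypothetical name
spec:
  terminationGracePeriodSeconds: 60   # time allowed for graceful shutdown on eviction (default: 30)
  containers:
  - name: my-app
    image: my-app:1.0                 # hypothetical image
```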

## PDB Example

Consider a cluster with 3 nodes, `node-1` through `node-3`.
The cluster is running several applications. One of them has 3 replicas, initially called
`pod-a`, `pod-b`, and `pod-c`. Another, unrelated pod without a PDB, called `pod-x`, is also shown.
Initially, the pods are laid out as follows:

| node-1            | node-2            | node-3            |
|:-----------------:|:-----------------:|:-----------------:|
| pod-a *available* | pod-b *available* | pod-c *available* |
| pod-x *available* |                   |                   |

All 3 pods are part of a deployment, and they collectively have a PDB which requires
there be at least 2 of the 3 pods available at all times.
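
Such a budget could be sketched as follows (assuming the three pods share a hypothetical `app: example` label):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb           # hypothetical name
spec:
  minAvailable: 2             # at least 2 of the 3 replicas must stay available
  selector:
    matchLabels:
      app: example            # hypothetical label shared by pod-a, pod-b, and pod-c
```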

For example, assume the cluster administrator wants to reboot into a new kernel version to fix a bug in the kernel.
The cluster administrator first tries to drain `node-1` using the `kubectl drain` command.
That tool tries to evict `pod-a` and `pod-x`. This succeeds immediately.
Both pods go into the `terminating` state at the same time.
This puts the cluster in this state:

| node-1 *draining*   | node-2            | node-3            |
|:-------------------:|:-----------------:|:-----------------:|
| pod-a *terminating* | pod-b *available* | pod-c *available* |
| pod-x *terminating* |                   |                   |

The deployment notices that one of the pods is terminating, so it creates a replacement
called `pod-d`. Since `node-1` is cordoned, it lands on another node. Something has
also created `pod-y` as a replacement for `pod-x`.

(Note: for a StatefulSet, `pod-a`, which would be called something like `pod-1`, would need
to terminate completely before its replacement, which is also called `pod-1` but has a
different UID, could be created. Otherwise, the example applies to a StatefulSet as well.)

Now the cluster is in this state:

| node-1 *draining*   | node-2            | node-3            |
|:-------------------:|:-----------------:|:-----------------:|
| pod-a *terminating* | pod-b *available* | pod-c *available* |
| pod-x *terminating* | pod-d *starting*  | pod-y             |

At some point, the pods terminate, and the cluster looks like this:

| node-1 *drained*    | node-2            | node-3            |
|:-------------------:|:-----------------:|:-----------------:|
|                     | pod-b *available* | pod-c *available* |
|                     | pod-d *starting*  | pod-y             |

At this point, if an impatient cluster administrator tries to drain `node-2` or
`node-3`, the drain command will block, because there are only 2 available
pods for the deployment, and its PDB requires at least 2. After some time
passes, `pod-d` becomes available.

The cluster state now looks like this:

| node-1 *drained*    | node-2            | node-3            |
|:-------------------:|:-----------------:|:-----------------:|
|                     | pod-b *available* | pod-c *available* |
|                     | pod-d *available* | pod-y             |

Now, the cluster admin tries to drain `node-2`.
The drain command will try to evict the two pods in some order, say
`pod-b` first and then `pod-d`. It will succeed at evicting `pod-b`.
But, when it tries to evict `pod-d`, it will be refused because that would leave only
one pod available for the deployment.

The deployment creates a replacement for `pod-b` called `pod-e`.
However, there are not enough resources in the cluster to schedule
`pod-e`, so the drain will block. The cluster may end up in this
state:

| node-1 *drained*    | node-2            | node-3            | *no node*         |
|:-------------------:|:-----------------:|:-----------------:|:-----------------:|
|                     | pod-b *available* | pod-c *available* | pod-e *pending*   |
|                     | pod-d *available* | pod-y             |                   |

At this point, the cluster administrator needs to
add a node back to the cluster to proceed with the upgrade.

You can see how Kubernetes varies the rate at which disruptions
can happen, according to:

- how many replicas an application needs
- how long it takes to gracefully shut down an instance
- how long it takes a new instance to start up
- the type of controller
- the cluster's resource capacity

## Separating Cluster Owner and Application Owner Roles

Often, it is useful to think of the Cluster Manager
and Application Owner as separate roles with limited knowledge
of each other. This separation of responsibilities
may make sense in these scenarios:

- when there are many application teams sharing a Kubernetes cluster, and
  there is natural specialization of roles
- when third-party tools or services are used to automate cluster management

Pod Disruption Budgets support this separation of roles by providing an
interface between the roles.

If you do not have such a separation of responsibilities in your organization,
you may not need to use Pod Disruption Budgets.

## How to Perform Disruptive Actions on Your Cluster

If you are a Cluster Administrator, and you need to perform a disruptive action on all
the nodes in your cluster, such as a node or system software upgrade, here are some options:

- Accept downtime during the upgrade.
- Fail over to another complete replica cluster.
  - No downtime, but may be costly both for the duplicated nodes
    and for the human effort to orchestrate the switchover.
- Write disruption-tolerant applications and use PDBs.
  - No downtime.
  - Minimal resource duplication.
  - Allows more automation of cluster administration.
  - Writing disruption-tolerant applications is tricky, but the work to tolerate voluntary
    disruptions largely overlaps with work to support autoscaling and tolerating
    involuntary disruptions.

{% endcapture %}

{% capture whatsnext %}

* Follow steps to protect your application by [configuring a Pod Disruption Budget](/docs/tasks/run-application/configure-pdb.md).

* Learn more about [draining nodes](/docs/tasks/administer-cluster/safely-drain-node.md).

{% endcapture %}

{% include templates/concept.md %}
Review comment: "Should we mention that DaemonSet can't be used with PDB, or is it obvious?"

Reply: "This is mentioned elsewhere."