Document topology aware volume binding feature
msau42 committed Sep 7, 2018
1 parent f8e4d35 commit 6975125
Showing 4 changed files with 100 additions and 19 deletions.
7 changes: 7 additions & 0 deletions content/en/docs/concepts/storage/dynamic-provisioning.md
@@ -124,6 +124,13 @@ Note that there can be at most one *default* storage class on a cluster, or
a `PersistentVolumeClaim` without `storageClassName` explicitly specified cannot
be created.

## Topology Awareness

In [multi-zone](/docs/setup/multiple-zones) clusters, pods can be spread across
zones, so single-zone storage backends should be provisioned in the zones where
the pods are scheduled. This can be accomplished by setting the [volume binding
mode](/docs/concepts/storage/storage-classes/#volume-binding-mode).
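
As a minimal sketch, a StorageClass that defers provisioning until a pod is
scheduled might look like the following. The class name
`topology-aware-standard` is a hypothetical placeholder, and the GCE PD
provisioner is used only for illustration.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  # Hypothetical name, chosen for this example only.
  name: topology-aware-standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
# Delay binding and provisioning until a pod that uses the claim is created,
# so the volume can be provisioned in the zone where the pod is scheduled.
volumeBindingMode: WaitForFirstConsumer
```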

{{% /capture %}}


94 changes: 83 additions & 11 deletions content/en/docs/concepts/storage/storage-classes.md
@@ -57,6 +57,7 @@ parameters:
reclaimPolicy: Retain
mountOptions:
- debug
volumeBindingMode: Immediate
```
### Provisioner
@@ -66,15 +67,15 @@ for provisioning PVs. This field must be specified.
| Volume Plugin | Internal Provisioner| Config Example |
| :--- | :---: | :---: |
| AWSElasticBlockStore | ✓ | [AWS](#aws) |
| AWSElasticBlockStore | ✓ | [AWS EBS](#aws-ebs) |
| AzureFile | ✓ | [Azure File](#azure-file) |
| AzureDisk | ✓ | [Azure Disk](#azure-disk) |
| CephFS | - | - |
| Cinder | ✓ | [OpenStack Cinder](#openstack-cinder)|
| FC | - | - |
| FlexVolume | - | - |
| Flocker | ✓ | - |
| GCEPersistentDisk | ✓ | [GCE](#gce) |
| GCEPersistentDisk | ✓ | [GCE PD](#gce-pd) |
| Glusterfs | ✓ | [Glusterfs](#glusterfs) |
| iSCSI | - | - |
| Quobyte | ✓ | [Quobyte](#quobyte) |
@@ -120,6 +121,73 @@ If the volume plugin does not support mount options but mount options are
specified, provisioning will fail. Mount options are not validated on either
the class or PV, so mount of the PV will simply fail if one is invalid.

### Volume Binding Mode

{{< feature-state for_k8s_version="v1.12" state="beta" >}}

**Note:** This feature requires the `VolumeScheduling` feature gate to be
enabled.

The `volumeBindingMode` field controls when [volume binding and dynamic
provisioning](/docs/concepts/storage/persistent-volumes/#provisioning) should occur.

The default mode, `Immediate`, indicates that volume binding and dynamic
provisioning occur as soon as the PersistentVolumeClaim is created. For storage
backends that are topology-constrained and not globally accessible from all nodes
in the cluster (for example, a zonal disk or a local volume), volumes are then
bound or provisioned without knowledge of the pod's scheduling requirements,
which can result in unschedulable pods.

To address this issue, specify the `WaitForFirstConsumer` mode, which delays
binding and provisioning until a pod using the PersistentVolumeClaim is created.
Volumes are then selected or provisioned with a topology that is compatible with
the pod's scheduling constraints, including but not limited to [resource
requirements](/docs/concepts/configuration/manage-compute-resources-container),
[node selectors](/docs/concepts/configuration/assign-pod-node/#nodeselector),
[pod affinity and
anti-affinity](/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity),
and [taints and tolerations](/docs/concepts/configuration/taint-and-toleration).
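
The following sketch illustrates how a pod's scheduling constraints can drive
volume placement. It assumes a hypothetical StorageClass named
`delayed-binding` that sets `volumeBindingMode: WaitForFirstConsumer`. The
claim, pod, and container names, the `nginx` image, and the zone value are also
placeholders.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
spec:
  # Assumed to exist with volumeBindingMode: WaitForFirstConsumer.
  storageClassName: delayed-binding
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  # The node selector restricts the pod to one zone; with delayed binding,
  # the volume is expected to be bound or provisioned in a compatible zone.
  nodeSelector:
    failure-domain.beta.kubernetes.io/zone: us-central1-a
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-claim
```

With the `Immediate` mode, the same claim would be bound as soon as it is
created, before the node selector on the pod is known.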

The following plugins support `WaitForFirstConsumer` with dynamic provisioning:
* [AWSElasticBlockStore](#aws-ebs)
* [GCEPersistentDisk](#gce-pd)
* [AzureDisk](#azure-disk)

The following plugins support `WaitForFirstConsumer` with pre-created PV binding:
* All of the above
* [Local](#local)
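
For pre-created local volumes, a class along these lines is one way to delay
binding. This is a sketch: the name `local-storage` is a placeholder, and
`kubernetes.io/no-provisioner` indicates that no dynamic provisioning is
performed for this class.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  # Placeholder name for this example.
  name: local-storage
# Local volumes are pre-created, so no dynamic provisioner is configured.
provisioner: kubernetes.io/no-provisioner
# Bind a pre-created PV only once a pod using the claim is created.
volumeBindingMode: WaitForFirstConsumer
```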

### Allowed Topologies
{{< feature-state for_k8s_version="v1.12" state="beta" >}}

**Note:** This feature requires the `VolumeScheduling` feature gate to be
enabled.

When `WaitForFirstConsumer` volume binding mode is specified, it is no longer necessary
to restrict provisioning to specific topologies in most situations. However,
if still required, `allowedTopologies` can be specified.

This example demonstrates how to restrict the topology of provisioned volumes to
specific zones. Use it as a replacement for the `zone` and `zones` parameters of
the supported plugins.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b
```

## Parameters

Storage classes have parameters that describe volumes belonging to the storage
@@ -128,7 +196,7 @@ class. Different parameters may be accepted depending on the `provisioner`. For
`iopsPerGB` are specific to EBS. When a parameter is omitted, some default is
used.

### AWS
### AWS EBS

```yaml
kind: StorageClass
@@ -138,17 +206,16 @@ metadata:
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  zones: us-east-1d, us-east-1c
  iopsPerGB: "10"
```

* `type`: `io1`, `gp2`, `sc1`, `st1`. See
[AWS docs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)
for details. Default: `gp2`.
* `zone`: AWS zone. If neither `zone` nor `zones` is specified, volumes are
* `zone` (Deprecated): AWS zone. If neither `zone` nor `zones` is specified, volumes are
generally round-robin-ed across all active zones where the Kubernetes cluster
has a node. `zone` and `zones` parameters must not be used at the same time.
* `zones`: A comma separated list of AWS zone(s). If neither `zone` nor `zones`
* `zones` (Deprecated): A comma-separated list of AWS zone(s). If neither `zone` nor `zones`
is specified, volumes are generally round-robin-ed across all active zones
where the Kubernetes cluster has a node. `zone` and `zones` parameters must not
be used at the same time.
@@ -164,7 +231,10 @@ parameters:
encrypting the volume. If none is supplied but `encrypted` is true, a key is
generated by AWS. See AWS docs for valid ARN value.

### GCE
**Note:** The `zone` and `zones` parameters are deprecated and replaced with
[allowedTopologies](#allowed-topologies).
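
A migration away from `zones` might be sketched as follows, assuming the same
zone label key as the GCE PD example in this document. The class name
`ebs-io1` is a hypothetical placeholder, and the zone values are the ones
previously listed under `zones`.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  # Hypothetical name, chosen for this example only.
  name: ebs-io1
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "10"
volumeBindingMode: WaitForFirstConsumer
# Replaces the deprecated `zones: us-east-1d, us-east-1c` parameter.
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-east-1c
    - us-east-1d
```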

### GCE PD

```yaml
kind: StorageClass
@@ -174,15 +244,14 @@ metadata:
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zones: us-central1-a, us-central1-b
  replication-type: none
```

* `type`: `pd-standard` or `pd-ssd`. Default: `pd-standard`
* `zone`: GCE zone. If neither `zone` nor `zones` is specified, volumes are
* `zone` (Deprecated): GCE zone. If neither `zone` nor `zones` is specified, volumes are
generally round-robin-ed across all active zones where the Kubernetes cluster has
a node. `zone` and `zones` parameters must not be used at the same time.
* `zones`: A comma separated list of GCE zone(s). If neither `zone` nor `zones`
* `zones` (Deprecated): A comma-separated list of GCE zone(s). If neither `zone` nor `zones`
is specified, volumes are generally round-robin-ed across all active zones
where the Kubernetes cluster has a node. `zone` and `zones` parameters must not
be used at the same time.
@@ -199,6 +268,9 @@ specified, Kubernetes will arbitrarily choose among the specified zones. If the
`zones` parameter is omitted, Kubernetes will arbitrarily choose among zones
managed by the cluster.

**Note:** The `zone` and `zones` parameters are deprecated and replaced with
[allowedTopologies](#allowed-topologies).

### Glusterfs

```yaml
@@ -678,4 +750,4 @@ Delaying volume binding allows the scheduler to consider all of a pod's
scheduling constraints when choosing an appropriate PersistentVolume for a
PersistentVolumeClaim.

{{% /capture %}}
{{% /capture %}}
@@ -56,7 +56,7 @@ different Kubernetes components.
| `DevicePlugins` | `true` | Beta | 1.10 | |
| `DynamicKubeletConfig` | `false` | Alpha | 1.4 | 1.10 |
| `DynamicKubeletConfig` | `true` | Beta | 1.11 | |
| `DynamicProvisioningScheduling` | `false` | Alpha | 1.11 | |
| `DynamicProvisioningScheduling` | `false` | Alpha | 1.11 | 1.11 |
| `DynamicVolumeProvisioning` | `true` | Alpha | 1.3 | 1.7 |
| `DynamicVolumeProvisioning` | `true` | GA | 1.8 | |
| `EnableEquivalenceClassCache` | `false` | Alpha | 1.8 | |
16 changes: 9 additions & 7 deletions content/en/docs/setup/multiple-zones.md
@@ -72,18 +72,20 @@ available and can tolerate the loss of a zone, the control plane is
located in a single zone. Users that want a highly available control
plane should follow the [high availability](/docs/admin/high-availability) instructions.

### Volume limitations
The following limitations are addressed with [topology-aware volume binding](/docs/concepts/storage/storage-classes/#volume-binding-mode).

* StatefulSet volume zone spreading when using dynamic provisioning is currently not compatible with
pod affinity or anti-affinity policies.
pod affinity or anti-affinity policies.

* If the name of the StatefulSet contains dashes ("-"), volume zone spreading
may not provide a uniform distribution of storage across zones.
may not provide a uniform distribution of storage across zones.

* When specifying multiple PVCs in a Deployment or Pod spec, the StorageClass
needs to be configured for a specific, single zone, or the PVs need to be
statically provisioned in a specific zone. Another workaround is to use a
StatefulSet, which will ensure that all the volumes for a replica are
provisioned in the same zone.

needs to be configured for a specific, single zone, or the PVs need to be
statically provisioned in a specific zone. Another workaround is to use a
StatefulSet, which will ensure that all the volumes for a replica are
provisioned in the same zone.

## Walkthrough
