WaitForFirstConsumer PD stuck in pending #241

Closed
gregwebs opened this issue Dec 18, 2018 · 11 comments · Fixed by #248
Labels
type/bug Something isn't working

Comments

@gregwebs
Contributor

In gke-storage.yaml I add:

volumeBindingMode: WaitForFirstConsumer
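
For context, a sketch of what the full StorageClass might look like with this line added. Apart from the added volumeBindingMode and the class name pd-ssd (visible in the PVC output below), the field values here are assumptions, not the actual contents of gke-storage.yaml:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: pd-ssd                  # matches the STORAGECLASS shown in the PVC output below
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer   # the added line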

I create a new Kubernetes cluster and deploy TiDB.

$ kubectl get pod -n tidb1

NAME                              READY   STATUS      RESTARTS   AGE
demo-monitor-5988ddc86c-mf2nz     2/2     Running     0          8m
demo-monitor-configurator-xgrd2   0/1     Completed   0          8m
demo-pd-0                         0/1     Pending     0          8m
demo-tidb-initializer-qfbgq       1/1     Running     0          8m
$ kubectl get pvc -n tidb1
NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-demo-pd-0   Pending                                      pd-ssd         8m

There are no PVs. The pod shows a warning:

$ kubectl describe -n tidb1 pod demo-pd-0
...
Events:
  Type     Reason            Age                   From            Message
  ----     ------            ----                  ----            -------
  Warning  FailedScheduling  2m26s (x37 over 12m)  tidb-scheduler  0/3 nodes are available: 3 node(s) didn't find available persistent volumes to bind.
@gregwebs
Contributor Author

I think this issue was pointed out here but never addressed.

@tennix
Member

tennix commented Dec 19, 2018

We've removed the WaitForFirstConsumer volume binding mode for GKE pd-ssd in #130, and it works for the tutorial.

@gregwebs
Contributor Author

The tutorial is single-AZ. This issue is about a multi-AZ deployment, which doesn't seem to work.

@tennix
Member

tennix commented Dec 20, 2018

For GCE persistent disks, the volume binding mode should not be set to WaitForFirstConsumer; otherwise the PVC stays in Pending. This applies to both single-AZ and multi-AZ. I've tested that after removing the WaitForFirstConsumer volume binding mode, the PV is created and the pod is scheduled correctly. I also noted that for a multi-AZ deployment, the StatefulSet schedules the pods across all the AZs.
So I think this is not a problem for multi-AZ deployment. You should remove the WaitForFirstConsumer binding mode from the storage class.

@gregwebs
Contributor Author

gregwebs commented Dec 20, 2018

Without WaitForFirstConsumer:

$ kubectl describe pod -n tidb1 demo-pd-1 | grep Warning
  Warning  FailedScheduling    2m14s (x3 over 2m17s)  tidb-scheduler                                pod has unbound PersistentVolumeClaims (repeated 2 times)
  Warning  FailedAttachVolume  55s (x8 over 2m9s)     attachdetach-controller                       AttachVolume.Attach failed for volume "pvc-730383e6-0480-11e9-a0f7-42010a8a0098" : GCE persistent disk not found: diskName="gke-beta-3f873039-dyna-pvc-730383e6-0480-11e9-a0f7-42010a8a0098" zone="us-west1-a"
  Warning  FailedMount         7s                     kubelet, gke-beta-default-pool-1363b1c3-vh39  Unable to mount volumes for pod "demo-pd-1_tidb1(730532bc-0480-11e9-a0f7-42010a8a0098)": timeout expired waiting for volumes to attach or mount for pod "tidb1"/"demo-pd-1". list of unmounted volumes=[pd]. list of unattached volumes=[pd annotations config startup-script default-token-pktgf]

$ gcloud compute disks list | grep gke-beta-3f873039-dyna-pvc-730383e6-0480-11e9-a0f7-42010a8a0098
gke-beta-3f873039-dyna-pvc-730383e6-0480-11e9-a0f7-42010a8a0098  us-west1-c  2        pd-ssd       READY

So here we see the disk is in us-west1-c, but the attach is attempted in us-west1-a, where the pod was scheduled.

@tennix
Member

tennix commented Dec 21, 2018

Oh, this seems to be the same issue as #180. It is fixed in Kubernetes 1.12.

@tennix
Member

tennix commented Dec 21, 2018

The latest GKE cluster version is Kubernetes 1.11.5. To confirm it's fixed in 1.12, we should bring up a 1.12 cluster with the kube-up script and test the multi-AZ deployment.

@weekface
Contributor

https://kubernetes.io/docs/setup/multiple-zones/#volume-limitations

The following limitations are addressed with topology-aware volume binding.

  • StatefulSet volume zone spreading when using dynamic provisioning is currently not compatible with pod affinity or anti-affinity policies.
  • If the name of the StatefulSet contains dashes (“-”), volume zone spreading may not provide a uniform distribution of storage across zones.
  • When specifying multiple PVCs in a Deployment or Pod spec, the StorageClass needs to be configured for a specific single zone, or the PVs need to be statically provisioned in a specific zone. Another workaround is to use a StatefulSet, which will ensure that all the volumes for a replica are provisioned in the same zone.

There are many other limitations with StatefulSet and PV.

@weekface
Contributor

My mistake, these limitations are addressed.

The following limitations are addressed with topology-aware volume binding.

@gregwebs gregwebs added the type/bug Something isn't working label Dec 21, 2018
@tennix
Member

tennix commented Dec 22, 2018

@gregwebs I've confirmed that this is a bug in our scheduler extender. On GKE the latest Kubernetes version right now is 1.11.5, and the WaitForFirstConsumer volume binding mode cannot work correctly, as the documentation says.
However, with the Immediate volume binding mode, normal pods can be scheduled correctly. tidb-operator uses a scheduler extender for the pd/tikv/tidb pods, and with it both the Immediate and WaitForFirstConsumer binding modes fail to schedule the pods. After changing the scheduler to default-scheduler, the pods are scheduled correctly.
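
For reference, which scheduler handles a pod is chosen via spec.schedulerName in the pod template, so the diagnostic above amounts to switching that field back to default-scheduler. A minimal sketch; the names, labels, and image below are illustrative, not tidb-operator's actual manifests:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-pd                 # illustrative name
spec:
  serviceName: demo-pd
  replicas: 3
  selector:
    matchLabels:
      app: pd
  template:
    metadata:
      labels:
        app: pd
    spec:
      schedulerName: default-scheduler   # switched back from the tidb-scheduler extender to isolate it
      containers:
      - name: pd
        image: pingcap/pd               # illustrative image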

@weekface
Contributor

Our extended scheduler's kube-scheduler policy config lacks a predicate: NoVolumeZoneConflict.

It works when I add it to the list.

The kube-scheduler default config is:

Creating scheduler with fit predicates 'map[NoVolumeZoneConflict:{} MaxEBSVolumeCount:{} MaxAzureDiskVolumeCount:{} NoDiskConflict:{} GeneralPredicates:{} PodToleratesNodeTaints:{} CheckVolumeBinding:{} MaxGCEPDVolumeCount:{} MatchInterPodAffinity:{} CheckNodeMemoryPressure:{} CheckNodeDiskPressure:{} CheckNodePIDPressure:{} CheckNodeCondition:{}]' and priority functions 'map[SelectorSpreadPriority:{} InterPodAffinityPriority:{} LeastRequestedPriority:{} BalancedResourceAllocation:{} NodePreferAvoidPodsPriority:{} NodeAffinityPriority:{} TaintTolerationPriority:{}]'

We should change our policy config to this.
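
A sketch of what the updated kube-scheduler Policy file could look like, with NoVolumeZoneConflict included and the predicate/priority names mirroring the defaults logged above. The priority weights are assumptions (the usual defaults as far as I recall), and tidb-operator's real policy config also wires in the tidb-scheduler extender, which is omitted here:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "NoVolumeZoneConflict"},
    {"name": "MaxEBSVolumeCount"},
    {"name": "MaxAzureDiskVolumeCount"},
    {"name": "MaxGCEPDVolumeCount"},
    {"name": "NoDiskConflict"},
    {"name": "GeneralPredicates"},
    {"name": "PodToleratesNodeTaints"},
    {"name": "CheckVolumeBinding"},
    {"name": "MatchInterPodAffinity"},
    {"name": "CheckNodeMemoryPressure"},
    {"name": "CheckNodeDiskPressure"},
    {"name": "CheckNodePIDPressure"},
    {"name": "CheckNodeCondition"}
  ],
  "priorities": [
    {"name": "SelectorSpreadPriority", "weight": 1},
    {"name": "InterPodAffinityPriority", "weight": 1},
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1},
    {"name": "NodePreferAvoidPodsPriority", "weight": 10000},
    {"name": "NodeAffinityPriority", "weight": 1},
    {"name": "TaintTolerationPriority", "weight": 1}
  ]
}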
