[WIP] Deploy CSI driver as a deployment and support multi master & leader e… #72

Closed

wants to merge 1 commit into from

Conversation

@chethanv28 (Collaborator) commented Oct 2, 2019

What this PR does / why we need it:
This change enables leader election for the syncer by using the implementation provided by csi-lib-utils (https://github.com/kubernetes-csi/external-provisioner/blob/master//vendor/github.com/kubernetes-csi/csi-lib-utils/leaderelection/leader_election.go).
It also includes RBAC and deployment YAML changes to enable leader election for vsphere-syncer, external-provisioner, and external-attacher.
With this change, HA is enabled for the CSI driver.
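
For context, here is a minimal sketch of how the syncer's main function wires into that library, based on the diff hunks discussed in the review below; the lock name, imports, and client construction shown here are assumptions rather than the exact PR code:

package main

import (
	"context"
	"os"

	"github.com/kubernetes-csi/csi-lib-utils/leaderelection"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/klog"
)

func main() {
	// run starts the metadata syncer; the leader-election library calls it
	// once this replica acquires the lock.
	run := func(ctx context.Context) {
		// ... start the syncer ...
	}

	// The PR uses the project's own k8s.NewClient() helper; a plain in-cluster
	// client-go client is shown here instead.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Errorf("Failed to get in-cluster config: %v", err)
		os.Exit(1)
	}
	k8sClient := kubernetes.NewForConfigOrDie(cfg)

	// "vsphere-syncer" is an assumed lock name; the diff refers to it as lockName.
	le := leaderelection.NewLeaderElection(k8sClient, "vsphere-syncer", run)
	if err := le.Run(); err != nil {
		klog.Fatalf("Error initializing leader election: %v", err)
	}
}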

Special notes for your reviewer:

Release note:

Enable leader election and HA for the vSphere CSI driver

Test Results (Sanity):

$ make test-e2e
hack/run-e2e-test.sh
go: finding github.com/onsi/ginkgo/ginkgo latest
Oct  2 12:32:14.586: INFO: The --provider flag is not set. Continuing as if --provider=skeleton had been used.
Running Suite: CNS CSI Driver End-to-End Tests
==============================================
Random Seed: 1570044718 - Will randomize all specs
Will run 2 of 42 specs

SSSSSSSSSSSSS
------------------------------
[csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy 
  Verify dynamic provisioning of PV fails with user specified non-shared datastore and no storage policy specified in the storage class
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:82
[BeforeEach] [csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
  /Users/chethanv/go/pkg/mod/k8s.io/kubernetes@v1.14.2/test/e2e/framework/framework.go:149
STEP: Creating a kubernetes client
Oct  2 12:32:14.587: INFO: >>> kubeConfig: /Users/chethanv/.kube/config
STEP: Building a namespace api object, basename e2e-vsphere-volume-provisioning-no-storage-policy
Oct  2 12:32:14.677: INFO: Found PodSecurityPolicies; assuming PodSecurityPolicy is enabled.
Oct  2 12:32:14.832: INFO: Found ClusterRoles; assuming RBAC is enabled.
STEP: Binding the e2e-test-privileged-psp PodSecurityPolicy to the default service account in e2e-vsphere-volume-provisioning-no-storage-policy-6359
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:56
[It] Verify dynamic provisioning of PV fails with user specified non-shared datastore and no storage policy specified in the storage class
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:82
STEP: Invoking Test for user specified non-shared Datastore in storage class for volume provisioning
STEP: Creating StorageClass With scParameters: map[DatastoreURL:ds:///vmfs/volumes/5d8d2c33-aa1809eb-c8cd-0200027eba1c/] and allowedTopologies: []
STEP: Creating StorageClass With scParameters: map[DatastoreURL:ds:///vmfs/volumes/5d8d2c33-aa1809eb-c8cd-0200027eba1c/] and allowedTopologies: [] and ReclaimPolicy: 
STEP: Creating PVC using the Storage Class sc-ss95g with disk size  and labels: map[]
STEP: Expect claim to fail provisioning volume on non shared datastore
Oct  2 12:32:15.073: INFO: Waiting up to 30s for PersistentVolumeClaims [pvc-b4vzt] to have phase Bound
Oct  2 12:32:15.100: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:17.109: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:19.122: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:21.146: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:23.157: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:25.166: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:27.181: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:29.199: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:31.217: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:33.235: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:35.255: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:37.266: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:39.281: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:41.297: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Oct  2 12:32:43.308: INFO: PersistentVolumeClaim pvc-b4vzt found but phase is Pending instead of Bound.
Actual failure message: "failed to provision volume with StorageClass \"sc-ss95g\": rpc error: code = Internal desc = Failed to create volume. Error: DatastoreURL: ds:///vmfs/volumes/5d8d2c33-aa1809eb-c8cd-0200027eba1c/ specified in the storage class is not found."
Expected failure message: "failed to provision volume with StorageClass \"sc-ss95g\""
Oct  2 12:32:45.327: INFO: Deleting PersistentVolumeClaim "pvc-b4vzt"
[AfterEach] [csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
  /Users/chethanv/go/pkg/mod/k8s.io/kubernetes@v1.14.2/test/e2e/framework/framework.go:150
Oct  2 12:32:45.376: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-vsphere-volume-provisioning-no-storage-policy-6359" for this suite.
Oct  2 12:32:51.423: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Oct  2 12:32:51.679: INFO: namespace e2e-vsphere-volume-provisioning-no-storage-policy-6359 deletion completed in 6.287813192s

• [SLOW TEST:37.092 seconds]
[csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
/Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:47
  Verify dynamic provisioning of PV fails with user specified non-shared datastore and no storage policy specified in the storage class
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:82
------------------------------
SSS
------------------------------
[csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy 
  Verify dynamic provisioning of PV passes with user specified shared datastore and no storage policy specified in the storage class
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:67
[BeforeEach] [csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
  /Users/chethanv/go/pkg/mod/k8s.io/kubernetes@v1.14.2/test/e2e/framework/framework.go:149
STEP: Creating a kubernetes client
Oct  2 12:32:51.680: INFO: >>> kubeConfig: /Users/chethanv/.kube/config
STEP: Building a namespace api object, basename e2e-vsphere-volume-provisioning-no-storage-policy
STEP: Binding the e2e-test-privileged-psp PodSecurityPolicy to the default service account in e2e-vsphere-volume-provisioning-no-storage-policy-6733
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:56
[It] Verify dynamic provisioning of PV passes with user specified shared datastore and no storage policy specified in the storage class
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:67
STEP: Invoking Test for user specified Shared Datastore in Storage class for volume provisioning
STEP: Creating StorageClass With scParameters: map[DatastoreURL:ds:///vmfs/volumes/vsan:520b365f4aaebc4b-87e27fcb38c60f51/] and allowedTopologies: []
STEP: Creating StorageClass With scParameters: map[DatastoreURL:ds:///vmfs/volumes/vsan:520b365f4aaebc4b-87e27fcb38c60f51/] and allowedTopologies: [] and ReclaimPolicy: 
STEP: Creating PVC using the Storage Class sc-5nxdp with disk size  and labels: map[]
STEP: Expect claim to pass provisioning volume as shared datastore
Oct  2 12:32:51.878: INFO: Waiting up to 1m0s for PersistentVolumeClaims [pvc-jvhdf] to have phase Bound
Oct  2 12:32:51.892: INFO: PersistentVolumeClaim pvc-jvhdf found but phase is Pending instead of Bound.
Oct  2 12:32:53.902: INFO: PersistentVolumeClaim pvc-jvhdf found but phase is Pending instead of Bound.
Oct  2 12:32:55.918: INFO: PersistentVolumeClaim pvc-jvhdf found but phase is Pending instead of Bound.
Oct  2 12:32:57.929: INFO: PersistentVolumeClaim pvc-jvhdf found and phase=Bound (6.050973955s)
Oct  2 12:32:57.929: INFO: Deleting PersistentVolumeClaim "pvc-jvhdf"
[AfterEach] [csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
  /Users/chethanv/go/pkg/mod/k8s.io/kubernetes@v1.14.2/test/e2e/framework/framework.go:150
Oct  2 12:32:57.969: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-vsphere-volume-provisioning-no-storage-policy-6733" for this suite.
Oct  2 12:33:04.015: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Oct  2 12:33:04.341: INFO: namespace e2e-vsphere-volume-provisioning-no-storage-policy-6733 deletion completed in 6.359017404s

• [SLOW TEST:12.661 seconds]
[csi-block-e2e-1] Datastore Based Volume Provisioning With No Storage Policy
/Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:47
  Verify dynamic provisioning of PV passes with user specified shared datastore and no storage policy specified in the storage class
  /Users/chethanv/Downloads/csi-external/vsphere-csi-driver/tests/e2e/vsphere_shared_datastore.go:67
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSS
Ran 2 of 42 Specs in 49.754 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 40 Skipped
PASS

Ginkgo ran 1 suite in 1m5.832595331s
Test Suite Passed

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 2, 2019
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chethanv28
To complete the pull request process, please assign frapposelli
You can assign the PR to them by writing /assign @frapposelli in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 2, 2019
@k8s-ci-robot (Contributor)

Hi @chethanv28. Thanks for your PR.

I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 2, 2019
@chethanv28 chethanv28 changed the title Deploy CSI driver as a deployment and support multi master & leader e… [WIP] Deploy CSI driver as a deployment and support multi master & leader e… Oct 2, 2019
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2019
@@ -29,9 +26,11 @@ spec:
- name: csi-attacher
image: quay.io/k8scsi/csi-attacher:v1.1.1
args:
- "--v=4"
- "--v=5"
Member

Log level 4 should be sufficient for the attacher and provisioner containers.

replicas: 1
updateStrategy:
Member

updateStrategy -> RollingUpdate will be useful later for upgrade. Can you keep that?

@chethanv28 chethanv28 changed the title [WIP] Deploy CSI driver as a deployment and support multi master & leader e… Deploy CSI driver as a deployment and support multi master & leader e… Oct 2, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2019
@chethanv28 chethanv28 changed the title Deploy CSI driver as a deployment and support multi master & leader e… [WIP] Deploy CSI driver as a deployment and support multi master & leader e… Oct 3, 2019
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2019
@RaunakShah (Contributor)

Shouldn't we add new files at manifests/1.15/ instead of replacing manifests/1.14?

}

if !*enableLeaderElection {
run(context.TODO())

In main() it is generally expected that you will use context.Background() since there is no higher level to provide one. context.TODO() is intended for places where a context should be passed in, but the plumbing hasn't been done yet.
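
A minimal sketch of the suggested change (the surrounding flag handling is assumed):

if !*enableLeaderElection {
	// main() is the root of the call tree, so context.Background() is the
	// natural root context; context.TODO() would signal unfinished plumbing.
	run(context.Background())
}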

} else {
k8sClient, err := k8s.NewClient()
if err != nil {
klog.Errorf("Creating Kubernetes client failed. Err: %v", err)

suggestion: you can use klog.Exitf() instead of Errorf followed by Exit
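
Applied to the hunk above, the suggestion would look roughly like:

k8sClient, err := k8s.NewClient()
if err != nil {
	// klog.Exitf logs at error severity and then exits the process,
	// collapsing the separate Errorf and os.Exit calls into one.
	klog.Exitf("Creating Kubernetes client failed. Err: %v", err)
}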

le := leaderelection.NewLeaderElection(k8sClient, lockName, run)

if err := le.Run(); err != nil {
klog.Fatalf("Error initializing leader election: %v", err)

suggestion: Fatalf() prints a stack trace, and Exitf() does not. Exitf() may be more appropriate here.

klog.Errorf("Error initializing Metadata Syncer")
os.Exit(1)
run := func(ctx context.Context) {
if err := metadataSyncer.Init(); err != nil {

Is metadataSyncer.Init() a blocking call that runs the syncer forever?

Comment on lines +40 to +45
run := func(ctx context.Context) {
if err := metadataSyncer.Init(); err != nil {
klog.Errorf("Error initializing Metadata Syncer")
os.Exit(1)
}
}

The context that gets passed into your run func will be canceled when this process loses leadership. It looks like you don't pass that into metadataSyncer.Init so we will have no way to tear things down if we lose leadership.

In most core k8s components, we react to loss of leadership with os.Exit() so that the container gets restarted cleanly.
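
A hedged sketch of that pattern, assuming metadataSyncer.Init() blocks for the life of the syncer; watching ctx.Done() in a goroutine is one way to react to losing the lock:

run := func(ctx context.Context) {
	go func() {
		// ctx is canceled when this replica loses leadership; exit so the
		// container restarts cleanly instead of running without the lock.
		<-ctx.Done()
		klog.Error("Lost leadership, exiting")
		os.Exit(1)
	}()
	if err := metadataSyncer.Init(); err != nil {
		klog.Errorf("Error initializing Metadata Syncer")
		os.Exit(1)
	}
}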

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 23, 2020
@k8s-ci-robot (Contributor)

@chethanv28: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 24, 2020
@divyenpatel (Member)

The required change to enable leader election and HA for the vSphere CSI driver is included in #151.

@divyenpatel divyenpatel closed this Apr 7, 2020
Labels
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
do-not-merge/work-in-progress - Indicates that a PR should not merge because it is a work in progress.
lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-ok-to-test - Indicates a PR that requires an org member to verify it is safe to test.
needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.
6 participants