Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request coalescing for resizing and modifying volume #1676

Merged
merged 1 commit into from
Aug 11, 2023

Conversation

hanyuel
Copy link
Contributor

@hanyuel hanyuel commented Jul 10, 2023

Is this a bug fix or adding new feature?
New feature

What is this PR about? / Why do we need it?
One restriction of the volume modification support provided by EBS CSI Driver is that if a user updates both annotations and capacity of a PVC at the same time, it will result in 2 RPC calls from external-resizer and volume-modifier sidecars to the CSI Driver, and only one of the them will succeed due to the 6-hour cool down period for EBS ModifyVolume.

This PR solves that problem by implementing the Request Coalescing feature in EBS CSI Driver that merges RPC requests from externa-resizer and volume-modifier sidecars.

What testing is done?
New unit tests && make test and make verify pass.
Built and Deployed a CSI Driver image with the changes in this PR.

  1. Created a PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi
  1. Verified the PVC via kubectl describe pvc ebs-claim
Name:          ebs-claim
Namespace:     default
StorageClass:  ebs-sc
Status:        Bound
Volume:        pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
               volume.kubernetes.io/selected-node: ip-192-168-38-32.ec2.internal
               volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      4Gi
  1. Updated the PVC with new size, volume type, throughput, iops via kubectl edit.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
        ebs.csi.aws.com/volumeType: "io2"
        ebs.csi.aws.com/iops: "1000"
        ebs.csi.aws.com/throughput: "1000"

spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  1. Verified the PVC and PV was updated with the new size, volume type, iops and throughput.

Name:          ebs-claim
Namespace:     default
StorageClass:  ebs-sc
Status:        Bound
Volume:       pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
Labels:        <none>
Annotations:   ebs.csi.aws.com/iops: 1000
               ebs.csi.aws.com/throughput: 1000
               ebs.csi.aws.com/volumeType: io2
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
               volume.kubernetes.io/selected-node: ip-192-168-38-32.ec2.internal
               volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       app
Events:
  Type     Reason                        Age                From                                                                                      Message
  ----     ------                        ----               ----                                                                                      -------
  Normal   VolumeModificationStarted     60s                  volume-modifier-for-k8s-ebs.csi.aws.com                                                   External modifier is modifying volume pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
  Warning  ExternalExpanding             60s                  volume_expand                                                                             Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
  Normal   Resizing                      60s                  external-resizer ebs.csi.aws.com                                                          External resizer is resizing volume pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
  Normal   VolumeModificationSuccessful  55s                  volume-modifier-for-k8s-ebs.csi.aws.com                                                   External modifier has successfully modified volume pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
  Normal   FileSystemResizeRequired      55s                  external-resizer ebs.csi.aws.com                                                          Require file system resize of volume on node
  Normal   FileSystemResizeSuccessful    26s                  kubelet                                                                                   MountVolume.NodeExpandVolume succeeded for volume "pvc-fe46b239-1e50-4e7e-9d99-b041d964c048" ip-192-168-38-32.ec2.internal

kubectl describe pv   
Name:              pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
Labels:            <none>
Annotations:       ebs.csi.aws.com/iops: 1000
                   ebs.csi.aws.com/throughput: 1000
                   ebs.csi.aws.com/volumeType: io2
                   pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
                   volume.kubernetes.io/provisioner-deletion-secret-name: 
                   volume.kubernetes.io/provisioner-deletion-secret-namespace: 
Finalizers:        [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass:      resize-sc
Status:            Bound
Claim:             default/ebs-claim
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          5Gi
  1. Inspected logs.
    csi controller logs:
I0810 17:13:22.132209       1 controller_modify_volume.go:172] "ModifyVolumeProperties called" req="name:\"vol-0b280a33be37a7458\" parameters:{key:\"iops\" value:\"1000\"} parameters:{key:\"throughput\" value:\"1000\"} parameters:{key:\"volumeType\" value:\"io2\"}"
I0810 17:13:22.132413       1 controller_modify_volume.go:107] "Start processing ModifyVolumeRequest for " volume ID="vol-0b280a33be37a7458"
I0810 17:13:22.143910       1 controller.go:442] "ControllerGetCapabilities: called" args={}
I0810 17:13:22.144537       1 controller.go:496] "ControllerExpandVolume: called" args={"volume_id":"vol-0b280a33be37a7458","capacity_range":{"required_bytes":5368709120},"volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}}
I0810 17:13:24.144910       1 cloud.go:476] "Received Resize and/or Modify Disk request" volumeID="vol-0b280a33be37a7458" newSizeBytes=5368709120 options={"VolumeType":"io2","IOPS":1000,"Throughput":1000}

resizer sidecar logs:

I0810 17:13:22.143733       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ebs-claim", UID:"fe46b239-1e50-4e7e-9d99-b041d964c048", APIVersion:"v1", ResourceVersion:"14114656", FieldPath:""}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
I0810 17:13:27.449721       1 controller.go:483] Resize volume succeeded for volume "pvc-fe46b239-1e50-4e7e-9d99-b041d964c048", start to update PV's capacity

volume-modifier sidecar logs:

I0810 17:13:22.131085       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ebs-claim", UID:"fe46b239-1e50-4e7e-9d99-b041d964c048", APIVersion:"v1", ResourceVersion:"14114654", FieldPath:""}): type: 'Normal' reason: 'VolumeModificationStarted' External modifier is modifying volume pvc-fe46b239-1e50-4e7e-9d99-b041d964c048
I0810 17:13:22.131614       1 csi_modifier.go:62] "Calling modify volume for volume" volumeID="vol-0b280a33be37a7458"
I0810 17:13:27.449787       1 csi_client.go:67] "Volume modification completed" volumeID="vol-0b280a33be37a7458"

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 10, 2023
@k8s-ci-robot k8s-ci-robot requested review from gtxu and torredil July 10, 2023 15:05
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 10, 2023
@hanyuel hanyuel changed the title Request coalesing for resizing and modifying volume Request coalescing for resizing and modifying volume Jul 10, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 10, 2023
@hanyuel hanyuel force-pushed the req-coalescing branch 5 times, most recently from 784ee35 to 427b0ce Compare July 13, 2023 20:13
@hanyuel hanyuel marked this pull request as ready for review July 13, 2023 20:14
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 13, 2023
@k8s-ci-robot k8s-ci-robot requested review from ConnorJC3 and rdpsin July 13, 2023 20:14
Copy link
Contributor

@wmesard wmesard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

A couple of comment nits, and one potential locking issue.

pkg/cloud/cloud.go Outdated Show resolved Hide resolved
}

func (h *modifyVolumeRequestHandler) mergeModifyVolumeRequest(r *modifyVolumeRequest) {
h.mux.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateModifyVolumeRequest and mergeModifyVolumeRequest need to happen in the same lock-context, otherwise you create a window where an incompatible request can race in after the validate, but before the merge, and we won't notice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. See the latest addModifyVolumeRequest and startVolumeTimer methods.

pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
@k8s-ci-robot
Copy link
Contributor

@wmesard: changing LGTM is restricted to collaborators

In response to this:

Looks good!

A couple of comment nits, and one potential locking issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@torredil torredil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This revision largely lgtm, left a few comments.

pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Show resolved Hide resolved
pkg/cloud/cloud.go Show resolved Hide resolved
Copy link
Contributor

@ConnorJC3 ConnorJC3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top level comments (some of these I'm sure are obvious, but for the sake of the review):

  • Needs the issues with make test and make verify fixed
  • Needs unit tests written for the feature
  • Otherwise, largely good, but one major deadlock possible (see individual comments)

pkg/cloud/cloud.go Outdated Show resolved Hide resolved
pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Show resolved Hide resolved
pkg/driver/controller_modify_volume.go Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 2, 2023
@hanyuel hanyuel force-pushed the req-coalescing branch 5 times, most recently from 9a9ffe0 to d5ab6f1 Compare August 3, 2023 12:39
pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Outdated Show resolved Hide resolved
pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
pkg/driver/controller_modify_volume.go Outdated Show resolved Hide resolved
@hanyuel hanyuel force-pushed the req-coalescing branch 3 times, most recently from 419689d to 6dd1e9a Compare August 4, 2023 15:58
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2023
@hanyuel hanyuel force-pushed the req-coalescing branch 2 times, most recently from 6bd42fb to 167cbc4 Compare August 9, 2023 21:21
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 9, 2023
@ConnorJC3
Copy link
Contributor

/retest

Copy link
Contributor

@ConnorJC3 ConnorJC3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2023
Signed-off-by: Hanyue Liang <hanliang@amazon.com>
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Aug 10, 2023

@hanyuel: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-aws-ebs-csi-driver-test-e2e-external-eks-windows 3705632 link false /test pull-aws-ebs-csi-driver-test-e2e-external-eks-windows

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@torredil
Copy link
Member

/test pull-aws-ebs-csi-driver-external-test

@ConnorJC3
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2023
Copy link
Member

@torredil torredil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hanyuel /lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: torredil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 11, 2023
@k8s-ci-robot k8s-ci-robot merged commit 6365ffd into kubernetes-sigs:master Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants