
update inFlight cache to avoid race condition on volume operation #924

Merged

Conversation

AndyXiangLi (Contributor)

Is this a bug fix or adding new feature?
Fixes #918
What is this PR about? / Why do we need it?
Per the CSI spec, there should be no more than one call "in-flight" per volume at a given time.
This adds another layer of protection in the driver to make sure the APIs are idempotent.
What testing is done?
e2e test on k8s 1.19
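The in-flight guard this PR relies on can be sketched as a mutex-protected set. This is a minimal hypothetical version for illustration; the driver's actual inFlight type differs in detail:

```go
package main

import (
	"fmt"
	"sync"
)

// InFlight tracks keys for requests that are currently being processed,
// so a second call with the same key can be rejected (the CSI spec allows
// at most one in-flight call per volume at a given time).
type InFlight struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func NewInFlight() *InFlight {
	return &InFlight{keys: make(map[string]struct{})}
}

// Insert returns false if the key is already in flight.
func (i *InFlight) Insert(key string) bool {
	i.mu.Lock()
	defer i.mu.Unlock()
	if _, ok := i.keys[key]; ok {
		return false
	}
	i.keys[key] = struct{}{}
	return true
}

// Delete must be called (usually via defer) when the operation finishes,
// otherwise later calls for the same volume are rejected forever.
func (i *InFlight) Delete(key string) {
	i.mu.Lock()
	defer i.mu.Unlock()
	delete(i.keys, key)
}

func main() {
	f := NewInFlight()
	fmt.Println(f.Insert("vol-123")) // true: first call wins
	fmt.Println(f.Insert("vol-123")) // false: duplicate rejected
	f.Delete("vol-123")
	fmt.Println(f.Insert("vol-123")) // true again after completion
}
```

The handler pattern is then: Insert at the top of the RPC, return codes.Aborted if it fails, and defer Delete.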

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 8, 2021
@k8s-ci-robot k8s-ci-robot requested review from d-nishi and ddebroy June 8, 2021 00:33
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 8, 2021
@coveralls

coveralls commented Jun 8, 2021

Coverage Status

Coverage increased (+0.5%) to 79.496% when pulling 383299e on AndyXiangLi:create-volume-idempotent into 454fdb4 on kubernetes-sigs:master.

@@ -307,6 +314,14 @@ func (d *controllerService) ControllerPublishVolume(ctx context.Context, req *cs
return nil, status.Error(codes.InvalidArgument, errString)
}

// check if a request is already in-flight; VolumeId should be enough at this moment because multi-node writer mode is not supported in the driver.
// However, use the request hash as the key because eventually we need to support publishing the same volume onto different nodes concurrently, and I don't see any harm here.
if ok := d.inFlight.Insert(req.String()); !ok {
AndyXiangLi (Contributor Author)

As we only support single-writer mode now, I would say it is OK to just use volumeId as the key here. However, I don't see any harm in using the combination of volumeId and nodeId. If a customer tries to attach the same volume to a different node now, the request will be rejected in other steps anyway. Advice?

@AndyXiangLi AndyXiangLi Jun 8, 2021 (Contributor Author)

I removed this block in ControllerPublishVolume and ControllerUnpublishVolume, and the e2e test is failing because of it. I think we need to dig deeper into the publish/unpublish volume scenario. It's not related to this issue though; how about we implement this in a separate PR?

Contributor

which test? I agree, let's leave ControllerPublish/Attach for another PR, it's more complicated for sure

Contributor

the spec says: This operation MUST be idempotent. If the volume corresponding to the volume_id has already been published at the node corresponding to the node_id, and is compatible with the specified volume_capability and readonly flag, the Plugin MUST reply 0 OK.

So you are right, we should use volume_id and node_id.

If the attacher requests the same volume_id and different node_id (and volume is not multiattach, which we don't support yet anyway) the request will fail/get rejected anyway, I agree that's fine.

Contributor

it seems like detach just took too long? or somehow the lock/inflight didn't get released, although the code is simple so I don't know how that would be possible.

 I0608 00:51:24.912672       1 cloud.go:594] Waiting for volume "vol-059485ebcfd5f7282" state: actual=detaching, desired=detached
I0608 00:51:26.006673       1 cloud.go:594] Waiting for volume "vol-059485ebcfd5f7282" state: actual=detaching, desired=detached
I0608 00:51:27.918180       1 cloud.go:594] Waiting for volume "vol-059485ebcfd5f7282" state: actual=detaching, desired=detached
W0608 00:53:00.254490       1 cloud.go:535] Ignoring error from describe volume for volume "vol-0944244f08b5badd0"; will retry: "RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
E0608 00:53:28.688453       1 driver.go:119] GRPC error: rpc error: code = Aborted desc = ControllerUnpublishVolume for Volume vol-0944244f08b5badd0 and Node i-071ebf53811520106 is already in progress
E0608 00:54:18.254315       1 driver.go:119] GRPC error: rpc error: code = Aborted desc = ControllerUnpublishVolume for Volume vol-0944244f08b5badd0 and Node i-071ebf53811520106 is already in progress 

Contributor

I wonder if it's possible for attach/detach to deadlock, since in the Kubernetes world it's quite common to have one replica Terminating on one node with its volume trying to detach at the same time as one replica Pending on another node with its volume trying to attach. This would mean we would need to index by ControllerPublish+volume_id+node_id or ControllerUnpublish+volume_id+node_id.
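A composite key along those lines could look like the following. This is a hypothetical sketch of the suggestion, not the PR's final code:

```go
package main

import "fmt"

// lockKey builds an in-flight key from the RPC name plus the identifiers
// the CSI spec uses to define idempotency for publish/unpublish calls.
// Keying attach and detach separately avoids the deadlock described above,
// where a detach on one node and an attach on another could block each other.
func lockKey(op, volumeID, nodeID string) string {
	return op + "/" + volumeID + "/" + nodeID
}

func main() {
	attach := lockKey("ControllerPublishVolume", "vol-1", "i-a")
	detach := lockKey("ControllerUnpublishVolume", "vol-1", "i-b")
	fmt.Println(attach != detach) // true: the two operations don't collide
}
```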

@@ -121,6 +121,13 @@ func (d *controllerService) CreateVolume(ctx context.Context, req *csi.CreateVol
return nil, status.Error(codes.InvalidArgument, errString)
}

// check if a request is already in-flight; moved to the beginning of the request handler to protect the idempotency of this API
if ok := d.inFlight.Insert(req.String()); !ok {
@wongma7 wongma7 Jun 8, 2021 (Contributor)

what if we move this all the way to the beginning of the func, even before validating the request?

Also I noticed it's inconsistent how some functions index with volumeID but others with the request... I think they should all use the request?

AndyXiangLi (Contributor Author)

When I looked into the spec, it says there should be no more than one call "in-flight" per volume at a given time. I struggled to decide between using the volumeId or the whole request. If we use the request, we can guarantee no more than one request per operation for a volume, but some cases like DeleteVolume and CreateSnapshot on the same volumeId would both be allowed. Just wondering whether we should care about those cases; otherwise using the whole request makes sense to me.

Contributor

let's consult the spec and think about it rigorously for each operation, I am scared of races :D

CreateVolume: We should use name: https://github.com/container-storage-interface/spec/blob/master/spec.md#controller-service-rpc
DeleteVolume: We should use volume_id: https://github.com/container-storage-interface/spec/blob/master/spec.md#deletevolume
CreateSnapshot: We should use name: https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot
DeleteSnapshot: We should use snapshot_id: https://github.com/container-storage-interface/spec/blob/master/spec.md#deletesnapshot

obviously, for CreateVolume and CreateSnapshot you don't know the ID yet, so you can't use it as an idempotency token; that's why the spec requires name.
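The per-RPC mapping above could be sketched as a small helper. This is hypothetical illustration of the scheme being discussed, not the driver's actual code:

```go
package main

import "fmt"

// idempotencyKey returns the field the CSI spec designates as the
// idempotency token for each controller RPC, per the list above:
// create calls key by name (the ID does not exist yet), delete calls
// key by ID. The RPC name is included so that, e.g., DeleteVolume and
// CreateSnapshot on the same volume stay distinguishable if desired.
func idempotencyKey(rpc, nameOrID string) string {
	switch rpc {
	case "CreateVolume", "CreateSnapshot":
		return rpc + ":name=" + nameOrID
	case "DeleteVolume":
		return rpc + ":volume_id=" + nameOrID
	case "DeleteSnapshot":
		return rpc + ":snapshot_id=" + nameOrID
	}
	return rpc + ":" + nameOrID
}

func main() {
	fmt.Println(idempotencyKey("CreateVolume", "pvc-1234")) // CreateVolume:name=pvc-1234
	fmt.Println(idempotencyKey("DeleteVolume", "vol-abcd")) // DeleteVolume:volume_id=vol-abcd
}
```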

@AndyXiangLi AndyXiangLi force-pushed the create-volume-idempotent branch 3 times, most recently from 86e49de to d7bdf0a Compare June 8, 2021 19:07
@wongma7

wongma7 commented Jun 8, 2021

for ControllerUnpublish, we get stuck in this loop:

func (c *cloud) WaitForAttachmentState(ctx context.Context, volumeID, expectedState string, expectedInstance string, expectedDevice string, alreadyAssigned bool) (*ec2.VolumeAttachment, error) {

The context gets cancelled after just 15 seconds by external-attacher (https://github.com/kubernetes-csi/external-attacher/blob/e9f6477657cf1616e63405a9ad29aaf99ebfad70/pkg/controller/csi_handler.go#L574, https://github.com/kubernetes-csi/external-attacher/blob/e9f6477657cf1616e63405a9ad29aaf99ebfad70/cmd/csi-attacher/main.go#L59), but DescribeVolumesWithContext keeps getting retried with the cancelled context:

response, err := c.ec2.DescribeVolumesWithContext(ctx, request)

I think the fix is to check ctx.Err() on every loop iteration; if it has been cancelled we should exit the loop. External-attacher will then call ControllerUnpublish again and start another loop.

@wongma7

wongma7 commented Jun 8, 2021

The alternative is to increase the external-attacher context timeout to something much higher, like 45 minutes.

// Most attach/detach operations on AWS finish within 1-4 seconds.

In that case, our own retry logic will dictate the frequency we call DescribeVolumes instead of external-attacher.

@AndyXiangLi AndyXiangLi closed this Jun 8, 2021
@AndyXiangLi AndyXiangLi reopened this Jun 8, 2021
@AndyXiangLi AndyXiangLi force-pushed the create-volume-idempotent branch from d7bdf0a to 38663cb Compare June 8, 2021 21:36
@AndyXiangLi (Contributor Author)

/test pull-aws-ebs-csi-driver-external-test

@@ -516,6 +516,9 @@ func (c *cloud) WaitForAttachmentState(ctx context.Context, volumeID, expectedSt
},
}

if ctx.Err() != nil && ctx.Err() == context.Canceled {
Contributor

I don't think we need the && to check whether the err is Canceled; if there's any err, exit.

@AndyXiangLi AndyXiangLi force-pushed the create-volume-idempotent branch 2 times, most recently from 72b6c04 to 311f2f1 Compare June 8, 2021 23:43
@AndyXiangLi AndyXiangLi force-pushed the create-volume-idempotent branch from 311f2f1 to 383299e Compare June 9, 2021 00:45
@wongma7

wongma7 commented Jun 9, 2021

/lgtm
/approve
tyvm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2021
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndyXiangLi, wongma7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [AndyXiangLi,wongma7]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ialidzhikov (Contributor)

Thank you @AndyXiangLi and @wongma7! Is it possible to backport the fix to the affected versions that are still maintained/patched?

@ialidzhikov

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 11, 2021
@ialidzhikov

ping @AndyXiangLi @wongma7

@wongma7

wongma7 commented Jun 11, 2021

yes we can do a 1.1.1, if not today then next week. There is also an xfs bugfix that I think deserves release

@ialidzhikov

> yes we can do a 1.1.1, if not today then next week. There is also an xfs bugfix that I think deserves release

@wongma7 @AndyXiangLi, is there any update wrt v1.1.1? We are still looking forward to it :)
