update inFlight cache to avoid race condition on volume operation #924
Conversation
pkg/driver/controller.go
Outdated
@@ -307,6 +314,14 @@ func (d *controllerService) ControllerPublishVolume(ctx context.Context, req *cs
		return nil, status.Error(codes.InvalidArgument, errString)
	}

	// Check if a request is already in-flight. VolumeId should be enough at this moment because multi-node writer mode is not supported in the driver.
	// However, use the request hash as the key because eventually we need to support publishing the same volume onto different nodes concurrently, and I don't see any harm here.
	if ok := d.inFlight.Insert(req.String()); !ok {
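For context, here is a minimal sketch of the kind of in-flight set this diff relies on; the package, type, and method names are illustrative assumptions, not the driver's actual internal code:

```go
package inflight

import "sync"

// InFlight is a mutex-protected set of keys for operations currently being
// processed. A duplicate Insert is rejected so the handler can return
// codes.Aborted instead of running the same operation twice.
type InFlight struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func NewInFlight() *InFlight {
	return &InFlight{keys: make(map[string]struct{})}
}

// Insert returns false if the key is already present, i.e. an identical
// request is already in flight.
func (i *InFlight) Insert(key string) bool {
	i.mu.Lock()
	defer i.mu.Unlock()
	if _, ok := i.keys[key]; ok {
		return false
	}
	i.keys[key] = struct{}{}
	return true
}

// Delete releases the key once the operation completes.
func (i *InFlight) Delete(key string) {
	i.mu.Lock()
	defer i.mu.Unlock()
	delete(i.keys, key)
}
```

In each handler, a successful Insert is paired with a deferred Delete so the key is always released, even when the operation returns an error.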
As we only support single-writer mode now, I would say it is OK to just use the volumeId as the key here. However, I don't see any harm in using the combination of volumeId and nodeId: if a customer tries to attach the same volume to a different node now, the request will be rejected at another step anyway. Advice?
I removed this block in ControllerPublishVolume and ControllerUnpublishVolume, and the e2e test is failing because of it. I think we need to dig deeper into the publish/unpublish volume scenario. It's not related to this issue though; how about we implement this in a separate PR?
which test? I agree, let's leave ControllerPublish/Attach for another PR, it's more complicated for sure
the spec says: This operation MUST be idempotent. If the volume corresponding to the volume_id has already been published at the node corresponding to the node_id, and is compatible with the specified volume_capability and readonly flag, the Plugin MUST reply 0 OK.
So you are right, we should use volume_id and node_id.
If the attacher requests the same volume_id and different node_id (and volume is not multiattach, which we don't support yet anyway) the request will fail/get rejected anyway, I agree that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like detach just took too long? or somehow the lock/inflight didn't get released, although the code is simple so I don't know how that would be possible.
I0608 00:51:24.912672 1 cloud.go:594] Waiting for volume "vol-059485ebcfd5f7282" state: actual=detaching, desired=detached
I0608 00:51:26.006673 1 cloud.go:594] Waiting for volume "vol-059485ebcfd5f7282" state: actual=detaching, desired=detached
I0608 00:51:27.918180 1 cloud.go:594] Waiting for volume "vol-059485ebcfd5f7282" state: actual=detaching, desired=detached
W0608 00:53:00.254490 1 cloud.go:535] Ignoring error from describe volume for volume "vol-0944244f08b5badd0"; will retry: "RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
E0608 00:53:28.688453 1 driver.go:119] GRPC error: rpc error: code = Aborted desc = ControllerUnpublishVolume for Volume vol-0944244f08b5badd0 and Node i-071ebf53811520106 is already in progress
E0608 00:54:18.254315 1 driver.go:119] GRPC error: rpc error: code = Aborted desc = ControllerUnpublishVolume for Volume vol-0944244f08b5badd0 and Node i-071ebf53811520106 is already in progress
I wonder if it's possible for attach/detach to deadlock, since in the Kubernetes world it's quite common to have one replica Terminating on one node with its volume trying to detach at the same time as another replica is Pending on another node with its volume trying to attach. This would mean we would need to index by ControllerPublish+volume_id+node_id or ControllerUnpublish+volume_id+node_id (a hypothetical sketch of such a key follows).
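If that indexing were needed, the key could include the RPC name so that publish and unpublish for the same (volume, node) pair cannot collide. A hedged sketch; `lockKey` is a hypothetical helper, not part of the driver:

```go
// lockKey composes an in-flight key from the operation name plus volume and
// node IDs, so ControllerPublishVolume and ControllerUnpublishVolume for the
// same pair are tracked as distinct operations.
func lockKey(op, volumeID, nodeID string) string {
	return op + "/" + volumeID + "/" + nodeID
}

// Usage sketch inside ControllerUnpublishVolume:
//
//	key := lockKey("ControllerUnpublishVolume", req.GetVolumeId(), req.GetNodeId())
//	if ok := d.inFlight.Insert(key); !ok {
//		return nil, status.Errorf(codes.Aborted, "operation already in progress for %s", key)
//	}
//	defer d.inFlight.Delete(key)
```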
pkg/driver/controller.go
Outdated
@@ -121,6 +121,13 @@ func (d *controllerService) CreateVolume(ctx context.Context, req *csi.CreateVol
		return nil, status.Error(codes.InvalidArgument, errString)
	}

	// Check if a request is already in-flight; this block was moved to the beginning of the request to protect the idempotency of this API.
	if ok := d.inFlight.Insert(req.String()); !ok {
what if we move this all the way to the beginning of the func, even before validating the request?
Also, I noticed it's inconsistent how some functions index with the volumeID but others with the request... I think they should all use the request?
When I look into the spec, it says there should be no more than one call "in-flight" per volume at a given time. I struggled to decide between using the volumeId and the whole request. If we use the request, we can guarantee no more than one request per operation for a volume, but cases like DeleteVolume and CreateSnapshot on the same volumeId will both be allowed. I'm just wondering whether we should care about those cases; otherwise, using the whole request makes sense to me.
let's consult the spec and think about it rigorously for each operation, I am scared of races :D

- CreateVolume: we should use `name`: https://github.com/container-storage-interface/spec/blob/master/spec.md#controller-service-rpc
- DeleteVolume: we should use `volume_id`: https://github.com/container-storage-interface/spec/blob/master/spec.md#deletevolume
- CreateSnapshot: we should use `name`: https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot
- DeleteSnapshot: we should use `snapshot_id`: https://github.com/container-storage-interface/spec/blob/master/spec.md#deletesnapshot

Obviously, for CreateVolume and CreateSnapshot you don't know the ID yet, so you can't use it as an idempotency token; that's why the spec requires `name`.
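Putting those spec rules together, the per-RPC key selection might look like the sketch below. The `idempotencyKey` helper and the `driver` package name are illustrative assumptions; the driver itself picks the key inline in each handler:

```go
package driver

import "github.com/container-storage-interface/spec/lib/go/csi"

// idempotencyKey maps each controller RPC to the idempotency token the CSI
// spec prescribes for it.
func idempotencyKey(req interface{}) string {
	switch r := req.(type) {
	case *csi.CreateVolumeRequest:
		return r.GetName() // volume ID does not exist yet, so key on name
	case *csi.DeleteVolumeRequest:
		return r.GetVolumeId()
	case *csi.CreateSnapshotRequest:
		return r.GetName() // snapshot ID does not exist yet, so key on name
	case *csi.DeleteSnapshotRequest:
		return r.GetSnapshotId()
	default:
		return ""
	}
}
```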
force-pushed from 86e49de to d7bdf0a
for ControllerUnpublish, we get stuck in this loop (pkg/cloud/cloud.go line 499 at 2b61230; see also line 805). I think the fix is to check ctx.Err() on every loop iteration; if the context has been cancelled, we should exit the loop. external-attacher will then call ControllerUnpublish again and start another loop.

The alternative is to increase the external-attacher context timeout to be much higher, like 45 minutes (pkg/cloud/cloud.go line 500 at 2b61230). In that case, our own retry logic would dictate how frequently we call DescribeVolumes, instead of external-attacher. A sketch of the first option follows.
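A minimal sketch of that first option, assuming a simplified polling loop; the function shape and names are illustrative, not the driver's exact code:

```go
package cloud

import (
	"context"
	"time"
)

// waitForAttachmentState polls check until it reports done, but aborts as
// soon as the RPC context errors out, so the external-attacher can retry
// with a fresh call instead of the driver polling past its deadline.
func waitForAttachmentState(ctx context.Context, interval time.Duration, check func(context.Context) (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// Exit immediately on any context error (cancelled or deadline exceeded).
		if err := ctx.Err(); err != nil {
			return err
		}
		if done, err := check(ctx); err == nil && done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```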
force-pushed from d7bdf0a to 38663cb
/test pull-aws-ebs-csi-driver-external-test
pkg/cloud/cloud.go
Outdated
@@ -516,6 +516,9 @@ func (c *cloud) WaitForAttachmentState(ctx context.Context, volumeID, expectedSt
		},
	}

	if ctx.Err() != nil && ctx.Err() == context.Canceled {
I don't think we need the `&&` check of whether the err is Canceled; if there's any err, exit.
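In other words, the condition presumably reduces to something like:

```go
// Abort on any context error, whether context.Canceled or
// context.DeadlineExceeded.
if err := ctx.Err(); err != nil {
	return err
}
```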
force-pushed from 72b6c04 to 311f2f1
force-pushed from 311f2f1 to 383299e
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndyXiangLi, wongma7

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Thank you @AndyXiangLi and @wongma7! Is it possible to backport the fix to the affected versions that are still maintained/patched?

/kind bug

ping @AndyXiangLi @wongma7

Yes, we can do a 1.1.1, if not today then next week. There is also an xfs bugfix that I think deserves a release.

@wongma7 @AndyXiangLi, is there any update w.r.t. v1.1.1? We are still looking forward to it :)
Is this a bug fix or adding new feature?
Fixes #918
What is this PR about? / Why do we need it?
Per the spec, there should be no more than one call "in-flight" per volume at a given time.
This adds another layer of protection in the driver to make sure the APIs are idempotent.
What testing is done?
e2e test on k8s 1.19