Use lock per VolumeId during NodeUnstageVolume operation to avoid conflicts during volume unmount #2811
Force-pushed from b881bee to 882bf66
Started vanilla Block pipeline... Build Number: 2555
Force-pushed from 882bf66 to f5d6fdf
Force-pushed from f5d6fdf to 28c0612
Started Vanilla block pre-checkin pipeline... Build Number: 2682
Force-pushed from 28c0612 to 2b0d801
Force-pushed from 2b0d801 to edc50e9
/approve
@chethanv28 Change looks good to me. Can you reproduce the issue locally and test it with your fix to confirm that the issue we observed in the SR is resolved?
Force-pushed from edc50e9 to 902088e
Started Vanilla block pre-checkin pipeline... Build Number: 2700
pkg/csi/service/node.go (outdated)
// NodeUnstageVolume operation.
if acquired := driver.volumeLocks.TryAcquire(volumeID); !acquired {
	return nil, logger.LogNewErrorCodef(log, codes.Aborted,
		"An operation with the given Volume ID %s already exists", volumeID)
}
Prefix message with method name: NodeUnstageVolume
Some more minor comments. Change looks good to me.
Force-pushed from 902088e to 10ffe42
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: chethanv28, deepakkinni, divyenpatel.
/ok-to-test
/lgtm
Hello all, my team submitted the case with VMware support that led to this PR (thank you!). For our tracking of the bug this addresses, could you let me know which release version will include this change?
I see it was included in v3.3.0, thanks!
What this PR does / why we need it:
This PR adds a per-VolumeId locking mechanism to the NodeUnstageVolume operation.
We have observed a few cases where the 1st NodeUnstageVolume call takes a long time and is still in progress. Meanwhile, k8s issues a 2nd NodeUnstageVolume call, assuming the 1st one has timed out. The 2nd call succeeds because it does not find the target mount point. As a result, DetachVolume is invoked while the 1st NodeUnstageVolume is still in progress, which in turn corrupts the volume. To avoid this, we hold a lock per VolumeID for the duration of the NodeUnstageVolume operation.
A similar locking mechanism is applied to the NodePublishVolume, NodeUnpublishVolume, and NodeStageVolume operations as well.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Testing done:
Running e2e pipeline
Release note: