Commit 17097f4
Author: Deepak Kinni
Honor Persistent Volume Reclaim Policy
Signed-off-by: Deepak Kinni <dkinni@vmware.com>
1 parent 8b8aa9c commit 17097f4

File tree: 3 files changed, +293 -0 lines changed

@@ -0,0 +1,3 @@
kep-number: 2644
alpha:
  approver: "@ehashman"
@@ -0,0 +1,255 @@
# KEP-2644: Honor Persistent Volume Reclaim Policy

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [E2E tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Alpha -> Beta](#alpha---beta)
    - [Beta -> GA](#beta---ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Scalability](#scalability)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary
The reclaim policy associated with a Persistent Volume is currently ignored under certain circumstances. For a `Bound`
PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV delete reclaim policy is honored. The reclaim
policy is honored if the PVC is deleted prior to deleting the PV; however, if the PV is deleted prior to deleting the
PVC, then the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the
external infrastructure is not removed.

## Motivation
Prevent volumes from being leaked by honoring the PV reclaim policy after the `Bound` PVC is also deleted.

## Proposal
To better understand the existing issue, we will first walk through the current behavior when a `Bound` PV is deleted
prior to deleting the PVC.

```
kubectl delete pv <pv-name>
```
On deleting a `Bound` PV, a `deletionTimestamp` is added to the PV object. This triggers an update which is consumed
by the PV-PVC-controller. The PV-PVC-controller observes that the PV is still associated with a PVC and that the PVC
has not been deleted; as a result, it attempts to update the PV phase to the `Bound` state. Assuming that the PV was
already bound previously, this update eventually amounts to a no-op.

The PVC is deleted at a later point in time.
```
kubectl -n <namespace> delete pvc <pvc-name>
```

1. A `deletionTimestamp` is added on the PVC object.
2. The PVC-protection-controller picks up the update and verifies that the `deletionTimestamp` is present and that no
pods are currently using the PVC.
3. If there are no Pods using the PVC, the PVC finalizers are removed, eventually triggering a PVC delete event.
4. The PV-PVC-controller processes the PVC delete event, leading to removal of the PVC from its in-memory cache, followed
by triggering an explicit sync on the PV.
5. The PV-PVC-controller processes the triggered explicit sync; here, it observes that the PVC is no longer available
and updates the PV phase to the `Released` state.
6. When the PV-protection-controller picks up the update, it observes that there is a `deletionTimestamp` and the PV is
not in a `Bound` phase; this causes the PV finalizers to be removed.
7. This is followed by the PV-PVC-controller initiating the reclaim volume workflow.
8. The reclaim volume workflow observes the `persistentVolumeReclaimPolicy` as `Delete` and schedules a volume deletion.
9. During the delete volume operation, it is observed that there is already a `deletionTimestamp`, and the call is never
forwarded to the CSI driver to delete the underlying volume.

As observed from the description above, step (9) prevents the volume deletion from being forwarded to the plugin.

Removing the check should allow the volume deletion.

### Risks and Mitigations

The primary risk is volume deletion for users who expect the current behavior, i.e., they do not expect the volume to be
deleted from the underlying storage infrastructure when, for a bound PV-PVC pair, the PV is deleted followed by deleting
the PVC. This can be mitigated by initially introducing the fix behind a feature gate, followed by explicitly deprecating
the current behavior.

## Design Details

The code below is called when `persistentVolumeReclaimPolicy` is `Delete`. As part of the proposed fix, removing the
`deletionTimestamp` check would allow the volume deletion to proceed through the plugin.

```go
// deleteVolumeOperation deletes a volume. This method is running in standalone
// goroutine and already has all necessary locks.
func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) {
	klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name)

	// This method may have been waiting for a volume lock for some time.
	// Previous deleteVolumeOperation might just have saved an updated version, so
	// read current volume state now.
	newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(context.TODO(), volume.Name, metav1.GetOptions{})
	if err != nil {
		klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err)
		return "", nil
	}

	if newVolume.GetDeletionTimestamp() != nil { // <========================== Remove this check.
		klog.V(3).Infof("Volume %q is already being deleted", volume.Name)
		return "", nil
	}
	needsReclaim, err := ctrl.isVolumeReleased(newVolume)
	if err != nil {
		klog.V(3).Infof("error reading claim for volume %q: %v", volume.Name, err)
		return "", nil
	}
	if !needsReclaim {
		klog.V(3).Infof("volume %q no longer needs deletion, skipping", volume.Name)
		return "", nil
	}

	pluginName, deleted, err := ctrl.doDeleteVolume(volume)
	// Remaining code below skipped for brevity...
	// Please refer to pkg/controller/volume/persistentvolume/pv_controller.go in kubernetes/kubernetes for the full code.
}
```

The proposed fix will not have side effects in existing code flows, as the `deleteVolumeOperation` is scheduled with an
identifier generated from the name and UID of the volume being deleted. Operations with duplicate identifiers are skipped.

```go
case v1.PersistentVolumeReclaimDelete:
	klog.V(4).Infof("reclaimVolume[%s]: policy is Delete", volume.Name)
	opName := fmt.Sprintf("delete-%s[%s]", volume.Name, string(volume.UID))
	// create a start timestamp entry in cache for deletion operation if no one exists with
	// key = volume.Name, pluginName = provisionerName, operation = "delete"
	ctrl.operationTimestamps.AddIfNotExist(volume.Name, ctrl.getProvisionerNameFromVolume(volume), "delete")
	ctrl.scheduleOperation(opName, func() error {
		_, err := ctrl.deleteVolumeOperation(volume)
		if err != nil {
			// only report error count to "volume_operation_total_errors"
			// latency reporting will happen when the volume get finally
			// deleted and a volume deleted event is captured
			metrics.RecordMetric(volume.Name, &ctrl.operationTimestamps, err)
		}
		return err
	})
```

### Test Plan

#### E2E tests
An e2e test to exercise deletion of the PV prior to deletion of the PVC for a `Bound` PV-PVC pair.

### Graduation Criteria

#### Alpha
* Feedback
* Fix of the issue behind a feature flag.
* Unit tests and e2e tests outlined in the design proposal implemented.

#### Alpha -> Beta
* Allow the Alpha fix to soak for one release.

#### Beta -> GA
* Deployed in production and has gone through at least one K8s upgrade.

### Upgrade / Downgrade Strategy
* Upgrade from old Kubernetes (1.21) to new Kubernetes (1.22) with the `HonorPvReclaim` flag enabled

In the above scenario, the upgrade will cause a change in default behavior as described in the current KEP. Additionally,
no effort is made to delete volumes that were not previously deleted as a result of the old behavior.

* Downgrade from new Kubernetes (1.22) to old Kubernetes (1.21).

In the above scenario, the downgrade will retain the current behavior.

### Version Skew Strategy
No issues; the fix is confined to a single component, the `kube-controller-manager`.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `HonorPvReclaim`
  - Components depending on the feature gate: kube-controller-manager

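As a sketch of enablement, assuming the standard Kubernetes feature-gate flag mechanism, the gate would be turned on via the kube-controller-manager command line:

```
kube-controller-manager --feature-gates=HonorPvReclaim=true
```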
###### Does enabling the feature change any default behavior?

Enabling the feature will delete the volume from the underlying storage when, for a bound PV-PVC pair whose PV reclaim
policy is `Delete`, the PV is deleted followed by deleting the PVC.

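For illustration, with the gate enabled the previously leaking sequence now reclaims the storage asset; `<pv-name>`, `<namespace>`, and `<pvc-name>` are placeholders:

```
kubectl delete pv <pv-name>                   # PV gets a deletionTimestamp, remains Bound
kubectl -n <namespace> delete pvc <pvc-name>  # with HonorPvReclaim, the backing volume is deleted
```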
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. Disabling the feature flag will restore the previous behavior.

###### What happens if we reenable the feature if it was previously rolled back?
The new behavior described above will take effect again.

###### Are there any tests for feature enablement/disablement?

There will be unit tests covering enablement and disablement of the `HonorPvReclaim` feature.

### Scalability

###### Will enabling / using this feature result in any new API calls?
No.

###### Will enabling / using this feature result in introducing new API types?
No.

###### Will enabling / using this feature result in any new calls to the cloud provider?
Yes. Previously, the delete volume call would not reach the provider under the described circumstances; with this change,
volume deletion will be invoked to remove the volume from the underlying storage.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?
No.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No.

## Implementation History

1.22: Alpha

## Drawbacks
None. The current behavior could itself be considered a drawback, and this KEP presents the fix for it. The current
behavior was reported in [Issue-546](https://github.com/kubernetes-csi/external-provisioner/issues/546) and [Issue-195](https://github.com/kubernetes-csi/external-provisioner/issues/195).

## Alternatives
Other approaches have not been considered; alternatives will be evaluated if presented during the KEP review.
@@ -0,0 +1,35 @@
title: Honor Persistent Volume Reclaim Policy
kep-number: 2644
authors:
  - "@deepakkinni"
owning-sig: sig-storage
participating-sigs:
status: implementable
creation-date: 2021-05-05
reviewers:
  - "@xing-yang"
  - "@jsafrane"
approvers:
  - "@jsafrane"
  - "@xing-yang"
prr-approvers:
  - "@ehashman"
see-also:
replaces:

stage: alpha

latest-milestone: "v1.22"

milestone:
  alpha: "v1.22"
  beta: "v1.23"
  stable: "v1.24"

feature-gates:
  - name: HonorPvReclaim
    components:
      - kube-controller-manager
disable-supported: true

metrics:
