# PV monitoring proposal

Status: Pending

Version: Alpha

Implementation Owner: NickrenREN@

## Motivation

Kubernetes currently has no way to monitor PVs, which can cause serious problems.
For example, if a volume becomes unhealthy and the pods using it do not know that, they will keep trying to read and write data,
which leads to data loss and unavailability of services.
So it is necessary to have a mechanism for monitoring PVs and reacting when PVs have problems.

## Proposal

We can separate the proposal into two parts:

* monitoring PVs and marking them if they have problems
* reacting to the unhealthy PVs

For monitoring, we can create a controller, and each volume plugin should have its own function to check volume health.
The controller can call these checks periodically. The controller also needs to watch node events, because local PVs become unreachable when their node breaks down.
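
As a rough sketch of this loop (the `MonitorController` type, its fields, and the `time` and `k8s.io/api/core/v1` imports are illustrative assumptions, not part of the proposal), the controller could resync on a timer, run the per-plugin check for each PV, and mark any PV that fails:

```go
// Sketch only: MonitorController and its fields are illustrative.
type MonitorController struct {
	resyncPeriod time.Duration
	listPVs      func() []*v1.PersistentVolume               // backed by an informer cache
	checkHealth  func(pv *v1.PersistentVolume) error         // per volume plugin health check
	markPV       func(pv *v1.PersistentVolume, reason error) // e.g. add an annotation
}

func (m *MonitorController) Run(stopCh <-chan struct{}) {
	ticker := time.NewTicker(m.resyncPeriod)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for _, pv := range m.listPVs() {
				if err := m.checkHealth(pv); err != nil {
					m.markPV(pv, err)
				}
			}
		case <-stopCh:
			return
		}
	}
}
```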

For reacting, different kinds of applications may need different methods; we can create a separate controller for that as well.

In the first phase, we can focus on monitoring local storage PVs.

## User Experience

### Use Cases

* If the local PV path is deleted, users should know that, and the local PV should be marked and deleted;
  > **Reviewer:** It seems like there are two different reaction controllers being proposed here? Could you clarify this in the doc and separate out which behaviors will be handled by which controllers? One of the confusing things here is that there are multiple levels/layers of reaction, and it is not clear how they will all interact with each other.
  >
  > **Reviewer:** Or maybe this design should just focus on the local PV monitoring part, and we can leave StatefulSet handling to a different design.
  >
  > **Author:** Yes, reaction needs more discussion, so I will move it to the next stage.
* If the local PV path is no longer a mount point, the local PV should be marked and deleted;
* If a node that has local PVs breaks down, those local PVs should be marked and deleted (the application must have data backup and be able to restore it, or be able to tolerate data loss; the PV protection feature may help);
* For local PVs, we need to make sure that PV capacity is not greater than device capacity and that PV used bytes are not greater than PV capacity;
* For network storage, if the volume is deleted on the storage driver side, the PV object in Kubernetes should be marked and deleted too;
* If we cannot get access to the PV volume for a certain time (network or some other problems), we need to mark and delete the PV;
* PV fsType checking? Bad blocks checking?

## Implementation

As mentioned above, we can split this into two parts and put them in an external repo at first.

### Monitoring controller:

Like the PV controller, the monitoring controller should check PVs' health periodically and taint them if they are unhealthy.

> **Reviewer:** What happens when PVs are tainted? Currently we do not have the ability to taint PVs.
>
> **Author:** Yeah, in the first phase we do not need to change the PV struct; we will mark PVs by adding annotations instead.

Health checking should be implemented per plugin: each volume plugin needs its own methods to check its volumes.

In the first stage, we can focus on local storage PVs, and then extend to other network storage PVs.

#### For local storage:

> **Reviewer:** I would be very interested in seeing an outline for monitoring node failures, which is a failure mode that local volumes are especially prone to compared to other volume types.
>
> **Author:** Yeah, will do.

The local storage PV monitor consists of two parts:

* a DaemonSet running on every node, responsible for monitoring the local PVs on that node, no matter whether the PVs were created manually or by a provisioner;
* a monitor controller, responsible for watching PV and Node events; PVs may be updated if they are unhealthy, and we also need to react to node failure events.

In the first phase, we can support local storage monitoring first.

Taking local storage as an example, the detailed checking method may look like this:
```go
// checkStatus checks local pv health condition
func (monitor *LocalPVMonitor) checkStatus(pv *v1.PersistentVolume) {
	// check if PV is local storage
	if pv.Spec.Local == nil {
		glog.Infof("PV: %s is not local storage", pv.Name)
		return
	}
	// check node and pv affinity
	fit, err := CheckNodeAffinity(pv, monitor.Node.Labels)
	if err != nil {
		glog.Errorf("check node affinity error: %v", err)
		return
	}
	if !fit {
		glog.Errorf("pv: %s does not belong to this node: %s", pv.Name, monitor.Node.Name)
		return
	}
	// check if host dir still exists
	mountPath, continueThisCheck := monitor.checkHostDir(pv)
	if !continueThisCheck {
		glog.Errorf("Host dir is modified, PV should be marked")
		return
	}
	// check if it is still a mount point
	continueThisCheck = monitor.checkMountPoint(mountPath, pv)
	if !continueThisCheck {
		glog.Errorf("Retrieving mount points error or %s is not a mount point any more", mountPath)
		return
	}
	// check PV size: PV capacity must not be greater than device capacity and PV used bytes must not be greater than PV capacity
	if pv.Spec.VolumeMode != nil && *pv.Spec.VolumeMode == v1.PersistentVolumeBlock {
		monitor.checkPVAndBlockSize(mountPath, pv)
	} else {
		monitor.checkPVAndFSSize(mountPath, pv)
	}
	// other checks ...
}
```
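
The size checks called at the end of `checkStatus` are not spelled out in this proposal. As a hedged sketch (assuming the same `glog` and `v1` imports as above plus `syscall`; the helper name and marking details are illustrative), the filesystem-mode check could compare the PV's declared capacity against what `statfs` reports for the mount path:

```go
// Sketch of a possible filesystem-size check; not the actual implementation.
func (monitor *LocalPVMonitor) checkPVAndFSSize(mountPath string, pv *v1.PersistentVolume) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountPath, &st); err != nil {
		glog.Errorf("statfs %s error: %v", mountPath, err)
		return
	}
	deviceCapacity := int64(st.Blocks) * st.Bsize
	usedBytes := int64(st.Blocks-st.Bfree) * st.Bsize

	capacity := pv.Spec.Capacity[v1.ResourceStorage]
	pvCapacity := capacity.Value()

	// PV capacity must not be greater than the underlying filesystem/device capacity.
	if pvCapacity > deviceCapacity {
		glog.Errorf("PV: %s capacity is greater than device capacity", pv.Name)
		// mark PV, e.g. with the MisMatchedVolSize annotation
	}
	// Used bytes must not be greater than PV capacity.
	if usedBytes > pvCapacity {
		glog.Errorf("PV: %s used bytes exceed PV capacity", pv.Name)
		// mark PV
	}
}
```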

If the monitor finds that a PV is unhealthy, it will mark the PV by adding annotations, including a timestamp.

> **Reviewer:** Can you define the annotation key/value?
>
> **Author:** Yeah, sure, defined in the demo PR: kubernetes-retired/external-storage#528
>
> **Reviewer:** Can you define the annotation format and syntax in this design doc, and also put some of the examples here too?
>
> **Author:** Done.

The reaction controller can then react to this PV depending on the annotations and timestamp.

When we first mark a PV, we also add an annotation whose key is `FirstMarkTime`.
If a local PV is unhealthy, the annotation keys may be, for example, `HostPathNotExist`, `MisMatchedVolSize`, and `NotMountPoint`.
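
A minimal sketch of how the monitor might add these annotations (the actual helper lives in the demo PR linked above; `monitor.kubeClient` and the update call style are assumptions):

```go
// Sketch only: mark a PV with a problem annotation, recording FirstMarkTime once.
func (monitor *LocalPVMonitor) markPV(pv *v1.PersistentVolume, key, value string) error {
	pvClone := pv.DeepCopy()
	if pvClone.Annotations == nil {
		pvClone.Annotations = map[string]string{}
	}
	if _, marked := pvClone.Annotations[key]; marked {
		return nil // already marked for this reason
	}
	if _, ok := pvClone.Annotations["FirstMarkTime"]; !ok {
		pvClone.Annotations["FirstMarkTime"] = time.Now().String()
	}
	pvClone.Annotations[key] = value // e.g. "NotMountPoint" = "yes"
	_, err := monitor.kubeClient.CoreV1().PersistentVolumes().Update(pvClone)
	return err
}
```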

A marked local PV looks like:

```
Name:              example-local-pv-1
Labels:            <none>
Annotations:       FirstMarkTime=2018-04-17 07:31:02.388570492 +0000 UTC m=+600.033905921
                   HostPathNotExist=yes
                   NotMountPoint=yes
                   volume.alpha.kubernetes.io/node-affinity={ "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [ { "matchExpressions": [ { "key": "kubernetes.io/hostname", "operator": "In", "valu...
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      local-disks
Status:            Available
Claim:
Reclaim Policy:    Retain
Access Modes:      RWO
Capacity:          200Mi
Node Affinity:     <none>
Message:
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /mnt/disks/vol/vol1
Events:
  Type    Reason           Age  From                                                                  Message
  ----    ------           ---- ----                                                                  -------
  Normal  MarkPVSucceeded  1m   local-volume-monitor-127.0.0.1-40a8fb4d-4206-11e8-8e52-080027765304   Mark PV successfully with annotation key: NotMountPoint
  Normal  MarkPVSucceeded  22s  local-volume-monitor-127.0.0.1-40a8fb4d-4206-11e8-8e52-080027765304   Mark PV successfully with annotation key: HostPathNotExist
```

#### For out-of-tree volume plugins (except local storage):

We can implement the monitor in the external repo at first. For networked storage,
we can create a new controller called MonitorController, similar to ProvisionController,
which is responsible for creating informers, watching Node and PV events, and calling each plugin's monitor functions.
Each volume plugin will create its own monitor to check the status of its volumes.
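
To make that concrete, here is a hedged sketch of what the per-plugin hook might look like (this interface does not exist in external-storage today; the name and signature are assumptions):

```go
// Illustrative only; not an existing external-storage API.
type Monitor interface {
	// CheckVolumeStatus inspects the backend volume for the given PV and
	// returns a non-empty problem description if the volume is unhealthy.
	CheckVolumeStatus(pv *v1.PersistentVolume) (string, error)
}
```

Presumably, MonitorController would then be constructed much like ProvisionController, with PV and Node informers, and would call `CheckVolumeStatus` on each resync for every PV owned by the plugin.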

#### For in-tree volume plugins (except local storage):

We can add a new volume plugin interface, `PVHealthCheckingVolumePlugin`:
```go
type PVHealthCheckingVolumePlugin interface {
	VolumePlugin
	CheckHealthCondition(spec *Spec) (string, error)
}
```

Each volume plugin would implement this interface along these lines. The entire monitoring controller workflow is:

* Fill the PV cache with initial data from etcd
* Resync and check volume status periodically
* Taint a PV if its volume status is abnormal

### PV controller changes:

For unbound PVCs/PVs, PVCs will not be bound to PVs that have taints.
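
A minimal, hypothetical sketch of the extra filter this would add to PV/PVC matching (in the first phase the "taints" are the monitor's annotations; the function name and its placement in the matching code are assumptions):

```go
// Sketch only: skip marked/tainted PVs when matching claims to volumes.
func pvIsHealthy(pv *v1.PersistentVolume) bool {
	for key := range pv.Annotations {
		switch key {
		case "HostPathNotExist", "NotMountPoint", "MisMatchedVolSize":
			return false
		}
	}
	return true
}

// During volume matching (hypothetical placement):
//   if !pvIsHealthy(volume) {
//       continue // never bind new claims to unhealthy PVs
//   }
```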

### Reaction controller:

The reaction part can be implemented in the second stage, focusing on StatefulSet reaction at first.
The reaction controller will react to PV update events (PVs tainted/marked by the monitoring controller).
Different kinds of applications should have different reactions.

StatefulSet reaction: check the annotation timestamp; if the PV recovers within a predefined time interval,
we do nothing, otherwise we delete the PVC bound to the unhealthy volume (PV) as well as the pods referencing it.
Note that the StatefulSet application must have data backup and be able to restore it, or be able to tolerate data loss.
The PV protection feature may help.
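
A hedged sketch of that timestamp check (the recovery window value, the function name, and the parsing details are assumptions; the annotation value shown earlier is Go's `time.Time.String()` output, so the `time` and `strings` imports are assumed):

```go
// Sketch only: has this PV been marked unhealthy for longer than the recovery window?
func exceededRecoveryWindow(pv *v1.PersistentVolume, window time.Duration) bool {
	firstMark, ok := pv.Annotations["FirstMarkTime"]
	if !ok {
		return false // not marked, or already recovered and unmarked
	}
	// time.Time.String() appends a monotonic clock reading ("m=+..."); drop it before parsing.
	if i := strings.Index(firstMark, " m="); i >= 0 {
		firstMark = firstMark[:i]
	}
	t, err := time.Parse("2006-01-02 15:04:05.999999999 -0700 MST", firstMark)
	if err != nil {
		return false
	}
	return time.Since(t) > window
}
```

If this returns true, the reaction controller would delete the PVC bound to the PV and the StatefulSet pods referencing that PVC.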

The reaction controller's workflow is:

* Fill the PV cache from etcd;
* Watch for PV update events;
* Resync and repopulate periodically;
* Delete the related PVC and pods if needed;

## Roadmap to support PV monitoring

* support local storage PV monitoring (marking PVs);
* support out-of-tree networked volume plugin monitoring and StatefulSet reaction, and add PV taint API support;
* support in-tree volume plugins and react to other kinds of applications if needed.

## Alternatives considered

> **Reviewer:** What exactly will it monitor, by the way? Are you talking about bad blocks or fsck? Please add some examples.
>
> **Author:** It will monitor whether volumes still exist and whether their status is OK. For example, for a local PV: if its path (directory) is deleted by mistake, or the node breaks down, this will cause data loss, and Kubernetes needs to know that.
>
> **Reviewer:** It would be good to explicitly write down some specific error conditions somewhere.