From 9393e1ac19a6691968835368a0ef7957011250de Mon Sep 17 00:00:00 2001
From: NickrenREN
Date: Mon, 11 Dec 2017 12:38:50 +0800
Subject: [PATCH] pv monitoring proposal

---
 .../storage/pv-monitoring-proposal.md | 128 ++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 contributors/design-proposals/storage/pv-monitoring-proposal.md

diff --git a/contributors/design-proposals/storage/pv-monitoring-proposal.md b/contributors/design-proposals/storage/pv-monitoring-proposal.md
new file mode 100644
index 00000000000..4a2f4141170
--- /dev/null
+++ b/contributors/design-proposals/storage/pv-monitoring-proposal.md
@@ -0,0 +1,128 @@
# PV monitoring proposal

Status: Pending

Version: Alpha

Implementation Owner: NickrenREN@

## Motivation

Kubernetes currently has no way to monitor PVs, which may cause serious problems.
For example, if a volume becomes unhealthy and the pods using it do not know that, they will keep trying to read and write data,
which can lead to data loss and unavailability of services.
So it is necessary to have a mechanism that monitors PVs and reacts when PVs have problems.

## Proposal

We can separate the proposal into two parts:

* monitoring PVs and marking them if they have problems
* reacting to the unhealthy PVs

For monitoring, we can create a controller, and each volume plugin should provide its own function to check volume health.
The controller can call these functions periodically. The controller also needs to watch node events, because local PVs become unreachable when their node breaks down.

For reacting, different kinds of apps may need different methods, so we can create a separate controller for that as well.

In the first phase, we may only react to StatefulSets with local storage PVs.

## User Experience
### Use Cases

* Users create many local PVs manually and want to monitor their health condition;
* Users want to know whether the volumes their apps are referencing are healthy or not;
* If a node breaks down, we need to reschedule the pods referencing that node's local PVs (assuming the application has a data backup and can restore it, or can tolerate data loss);

## Implementation

As mentioned above, we can split this into two parts and put them in an external repo at first.

### Monitoring controller:

Like the PV controller, the monitoring controller should check PVs' health periodically and taint PVs that are unhealthy.
The health checking implementation should be per plugin: each volume plugin needs its own methods to check its volumes.

#### For in-tree volume plugins (except local storage):

We can add a new volume plugin interface: StatusCheckingVolumePlugin.
```
type StatusCheckingVolumePlugin interface {
	VolumePlugin

	CheckStatus(spec *Spec) (string, error)
}
```
Each volume plugin will implement it. The overall monitoring controller workflow is:

* Fill the PV cache with initial data from etcd
* Resync and check volume status periodically
* Taint a PV if its volume status is abnormal
* Remove the taints if the volume becomes healthy again

#### For out-of-tree volume plugins (except local storage):

We can create a new controller called MonitorController, similar to ProvisionController,
which is responsible for creating informers, watching Node and PV events, and calling each plugin's monitor functions.
Each volume plugin will create its own monitor to check the status of its volumes.
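As an illustration, the sketch below shows, under stated assumptions, what such a per-plugin monitor and the MonitorController resync loop might look like. The names used here (`Monitor`, `VolumeStatus`, `CheckVolumeStatus`, `listPVs`, `markPV`) are hypothetical placeholders rather than an existing library API, and a real controller would be informer-driven instead of relying on the injected helpers shown here.

```
// A minimal sketch of a per-plugin monitor for out-of-tree volume plugins.
// All names here (Monitor, VolumeStatus, MonitorController) are hypothetical
// placeholders; they are not an existing library API.
package monitor

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// VolumeStatus is the result of a single health check.
type VolumeStatus struct {
	Healthy bool
	Message string
}

// Monitor is implemented once per volume plugin.
type Monitor interface {
	// SupportsPV reports whether this monitor knows how to check the given PV.
	SupportsPV(pv *v1.PersistentVolume) bool
	// CheckVolumeStatus checks the health of the volume backing the PV.
	CheckVolumeStatus(pv *v1.PersistentVolume) (VolumeStatus, error)
}

// MonitorController periodically asks the plugin monitor about every PV and
// marks (taints/annotates) PVs according to the result.
type MonitorController struct {
	monitor      Monitor
	resyncPeriod time.Duration
	// listPVs and markPV stand in for the informer-backed PV lister and the
	// API calls that add or remove the taint/annotation on a PV.
	listPVs func() []*v1.PersistentVolume
	markPV  func(pv *v1.PersistentVolume, status VolumeStatus)
}

// Run resyncs on a timer until stopCh is closed.
func (c *MonitorController) Run(stopCh <-chan struct{}) {
	ticker := time.NewTicker(c.resyncPeriod)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			for _, pv := range c.listPVs() {
				if !c.monitor.SupportsPV(pv) {
					continue
				}
				status, err := c.monitor.CheckVolumeStatus(pv)
				if err != nil {
					// A failed check means "unknown", not "unhealthy"; skip it.
					continue
				}
				c.markPV(pv, status)
			}
		}
	}
}
```
Keeping the health check behind a small interface lets each out-of-tree plugin ship its own monitor while reusing the generic resync and marking logic.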
#### For local storage:

The local storage PV monitor consists of two parts:

* a DaemonSet running on every node, which is responsible for monitoring the local PVs on that specific node, whether the PVs were created manually or by a provisioner;
* a monitor controller, which is responsible for watching PV and Node events. PVs may be updated if they are unhealthy, and we also need to react to node failure events.

Local storage monitoring will be supported in the first phase.

Taking local storage as an example, the detailed monitoring implementation may look like this:

```
func (mon *monitor) CheckStatus(pv *v1.PersistentVolume) (string, error) {
	// check if the PV is local storage
	if pv.Spec.Local == nil {
		glog.Infof("PV: %s is not local storage", pv.Name)
		return ...
	}

	// check if the PV belongs to this node
	if !checkNodeAffinity(...) {
		glog.Infof("PV: %s does not belong to this node", pv.Name)
		return ...
	}

	// check the volume health condition depending on the device type
	...
}
```
If the monitor finds that a PV is unhealthy, it will mark the PV by adding annotations, including a timestamp.
The reaction controller can then react to the PV based on those annotations and the timestamp.

### Reaction controller:

The reaction controller will react to PV update events (PVs tainted by the monitoring controller).
Different kinds of apps should have different reactions.

In the first phase, we only consider the StatefulSet reaction.

StatefulSet reaction: check the annotation timestamp; if the PV recovers within the predefined time interval, we do nothing,
otherwise we delete the PVC bound to the unhealthy volume (PV) as well as the pods referencing it (a minimal sketch of this check is included at the end of this proposal).

The reaction controller's workflow is:

* Fill the PV cache from etcd;
* Watch for PV update events;
* Resync and repopulate periodically;
* Delete the related PVC and pods if needed (only for StatefulSets, and reclaim the PV depending on its reclaim policy);

### PV controller changes:
For unbound PVCs/PVs, PVCs will not be bound to PVs that have taints.

## Roadmap to support PV monitoring

* support local storage PV monitoring (mark PVs), and react only to StatefulSets
* support other out-of-tree volume plugins and add PV taint support
* support in-tree volume plugins and react to other kinds of applications if needed

## Alternatives considered
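To make the StatefulSet reaction described in the Reaction controller section above more concrete, here is a minimal sketch of the recovery-window check. The annotation key and the helper below are hypothetical placeholders, not an existing Kubernetes API.

```
// A rough sketch of the StatefulSet reaction check described in the
// "Reaction controller" section above. The annotation key and the helper
// below are hypothetical placeholders, not an existing Kubernetes API.
package reaction

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// Hypothetical annotation written by the monitor when it marks a PV unhealthy.
const unhealthySinceAnnotation = "volume.alpha.kubernetes.io/unhealthy-since"

// shouldReact returns true when the PV has stayed unhealthy for longer than
// recoveryWindow, i.e. the reaction controller should delete the bound PVC
// and the pods referencing it.
func shouldReact(pv *v1.PersistentVolume, recoveryWindow time.Duration, now time.Time) bool {
	ts, ok := pv.Annotations[unhealthySinceAnnotation]
	if !ok {
		// No mark: the PV is healthy, or it recovered and the monitor
		// removed the annotation, so do nothing.
		return false
	}
	unhealthySince, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		// Malformed timestamp: be conservative and do nothing.
		return false
	}
	// Only react once the PV has been unhealthy past the predefined interval.
	return now.Sub(unhealthySince) > recoveryWindow
}
```
The deletion itself would reuse the normal PVC and pod delete calls; the sketch only illustrates the recovery-window check.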