From 9393e1ac19a6691968835368a0ef7957011250de Mon Sep 17 00:00:00 2001
From: NickrenREN
Date: Mon, 11 Dec 2017 12:38:50 +0800
Subject: [PATCH] pv monitoring proposal

---
 .../storage/pv-monitoring-proposal.md | 128 ++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 contributors/design-proposals/storage/pv-monitoring-proposal.md

diff --git a/contributors/design-proposals/storage/pv-monitoring-proposal.md b/contributors/design-proposals/storage/pv-monitoring-proposal.md
new file mode 100644
index 00000000000..4a2f4141170
--- /dev/null
+++ b/contributors/design-proposals/storage/pv-monitoring-proposal.md
@@ -0,0 +1,128 @@
# PV monitoring proposal

Status: Pending

Version: Alpha

Implementation Owner: NickrenREN@

## Motivation

Kubernetes currently has no way to monitor PVs, which may cause serious problems.
For example, if a volume becomes unhealthy and the pods using it do not know that, they will keep trying to read and write data,
which can lead to data loss and unavailability of services.
So it is necessary to have a mechanism that monitors PVs and reacts when PVs have problems.

## Proposal

We can separate the proposal into two parts:

* monitoring PVs and marking them if they have problems
* reacting to the unhealthy PVs

For monitoring, we can create a controller, and each volume plugin should provide its own function to check volume health.
The controller can call these functions periodically. The controller also needs to watch node events, because local PVs become unreachable when their node breaks down.

For reacting, different kinds of apps may need different methods, so we can create a separate controller for that as well.

In the first phase, we may only react to StatefulSets with local storage PVs.

## User Experience
### Use Cases

* Users create many local PVs manually and want to monitor their health condition;
* Users want to know whether the volumes their apps are referencing are healthy or not;
* If a node breaks down, we need to reschedule the pods referencing that node's local PVs (assuming the application has a data backup and can restore it, or can tolerate data loss);

## Implementation

As mentioned above, we can split this into two parts and put them in an external repo at first.

### Monitoring controller:

Like the PV controller, the monitoring controller should check PVs' health periodically and taint PVs that are unhealthy.
The health checking implementation should be per plugin: each volume plugin needs its own methods to check its volumes.

#### For in-tree volume plugins (except local storage):

We can add a new volume plugin interface: StatusCheckingVolumePlugin.
```
type StatusCheckingVolumePlugin interface {
	VolumePlugin

	CheckStatus(spec *Spec) (string, error)
}
```
Each volume plugin will implement it. The overall monitoring controller workflow is:

* Fill the PV cache with initial data from etcd
* Resync and check volume status periodically
* Taint a PV if its volume status is abnormal
* Remove the taints if the volume becomes healthy again

#### For out-of-tree volume plugins (except local storage):

We can create a new controller called MonitorController, similar to ProvisionController,
which is responsible for creating informers, watching Node and PV events, and calling each plugin's monitor functions.
Each volume plugin will create its own monitor to check the status of its volumes.
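As an illustration, the sketch below shows, under stated assumptions, what such a per-plugin monitor and the MonitorController resync loop might look like. The names used here (`Monitor`, `VolumeStatus`, `CheckVolumeStatus`, `listPVs`, `markPV`) are hypothetical placeholders rather than an existing library API, and a real controller would be informer-driven instead of relying on the injected helpers shown here.

```
// A minimal sketch of a per-plugin monitor for out-of-tree volume plugins.
// All names here (Monitor, VolumeStatus, MonitorController) are hypothetical
// placeholders; they are not an existing library API.
package monitor

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// VolumeStatus is the result of a single health check.
type VolumeStatus struct {
	Healthy bool
	Message string
}

// Monitor is implemented once per volume plugin.
type Monitor interface {
	// SupportsPV reports whether this monitor knows how to check the given PV.
	SupportsPV(pv *v1.PersistentVolume) bool
	// CheckVolumeStatus checks the health of the volume backing the PV.
	CheckVolumeStatus(pv *v1.PersistentVolume) (VolumeStatus, error)
}

// MonitorController periodically asks the plugin monitor about every PV and
// marks (taints/annotates) PVs according to the result.
type MonitorController struct {
	monitor      Monitor
	resyncPeriod time.Duration
	// listPVs and markPV stand in for the informer-backed PV lister and the
	// API calls that add or remove the taint/annotation on a PV.
	listPVs func() []*v1.PersistentVolume
	markPV  func(pv *v1.PersistentVolume, status VolumeStatus)
}

// Run resyncs on a timer until stopCh is closed.
func (c *MonitorController) Run(stopCh <-chan struct{}) {
	ticker := time.NewTicker(c.resyncPeriod)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			for _, pv := range c.listPVs() {
				if !c.monitor.SupportsPV(pv) {
					continue
				}
				status, err := c.monitor.CheckVolumeStatus(pv)
				if err != nil {
					// A failed check means "unknown", not "unhealthy"; skip it.
					continue
				}
				c.markPV(pv, status)
			}
		}
	}
}
```
Keeping the health check behind a small interface lets each out-of-tree plugin ship its own monitor while reusing the generic resync and marking logic.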
#### For local storage:

The local storage PV monitor consists of two parts:

* a DaemonSet running on every node, which is responsible for monitoring the local PVs on that specific node, whether the PVs were created manually or by a provisioner;
* a monitor controller, which is responsible for watching PV and Node events. PVs may be updated if they are unhealthy, and we also need to react to node failure events.

Local storage monitoring will be supported in the first phase.

Taking local storage as an example, the detailed monitoring implementation may look like this:

```
func (mon *monitor) CheckStatus(pv *v1.PersistentVolume) (string, error) {
	// check if the PV is local storage
	if pv.Spec.Local == nil {
		glog.Infof("PV: %s is not local storage", pv.Name)
		return ...
	}

	// check if the PV belongs to this node
	if !checkNodeAffinity(...) {
		glog.Infof("PV: %s does not belong to this node", pv.Name)
		return ...
	}

	// check the volume health condition depending on the device type
	...
}
```
If the monitor finds that a PV is unhealthy, it will mark the PV by adding annotations, including a timestamp.
The reaction controller can then react to the PV based on those annotations and the timestamp.

### Reaction controller:

The reaction controller will react to PV update events (PVs tainted by the monitoring controller).
Different kinds of apps should have different reactions.

In the first phase, we only consider the StatefulSet reaction.

StatefulSet reaction: check the annotation timestamp; if the PV recovers within the predefined time interval, we do nothing,
otherwise we delete the PVC bound to the unhealthy volume (PV) as well as the pods referencing it (a minimal sketch of this check is included at the end of this proposal).

The reaction controller's workflow is:

* Fill the PV cache from etcd;
* Watch for PV update events;
* Resync and repopulate periodically;
* Delete the related PVC and pods if needed (only for StatefulSets, and reclaim the PV depending on its reclaim policy);

### PV controller changes:
For unbound PVCs/PVs, PVCs will not be bound to PVs that have taints.

## Roadmap to support PV monitoring

* support local storage PV monitoring (mark PVs), and react only to StatefulSets
* support other out-of-tree volume plugins and add PV taint support
* support in-tree volume plugins and react to other kinds of applications if needed

## Alternatives considered
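To make the StatefulSet reaction described in the Reaction controller section above more concrete, here is a minimal sketch of the recovery-window check. The annotation key and the helper below are hypothetical placeholders, not an existing Kubernetes API.

```
// A rough sketch of the StatefulSet reaction check described in the
// "Reaction controller" section above. The annotation key and the helper
// below are hypothetical placeholders, not an existing Kubernetes API.
package reaction

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// Hypothetical annotation written by the monitor when it marks a PV unhealthy.
const unhealthySinceAnnotation = "volume.alpha.kubernetes.io/unhealthy-since"

// shouldReact returns true when the PV has stayed unhealthy for longer than
// recoveryWindow, i.e. the reaction controller should delete the bound PVC
// and the pods referencing it.
func shouldReact(pv *v1.PersistentVolume, recoveryWindow time.Duration, now time.Time) bool {
	ts, ok := pv.Annotations[unhealthySinceAnnotation]
	if !ok {
		// No mark: the PV is healthy, or it recovered and the monitor
		// removed the annotation, so do nothing.
		return false
	}
	unhealthySince, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		// Malformed timestamp: be conservative and do nothing.
		return false
	}
	// Only react once the PV has been unhealthy past the predefined interval.
	return now.Sub(unhealthySince) > recoveryWindow
}
```
The deletion itself would reuse the normal PVC and pod delete calls; the sketch only illustrates the recovery-window check.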