# PV monitoring proposal

Status: Pending

Version: Alpha

Implementation Owner: NickrenREN@

## Motivation

Kubernetes currently has no way to monitor PVs, which can cause serious problems.
For example, if a volume becomes unhealthy, pods that do not know this will keep reading and writing data,
which can lead to data loss and service unavailability.
So it is necessary to have a mechanism that monitors PVs and reacts when they have problems.

## Proposal

We can separate the proposal into two parts:

* monitoring PVs and marking them if they have problems
* reacting to the unhealthy PVs

For monitoring, we can create a dedicated controller, and each volume plugin should have its own function to check volume health.
The controller can call these functions periodically. The controller also needs to watch node events, because local PVs become unreachable when their node breaks down.

For reacting, different kinds of apps may need different methods, so we can create a separate controller for that as well.

In the first phase, we may only react to statefulsets with local storage PVs.

## User Experience
### Use Cases

* Users create many local PVs manually and want to monitor their health condition;
* Users want to know if the volumes their apps are referencing are healthy or not;
* If a node breaks down, we need to reschedule the pods referencing that node's local PVs (assuming the application has a data backup it can restore, or can tolerate data loss);

## Implementation

As mentioned above, we can split the implementation into two parts and put them in an external repo at first.

### Monitoring controller:

Like the PV controller, the monitoring controller should check PVs' health condition periodically and taint PVs that are unhealthy.
The health checking implementation should be per plugin: each volume plugin needs its own methods to check its volumes.

#### For in-tree volume plugins (except local storage):

We can add a new volume plugin interface: StatusCheckingVolumePlugin.
```
type StatusCheckingVolumePlugin interface {
	VolumePlugin
	CheckStatus(spec *Spec) (string, error)
}
```
Each volume plugin will implement this interface. The monitoring controller workflow is (a rough code sketch follows the list):

* Fill PV cache with initial data from etcd
* Resync and check volumes status periodically
* Taint PV if the volume status is abnormal
* Remove the taints if the volume becomes normal again
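
A minimal sketch of this periodic resync pass, assuming the `StatusCheckingVolumePlugin` interface above; `pvCache`, `findStatusCheckingPlugin`, `taintPV` and `removeTaints` are hypothetical helpers, and the `"Healthy"` status string is only an example since status values are not defined here:
```
// Sketch only: the periodic resync pass of the monitoring controller.
func (c *monitorController) resync() {
	for _, pv := range c.pvCache.List() {
		plugin := c.findStatusCheckingPlugin(pv)
		if plugin == nil {
			continue // no plugin knows how to check this volume
		}
		// NewSpecFromPersistentVolume wraps the PV object into a volume Spec.
		status, err := plugin.CheckStatus(volume.NewSpecFromPersistentVolume(pv, false))
		if err != nil || status != "Healthy" {
			c.taintPV(pv) // mark the PV as unhealthy
			continue
		}
		c.removeTaints(pv) // the volume is healthy (again), clear any taints
	}
}
```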

#### For out-of-tree volume plugins (except local storage):

We can create a new controller called MonitorController, similar to ProvisionController,
which is responsible for creating informers, watching Node and PV events, and calling each plugin's monitor functions.
Each volume plugin will create its own monitor to check its volumes' status.
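
As a rough sketch, each out-of-tree plugin's monitor could satisfy a small interface that the MonitorController calls during its resync, similar to how external provisioners implement the Provisioner interface used by ProvisionController. The interface below is illustrative, not a defined API:
```
// Illustrative only: a per-plugin monitor that the external
// MonitorController would call for each PV.
type Monitor interface {
	// SupportsPV reports whether this monitor is responsible for the given PV.
	SupportsPV(pv *v1.PersistentVolume) bool
	// CheckStatus returns the health status of the volume backing the PV.
	CheckStatus(pv *v1.PersistentVolume) (string, error)
}
```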

#### For local storage:

The local storage PV monitor consists of two parts:

* a daemonset running on every node, which is responsible for monitoring the local PVs on that specific node, whether the PVs were created manually or by a provisioner;
* a monitor controller, which is responsible for watching PV and Node events. PVs may be updated if they are unhealthy, and we also need to react to node failure events.

In the first phase, we can support local storage monitoring.

Taking local storage as an example, the detailed monitoring implementation may look like this:

```
func (mon *monitor) CheckStatus(pv *v1.PersistentVolume) (string, error) {
	// check if the PV is local storage
	if pv.Spec.Local == nil {
		glog.Infof("PV: %s is not local storage", pv.Name)
		return ...
	}
	// check if the PV belongs to this node
	if !checkNodeAffinity(...) {
		glog.Infof("PV: %s does not belong to this node", pv.Name)
		return ...
	}
	// check the volume health condition depending on the device type
	...
}
```
If the monitor finds that a PV is unhealthy, it will mark the PV by adding annotations, including a timestamp.
The reaction controller can then react to this PV based on the annotations and the timestamp.
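
The annotation keys and values are not fixed by this proposal; a minimal sketch of the marking step, with hypothetical keys, could look like:
```
// Hypothetical annotation keys; the actual keys are to be decided.
const (
	VolumeHealthAnn      = "volume.alpha.kubernetes.io/health"
	VolumeUnhealthyTSAnn = "volume.alpha.kubernetes.io/unhealthy-timestamp"
)

// markPVUnhealthy records the unhealthy condition and when it was observed;
// the caller is expected to persist the change via the API server.
func markPVUnhealthy(pv *v1.PersistentVolume) {
	if pv.Annotations == nil {
		pv.Annotations = map[string]string{}
	}
	pv.Annotations[VolumeHealthAnn] = "Unhealthy"
	pv.Annotations[VolumeUnhealthyTSAnn] = time.Now().UTC().Format(time.RFC3339)
}
```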

### Reaction controller:

The reaction controller will react to PV update events (the PV being tainted by the monitoring controller).
Different kinds of apps should have different reactions.

In the first phase, we only consider the statefulset reaction.

Statefulset reaction: check the annotation timestamp. If the PV recovers within the predefined time interval, we do nothing;
otherwise we delete the PVC bound to the unhealthy volume (PV) as well as the pods referencing it, as sketched after the workflow list below.

The reaction controller's workflow is:

* Fill PV cache from etcd;
* Watch for PV update events;
* Resync and populate periodically;
* Delete the related PVC and pods if needed (for statefulsets only, and reclaim the PV depending on its reclaim policy);
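
A minimal sketch of the statefulset reaction step, reusing the hypothetical annotation keys from the monitoring section and a configurable recovery interval (all names are illustrative):
```
// Illustrative only: decide whether an unhealthy PV was given enough
// time to recover, and clean up if it was not.
func (c *reactionController) reactToUnhealthyPV(pv *v1.PersistentVolume, recoveryInterval time.Duration) error {
	ts, ok := pv.Annotations[VolumeUnhealthyTSAnn]
	if !ok {
		return nil // the PV is not marked unhealthy
	}
	markedAt, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return err
	}
	if time.Since(markedAt) < recoveryInterval {
		return nil // still within the recovery window, do nothing yet
	}
	// The PV did not recover in time: delete the bound PVC and the
	// statefulset pods referencing it, then reclaim the PV according
	// to its reclaim policy (deletePVCAndPods is a hypothetical helper).
	return c.deletePVCAndPods(pv)
}
```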

### PV controller changes:
During binding, unbound PVCs will not be bound to PVs that have taints.
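
PV taints do not exist in the API yet; assuming this proposal adds a taints field to the PV spec, the binding path could simply skip tainted volumes when matching an unbound claim, roughly:
```
// Illustrative only: pv.Spec.Taints is a field this proposal would add;
// it does not exist in the current API.
func isBindable(pv *v1.PersistentVolume) bool {
	return len(pv.Spec.Taints) == 0
}
```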

## Roadmap to support PV monitoring

* support local storage PV monitoring (mark PVs), and react to statefulsets only
* support other out-of-tree volume plugins and add PV taint support
* support in-tree volume plugins and react to other kinds of applications if needed

## Alternatives considered
