
PV monitoring proposal #1484

Closed
NickrenREN wants to merge 1 commit from the pv-monitor-proposal branch

Conversation

NickrenREN
Contributor

Add a proposal for PV monitoring.
We may focus on local storage PV monitoring at the first phase.

/assign @msau42
/cc @ddysher @jingxu97
/sig storage

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 11, 2017
@k8s-ci-robot
Contributor

@NickrenREN: GitHub didn't allow me to request PR reviews from the following users: ddysher.

Note that only kubernetes members can review this PR, and authors cannot review their own PRs.

In response to this:

Add a proposal for PV monitoring.
We may focus on local storage PV monitoring at the first phase.

/assign @msau42
/cc @ddysher @jingxu97
/sig storage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 11, 2017
@NickrenREN NickrenREN force-pushed the pv-monitor-proposal branch 2 times, most recently from 4b7a149 to 484ad79 on December 11, 2017 04:59
Take local storage as an example; the implementation may look like this:

```
func (mon *monitor) CheckStatus(spec *v1.PersistentVolumeSpec) (string, error) {
```
Contributor Author
@NickrenREN Dec 11, 2017

This needs further discussion

@redbaron

It is not clear how apps can react to bad news and recover from failure. I'd be very hesitant to allow a StatefulSet to delete a PVC should a failure be detected:

  • there is no guarantee that a newly created PV will be healthy
  • removing a PVC removes data, so if the PV has a chance to become healthy later on I'd much rather wait
  • usually PV failure is tied to a storage failure, which means that all PVs in that failure domain are going to report a bad health state and therefore be deleted. This is catastrophic


### Monitoring controller:

Like the PV controller, the monitoring controller should check PVs’ health condition periodically and taint them if they are unhealthy.
Member

What happens when PVs are tainted? Currently we do not have the ability to taint PVs.

Contributor Author

Yeah, in the first phase we do not need to change the PV struct; we will mark PVs by adding annotations instead.
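
For illustration, a minimal sketch of how the monitor could add such annotations through client-go; the annotation keys here are hypothetical placeholders (the real ones are defined in the demo PR referenced later in this thread), and the Update call uses the client-go signature of that era:

```
package monitor

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical annotation keys; the actual keys are defined in the demo PR.
const (
	annUnhealthy          = "volume.alpha.kubernetes.io/unhealthy"
	annUnhealthyTimestamp = "volume.alpha.kubernetes.io/unhealthy-timestamp"
)

// markPVUnhealthy records the failure reason and a timestamp on the PV
// so that a reaction controller can act on it later.
func markPVUnhealthy(client kubernetes.Interface, pv *v1.PersistentVolume, reason string) error {
	if pv.Annotations == nil {
		pv.Annotations = map[string]string{}
	}
	pv.Annotations[annUnhealthy] = reason
	pv.Annotations[annUnhealthyTimestamp] = time.Now().Format(time.RFC3339)
	// Update signature of client-go at the time of this proposal (pre-1.18).
	_, err := client.CoreV1().PersistentVolumes().Update(pv)
	return err
}
```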


We can separate the proposal into two parts:

* monitoring PVs and marking them if they have problems
Member

What exactly will it monitor btw? Are you talking about bad blocks or fsck? Please add some examples.

Contributor Author
@NickrenREN Dec 20, 2017

It will monitor whether volumes still exist and whether their status is OK.
For example, for a local PV: if its path (directory) is deleted by mistake or the node breaks down, this causes data loss, and Kubernetes needs to know that.
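
As a rough illustration of the local-path case, a check like this could detect a deleted path (a sketch only; the function name and error handling are made up):

```
package monitor

import (
	"fmt"
	"os"

	v1 "k8s.io/api/core/v1"
)

// checkLocalPathExists verifies that the path backing a local PV still exists.
// It returns an error describing the problem if the path is gone or unreadable.
func checkLocalPathExists(pv *v1.PersistentVolume) error {
	if pv.Spec.Local == nil {
		return fmt.Errorf("PV %s is not a local volume", pv.Name)
	}
	if _, err := os.Stat(pv.Spec.Local.Path); os.IsNotExist(err) {
		return fmt.Errorf("local path %q of PV %s no longer exists", pv.Spec.Local.Path, pv.Name)
	} else if err != nil {
		return fmt.Errorf("cannot stat local path %q of PV %s: %v", pv.Spec.Local.Path, pv.Name, err)
	}
	return nil
}
```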

Member

It would be good to explicitly write down some specific error conditions somewhere.

@msau42
Member

msau42 commented Dec 20, 2017

@redbaron agree. That is why we should clearly document here exactly what use cases we are trying to solve. And also this reaction part needs to be opt-in by the application, so that only those that can recover from these scenarios can benefit.

From a workload point of view, we want to target distributed applications like Cassandra, which replicate their data across multiple instances for redundancy.

From a platform point of view, the infrastructure failures I'm thinking of are:

  • local disks in cloud providers, where if the VM is deleted, then your local disk is really gone forever.
  • local disks on-prem where the disk hardware has failed (open issue on how to actually detect that). If you don't have them in a RAID config, then the data is also gone.

@NickrenREN NickrenREN force-pushed the pv-monitor-proposal branch 2 times, most recently from a3b2291 to 908ad48 on December 20, 2017 02:51
@NickrenREN
Contributor Author

NickrenREN commented Dec 20, 2017

@redbaron sorry for the delay. Yeah, if we want to delete the PVC and pods directly and reschedule them to a new node, the application needs to have a data backup it can restore, or be able to tolerate data loss.
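
Purely to illustrate that reaction (and only for workloads that explicitly opt in, per the discussion above), the PVC/pod deletion could look roughly like this; the function and parameter names are hypothetical and the opt-in mechanism is not designed here:

```
package reaction

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePVCAndPod deletes the PVC bound to a failed PV and the pod using it,
// so that the StatefulSet controller recreates them and the pod can be
// rescheduled to a healthy node. This must only run for workloads that
// explicitly opted in and can restore from backup or tolerate data loss.
func deletePVCAndPod(client kubernetes.Interface, namespace, pvcName, podName string) error {
	// Delete signatures match client-go at the time of this proposal (pre-1.18).
	if err := client.CoreV1().PersistentVolumeClaims(namespace).Delete(pvcName, &metav1.DeleteOptions{}); err != nil {
		return err
	}
	return client.CoreV1().Pods(namespace).Delete(podName, &metav1.DeleteOptions{})
}
```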

@msau42
Member

msau42 commented Jan 30, 2018

cc @vkamra

@k8s-github-robot k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Feb 6, 2018
```
    return ...
}

// check volume health condition depending on device type
```
Member

Can you outline here what kind of checks you want to do? What are some general checks, and what are some specific checks?

Contributor Author

General checks: volume access check, volume existence check, ...
Specific checks: mount point check, local path check, ...
Will update.
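
For example, the mount point check could be approximated on Linux by scanning /proc/mounts; a simplified sketch (a real implementation would more likely reuse the mount utilities used elsewhere in Kubernetes):

```
package monitor

import (
	"bufio"
	"os"
	"strings"
)

// isMountPoint reports whether path appears as a mount point in /proc/mounts.
// This is a simplified Linux-only check for illustration.
func isMountPoint(path string) (bool, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line looks like: device mountpoint fstype options dump pass
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[1] == path {
			return true, nil
		}
	}
	return false, scanner.Err()
}
```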


In the first phase, we just consider the StatefulSet reaction.

StatefulSet reaction: check the annotation timestamp; if the PV can recover within the predefined time interval, we will do nothing,
Member

How does this controller know which StatefulSets to monitor?

Contributor Author

By monitoring the related PVs.
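
A minimal sketch of that grace-interval check, reusing the hypothetical timestamp annotation from the marking sketch above (the interval would be configurable):

```
package reaction

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// shouldReact reports whether the PV has stayed marked unhealthy for longer
// than gracePeriod, i.e. it did not recover within the predefined interval.
func shouldReact(pv *v1.PersistentVolume, gracePeriod time.Duration) bool {
	// Hypothetical key, matching the marking sketch earlier in this thread.
	ts, ok := pv.Annotations["volume.alpha.kubernetes.io/unhealthy-timestamp"]
	if !ok {
		return false // not marked unhealthy
	}
	markedAt, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return false // malformed timestamp; prefer doing nothing over reacting
	}
	return time.Since(markedAt) > gracePeriod
}
```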

Member

How do we mark which PVs we should monitor? For example, not all StatefulSets may want this behavior. So we need a way to be able to say which StatefulSets/PVs should be handled by the reaction controller.

BTW, @jingxu97 just showed me some cool work related to a Snapshot policy controller using the metacontroller framework. A CRD is defined that specifies:

  • How often a snapshot should be taken
  • Label selector to choose which PVCs should follow this policy.

I think we could do something similar for this controller, and have a CRD to define a policy with:

  • What conditions to react to
  • Time thresholds before reacting
  • What should the action be for each condition
  • Label selector for which StatefulSets to monitor


```
}
```
If the monitor finds that a PV is unhealthy, it will mark the PV by adding annotations, including a timestamp.
Member

Can you define the annotation key/value?

Contributor Author

Yeah, sure, it's defined in the demo PR: kubernetes-retired/external-storage#528.
Will update this proposal.

Member

Can you define the annotation format and syntax in this design doc, and also put some of the examples here too?

Contributor Author

done

which is responsible for creating informers, watching Node and PV events and calling each plugin’s monitor functions.
And each volume plugin will create its own monitor to check its volumes’ status.
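
A rough sketch of that wiring with client-go shared informers (the VolumeMonitor interface and the handling of results are placeholders, not part of the proposal):

```
package monitor

import (
	"log"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// VolumeMonitor is a placeholder for the per-plugin monitor; CheckStatus
// matches the signature sketched earlier in the proposal.
type VolumeMonitor interface {
	CheckStatus(spec *v1.PersistentVolumeSpec) (string, error)
}

// Run wires up shared informers for PVs (and Nodes) and calls every
// registered plugin monitor when a PV is added or updated.
func Run(client kubernetes.Interface, monitors []VolumeMonitor, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)

	pvInformer := factory.Core().V1().PersistentVolumes().Informer()
	pvInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { check(obj.(*v1.PersistentVolume), monitors) },
		UpdateFunc: func(_, newObj interface{}) { check(newObj.(*v1.PersistentVolume), monitors) },
	})

	// A Node informer would be wired similarly to catch node deletion/failure.
	_ = factory.Core().V1().Nodes().Informer()

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	<-stopCh
}

func check(pv *v1.PersistentVolume, monitors []VolumeMonitor) {
	for _, m := range monitors {
		if status, err := m.CheckStatus(&pv.Spec); err != nil {
			// Here the controller would mark the PV, e.g. with annotations.
			log.Printf("PV %s unhealthy (status %q): %v", pv.Name, status, err)
		}
	}
}
```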

#### For local storage:
Member

I would be very interested in seeing an outline for monitoring node failures, which is a failure mode that local volumes are especially prone to compared to other volume types.

Contributor Author

Yeah, will do.


## User Experience
### Use Cases

* Users create many local PVs manually and want to monitor their health condition;
Member

Since all of the use cases here are for local PVs, should we just have this document be specific to local PVs for now? We can consider extending this to other PV types in the future if the need arises.

Contributor Author

Yeah, we want to focus on local storage in the first stage, and then move to other storage drivers if needed.

* Fill PV cache from etcd;
* Watch for PV update events;
* Resync and populate periodically;
* Delete the related PVC and pods if needed (just for StatefulSets) and reclaim the PV depending on its reclaim policy;
Member

BTW, what application are you planning to use with this? It may be good to mention in the beginning use cases what kinds of applications this is suitable for.

Contributor Author
@NickrenREN Mar 28, 2018

If we want to delete the PV directly, the application should have a data backup it can restore, or be able to tolerate data loss.

@NickrenREN
Contributor Author

@ddysher @msau42 I updated the use cases here and will modify the rest accordingly.
A simple example is here: kubernetes-retired/external-storage#528


## User Experience
### Use Cases

* If the local PV path is deleted, users should know that and the local PV should be marked and deleted;
Member

It seems like there are two different reaction controllers being proposed here:

  • One for StatefulSets that are using these PVs
  • One for local PVs

Could you try to clarify this in the doc and separate out which behaviors will be handled by which controllers? I think one of the confusing things here is that there are multiple levels/layers of reaction, and it's not clear how they will all interact with each other.

Member

Or maybe this design should just focus on the local PV monitoring part. And we can leave StatefulSet handling to a different design.

Contributor Author

Yes, reaction needs more discussion, so I will move it to the next stage.

@NickrenREN NickrenREN force-pushed the pv-monitor-proposal branch from d082a85 to 9c6eb40 on April 23, 2018 04:02
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: childsb

Assign the PR to them by writing /assign @childsb in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@NickrenREN NickrenREN force-pushed the pv-monitor-proposal branch from 9c6eb40 to 8e0a47a on April 23, 2018 04:14
@ddysher
Contributor

ddysher commented May 13, 2018

@NickrenREN @msau42 what's the status of the proposal :)

@NickrenREN
Contributor Author

NickrenREN commented May 14, 2018

@ddysher I am prototyping the PV monitor here: https://github.com/caicloud/kube-storage-monitor
and will update this proposal soon. I am planning to add local volume PV monitor support in Q2 or Q3.
WDYT, @msau42?

@msau42
Member

msau42 commented May 14, 2018

I probably won't have time to thoroughly review this until Q3 timeframe.

But it would help if in this design proposal, you could summarize things into a table with the following columns:

  • Failure type
  • Reaction (and by who)

Also it would be good to think about some of these questions:

  • Can reaction be configurable per PV/PVC? Reaction should only be opt-in because it could have dangerous consequences for various workloads.
  • What happens if the failure is only temporary? Can we have configurable timeouts before reacting?

I really like the idea of having CRDs where you can define your reaction policies, and then using something like the metacontroller to write a reaction controller. I would sync up with @jingxu97 on the work she did for snapshot policies.
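
For what it's worth, the Go types for such a reaction-policy CRD might look roughly like this; every type and field name below is hypothetical:

```
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PVReactionPolicy is a hypothetical CRD describing how to react to PV failures.
type PVReactionPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec PVReactionPolicySpec `json:"spec"`
}

type PVReactionPolicySpec struct {
	// Conditions to react to, e.g. "LocalPathMissing", "NodeDeleted".
	Conditions []string `json:"conditions"`

	// How long a condition must persist before the reaction is triggered.
	ReactionDelay metav1.Duration `json:"reactionDelay"`

	// Action to take per condition, e.g. "MarkOnly" or "DeletePVC".
	Actions map[string]string `json:"actions"`

	// Selector for the StatefulSets (and their PVCs) this policy applies to;
	// reaction is opt-in, so unselected workloads are never touched.
	StatefulSetSelector *metav1.LabelSelector `json:"statefulSetSelector,omitempty"`
}
```

A reaction controller (possibly built with the metacontroller framework, as suggested above) would then watch these policy objects together with the marked PVs and the StatefulSets matched by the label selector.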

@msau42
Member

msau42 commented Jun 18, 2018

I opened up kubernetes-retired/external-storage#817 to discuss ideas for handling the node deletion scenario.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2018
@NickrenREN
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2019
@sarjeet2013

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 21, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2019
@NickrenREN
Contributor Author

Closing, in favor of kubernetes/enhancements#1077

@NickrenREN NickrenREN closed this Jun 3, 2019
@NickrenREN NickrenREN deleted the pv-monitor-proposal branch June 3, 2019 08:29