This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

[WIP]: Monitor local pv #528

Closed

Conversation

@NickrenREN (Contributor) commented Dec 26, 2017

First commit: add a monitor for local storage PVs.
The specific checking methods need more discussion.
This one checks the mount point, the host dir, and the capacity; a rough sketch of these checks follows the TODO list below.

xref: PV health monitoring proposal

TODOs:

  • ensure that PV usage is not greater than PV capacity?
  • provide different monitoring methods for different devices?
  • node failure checking
  • add reaction

cc @ddysher @msau42
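
For orientation, here is a minimal, self-contained sketch of the mount-point and host-dir checks described above (the capacity check is sketched after the step 1 log below). The function names, the simplified logic, and the Linux-only device-ID comparison are illustrative assumptions, not the PR's actual code; only the annotation keys match the ones visible later in this thread.

package main

import (
	"fmt"
	"os"
	"syscall"
)

// Annotation keys as they appear in the kubectl describe output further down.
const (
	annNotMountPoint    = "NotMountPoint"
	annHostPathNotExist = "HostPathNotExist"
)

// checkLocalPV is a simplified stand-in for the monitor's checkStatus: it verifies
// that the PV's host dir still exists and is still a mount point, and returns the
// annotation keys that should be used to mark the PV.
func checkLocalPV(hostPath string) ([]string, error) {
	if _, err := os.Stat(hostPath); err != nil {
		if os.IsNotExist(err) {
			// Host dir check: the backing directory was removed (see step 4 below).
			return []string{annHostPathNotExist}, nil
		}
		return nil, err
	}

	notMnt, err := likelyNotMountPoint(hostPath)
	if err != nil {
		return nil, err
	}
	if notMnt {
		// Mount point check: the volume was unmounted (see step 2 below).
		return []string{annNotMountPoint}, nil
	}
	return nil, nil
}

// likelyNotMountPoint reports whether hostPath is probably not a mount point by
// comparing its device ID with its parent directory's (Linux-only; does not
// detect bind mounts of the same filesystem).
func likelyNotMountPoint(hostPath string) (bool, error) {
	var self, parent syscall.Stat_t
	if err := syscall.Stat(hostPath, &self); err != nil {
		return true, err
	}
	if err := syscall.Stat(hostPath+"/..", &parent); err != nil {
		return true, err
	}
	return self.Dev == parent.Dev, nil
}

func main() {
	marks, err := checkLocalPV("/mnt/disks/vol/vol1")
	fmt.Println(marks, err)
}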

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 26, 2017
@NickrenREN (Contributor, Author):

I compiled and tested the PR:

step1: make a new dir vol1 in /mnt/disks/vol and mount tmpfs on it; the provisioner creates a new local PV:

I1226 11:13:19.779291       1 common.go:318] Creating client using in-cluster config
I1226 11:13:19.856974       1 main.go:69] Starting controller
I1226 11:13:19.857035       1 controller.go:44] Initializing volume cache
I1226 11:13:19.858787       1 controller.go:65] Monitor started
I1226 11:13:19.887827       1 monitor.go:132] no pv in ETCD at first
I1226 11:13:19.890266       1 populator.go:85] Starting Informer controller
I1226 11:13:19.890358       1 populator.go:89] Waiting for Informer initial sync
I1226 11:13:19.891182       1 monitor.go:176] Starting monitor controller local-volume-provisioner-192.168.1.100-bd492623-ea22-11e7-826b-080027765304!
I1226 11:13:20.892025       1 controller.go:82] Controller started
I1226 11:13:20.892572       1 discovery.go:206] Found new volume of volumeType "file" at host path "/mnt/disks/vol/vol1" with capacity 4186193920, creating Local PV "local-pv-4e0b94ef"
I1226 11:13:20.933864       1 cache.go:55] Added pv "local-pv-4e0b94ef" to cache
I1226 11:13:20.934108       1 discovery.go:225] Created PV "local-pv-4e0b94ef" for volume at "/mnt/disks/vol/vol1"
I1226 11:13:20.956324       1 cache.go:64] Updated pv "local-pv-4e0b94ef" to cache

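The capacity logged above (4186193920 bytes, shown later as 3992Mi) is the size of the filesystem backing the discovered path, here the tmpfs mount. A minimal way to read that value on Linux is syscall.Statfs; this is a sketch of the idea, not necessarily how the provisioner computes it:

package main

import (
	"fmt"
	"syscall"
)

// fsCapacityBytes returns the total size in bytes of the filesystem backing path,
// e.g. the tmpfs mounted at /mnt/disks/vol/vol1 in step 1 (Linux-only sketch).
func fsCapacityBytes(path string) (uint64, error) {
	var fs syscall.Statfs_t
	if err := syscall.Statfs(path, &fs); err != nil {
		return 0, err
	}
	return fs.Blocks * uint64(fs.Bsize), nil
}

func main() {
	c, err := fsCapacityBytes("/mnt/disks/vol/vol1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("capacity: %d bytes (%d Mi)\n", c, c/(1<<20))
}
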
step2: unmount /mnt/disks/vol/vol1/

root@nickren-14:/mnt/disks/vol# umount /mnt/disks/vol/vol1/

step3: describe the PV and find that it has been marked:

nickren@nickren-14:~/GoProjects/src/github.com/kubernetes-incubator/external-storage/local-volume/provisioner/deployment/kubernetes/monitor$ kubectl describe pv local-pv-4e0b94ef
Name:            local-pv-4e0b94ef
Labels:          <none>
Annotations:     FirstMarkTime=2017-12-26 11:14:19.891625612 +0000 UTC m=+60.216449255
                 NotMountPoint=yes
                 pv.kubernetes.io/provisioned-by=local-volume-provisioner-192.168.1.100-bd492623-ea22-11e7-826b-080027765304
                 volume.alpha.kubernetes.io/node-affinity={"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"kubernetes.io/hostname","operator":"In","values":["192.168...
StorageClass:    local-storage
Status:          Available
Claim:           
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        3992Mi
Message:         
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /mnt/disks/vol/vol1
Events:
  Type    Reason           Age   From                                                                         Message
  ----    ------           ----  ----                                                                         -------
  Normal  MarkPVSucceeded  1m    local-volume-provisioner-192.168.1.100-bd492623-ea22-11e7-826b-080027765304  Mark PV successfully with annotation key: NotMountPoint

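The NotMountPoint=yes and FirstMarkTime annotations above are set by the monitor when a check fails. A minimal sketch of that marking follows, assuming current client-go (the Get/Update calls here take a context, unlike the client-go vintage this PR was written against); markPV is a hypothetical helper name, not the PR's function.

package monitor

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markPV records a diagnostic annotation such as "NotMountPoint=yes" on the PV,
// plus a FirstMarkTime annotation the first time the PV is marked.
func markPV(ctx context.Context, client kubernetes.Interface, pvName, key string) error {
	pv, err := client.CoreV1().PersistentVolumes().Get(ctx, pvName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if pv.Annotations == nil {
		pv.Annotations = map[string]string{}
	}
	if _, marked := pv.Annotations[key]; marked {
		return nil // already marked, nothing to do
	}
	if _, ok := pv.Annotations["FirstMarkTime"]; !ok {
		pv.Annotations["FirstMarkTime"] = time.Now().UTC().String()
	}
	pv.Annotations[key] = "yes"
	_, err = client.CoreV1().PersistentVolumes().Update(ctx, pv, metav1.UpdateOptions{})
	return err
}
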
step4: delete the dir vol1 and find that the PV is marked again:

root@nickren-14:/mnt/disks/vol# rm -rf vol1/
root@nickren-14:/mnt/disks/vol# 
nickren@nickren-14:~/GoProjects/src/github.com/kubernetes-incubator/external-storage/local-volume/provisioner/deployment/kubernetes/monitor$ kubectl describe pv local-pv-4e0b94ef
Name:            local-pv-4e0b94ef
Labels:          <none>
Annotations:     FirstMarkTime=2017-12-26 11:14:19.891625612 +0000 UTC m=+60.216449255
                 HostPathNotExist=yes
                 NotMountPoint=yes
                 pv.kubernetes.io/provisioned-by=local-volume-provisioner-192.168.1.100-bd492623-ea22-11e7-826b-080027765304
                 volume.alpha.kubernetes.io/node-affinity={"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"kubernetes.io/hostname","operator":"In","values":["192.168...
StorageClass:    local-storage
Status:          Available
Claim:           
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        3992Mi
Message:         
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /mnt/disks/vol/vol1
Events:
  Type    Reason           Age   From                                                                         Message
  ----    ------           ----  ----                                                                         -------
  Normal  MarkPVSucceeded  2m    local-volume-provisioner-192.168.1.100-bd492623-ea22-11e7-826b-080027765304  Mark PV successfully with annotation key: NotMountPoint
  Normal  MarkPVSucceeded  33s   local-volume-provisioner-192.168.1.100-bd492623-ea22-11e7-826b-080027765304  Mark PV successfully with annotation key: HostPathNotExist
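
The MarkPVSucceeded events above come from an event recorder. A sketch of the wiring, assuming client-go's tools/record package; the component string is simply whatever the monitor identifies itself as (here the provisioner name visible in the events).

package monitor

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// newRecorder wires an EventRecorder to the API server so the monitor can emit
// events like the MarkPVSucceeded ones shown above.
func newRecorder(client kubernetes.Interface, component string) record.EventRecorder {
	broadcaster := record.NewBroadcaster()
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: client.CoreV1().Events("")})
	return broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: component})
}

// Usage after marking a PV (pv is the *v1.PersistentVolume that was just annotated):
//
//	recorder.Eventf(pv, v1.EventTypeNormal, "MarkPVSucceeded",
//		"Mark PV successfully with annotation key: %s", "HostPathNotExist")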

NickrenREN added 2 commits December 26, 2017 21:15
@msau42 (Contributor) commented Feb 8, 2018

/assign

Review thread on the following lines of the diff:

// checkStatus checks pv health condition
func (monitor *Monitor) checkStatus(pv *v1.PersistentVolume) {

Contributor:

Do we need to use fsck or smartctl to check whether the disk is healthy?

Contributor Author:

Yes, this is just a simple demo; lots of extra work needs to be done for the local PV monitor.
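
For illustration, here is a rough sketch of shelling out to smartctl for a device health check, in the spirit of the question above. The device path, the reliance on -H alone, and the output parsing are assumptions; smartctl's exit code is a bitmask and its output differs between ATA, NVMe, and SCSI devices, so a real check would need more care.

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// smartHealthy runs `smartctl -H <device>` and reports whether the overall
// health self-assessment looks good. Very rough sketch only.
func smartHealthy(device string) (bool, error) {
	out, err := exec.Command("smartctl", "-H", device).CombinedOutput()
	if err != nil {
		// smartctl returns non-zero for several conditions; treat it as unknown
		// here and surface the output to the caller.
		return false, fmt.Errorf("smartctl failed: %v: %s", err, out)
	}
	text := string(out)
	return strings.Contains(text, "PASSED") || strings.Contains(text, ": OK"), nil
}

func main() {
	ok, err := smartHealthy("/dev/sda") // hypothetical device path
	fmt.Println(ok, err)
}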

Contributor Author:

And I am updating the proposal now; I will send it out when it is ready.

Contributor:

@NickrenREN Thanks. Looking forward to the proposal.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 23, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 23, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor):

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
area/local-volume cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
6 participants