Monitoring Kubernetes PVs #62

Closed
2phost opened this issue Nov 10, 2020 · 10 comments · Fixed by #113

Comments

@2phost

2phost commented Nov 10, 2020

Feature Request

I expected to be able to see PV stats, such as free disk space. I am already scraping kubelet metrics, but I didn't find any metrics named kubelet_volume_stats_*.

From my understanding, the CSI driver is responsible for providing these metrics. Specifically, the NodeGetVolumeStats endpoint is supposed to report disk and inode capacity.
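
For illustration, the figures in question (total, available, and used, in both bytes and inodes) are the ones the filesystem itself reports through statfs(2). The following is only a minimal shell sketch of reading them for a mounted path, assuming a `stat` that supports `-f -c` (GNU coreutils does). The function name `volume_stats` and the output labels are hypothetical and are not part of the driver.

```shell
#!/bin/sh
# Hypothetical helper, not the Linode CSI driver's code: print the capacity
# and inode figures for the filesystem mounted at the given path.
volume_stats() {
  # %S = fundamental block size, %b = total blocks, %f = free blocks,
  # %a = blocks available to unprivileged users,
  # %c = total inodes, %d = free inodes
  stats="$(stat -f -c '%S %b %f %a %c %d' "$1")" || return 1
  set -- $stats
  bsize=$1 blocks=$2 bfree=$3 bavail=$4 inodes=$5 ifree=$6

  echo "capacity_bytes $(( blocks * bsize ))"
  echo "available_bytes $(( bavail * bsize ))"
  echo "used_bytes $(( (blocks - bfree) * bsize ))"
  echo "inodes $inodes"
  echo "inodes_free $ifree"
  echo "inodes_used $(( inodes - ifree ))"
}
```

Calling `volume_stats /some/mount/path` prints six `name value` lines. When a driver does implement NodeGetVolumeStats, the kubelet publishes the corresponding values under the kubelet_volume_stats_* metric names.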

Searched around, and it looks like other providers were able to implement this, such as:
digitalocean/csi-digitalocean#134
digitalocean/csi-digitalocean#197

Some references that may help:
container-storage-interface/spec#256
kubernetes-sigs/aws-ebs-csi-driver#223
kubernetes-sigs/aws-ebs-csi-driver#589

@2phost
Author

2phost commented Nov 11, 2020

I dug a little further and found that you already have a placeholder for the implementation:

func (ns *LinodeNodeServer) NodeGetVolumeStats(ctx context.Context, req *csi.NodeGetVolumeStatsRequest) (*csi.NodeGetVolumeStatsResponse, error) {

@LuminousPath

+1 on this, it would be very useful for database PVs.

@jb3

jb3 commented Sep 12, 2021

Another +1 on this. We had a couple of incidents just this week where Prometheus filled its disk, and because we had no volume metrics we weren't alerted until Prometheus started hitting disk-full errors. Being able to pull volume metrics into Prometheus and similar tools would be hugely valuable.

@igorbrigadir

It's been a while, just wondering if this is on the radar for future development too?

@lotus-x

lotus-x commented May 13, 2022

Any progress on this?

@beegmon

beegmon commented May 18, 2022

+1 here, as we really need a clean way to monitor volume usage and free space.

@Tilusch

Tilusch commented May 25, 2022

This feature is important. Not being able to monitor volume usage is devastating for some applications.

@ajguyot

ajguyot commented Feb 18, 2023

+1 here.

Does anybody know of a workaround to get PV metrics in the meantime, since the Linode team has been ignoring this issue for years now (unclear if the acquisition will help or hurt this...)? Is it just not possible unless legitimate support is added?

@applejag

@ajguyot Yes, one workaround I've been using is to do my own scraping: a script runs df at a fixed interval and writes the results to a file, which nginx then serves for Prometheus to scrape:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: linode-volume-exporter
  namespace: kube-system
  labels:
    k8s-app: linode-volume-exporter
spec:
  selector:
    matchLabels:
      name: linode-volume-exporter
  template:
    metadata:
      labels:
        name: linode-volume-exporter
      annotations:
        prometheus.io/path: /metrics.txt
        prometheus.io/port: "80"
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: scraper
        image: alpine
        command:
          - sh
          - -c
          - |
            set -eu
            : ${OUT_FILE:=/mnt/metrics/metrics.txt}
            : ${NODE_NAME?}
            : ${INTERVAL:=10s}

            trap exit SIGTERM
            
            collect_metrics(){
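              # df -k reports 1024-byte blocks, so convert to bytes to match the *_bytes metric names.
              df -P -k | grep 'kubernetes\.io\~csi/[^/]*/mount' | while IFS= read -r line
              do
                DISK_TOTAL="$(( $(echo $line | cut -d' ' -f2) * 1024 ))"
                DISK_USED="$(( $(echo $line | cut -d' ' -f3) * 1024 ))"
                DISK_AVAIL="$(( $(echo $line | cut -d' ' -f4) * 1024 ))"
                MOUNT_PATH="$(echo $line | cut -d' ' -f6)"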
                PV_NAME="$(basename "$(dirname "$MOUNT_PATH")")"
                echo "linode_pv_usage_bytes{persistentvolume=\"${PV_NAME}\",node=\"$NODE_NAME\"} $DISK_USED"
                echo "linode_pv_available_bytes{persistentvolume=\"${PV_NAME}\",node=\"$NODE_NAME\"} $DISK_AVAIL"
                echo "linode_pv_total_bytes{persistentvolume=\"${PV_NAME}\",node=\"$NODE_NAME\"} $DISK_TOTAL"
              done
            }
            
            echo "INTERVAL=$INTERVAL"
            echo "OUT_FILE=$OUT_FILE"
            while true
            do
              # cache results in memory
              METRICS="$(collect_metrics)"
              HEADER="# Prometheus metrics for Linode Volumes, generated at $(date)"
              echo "$HEADER" > "$OUT_FILE"
              echo "$METRICS" >> "$OUT_FILE"
              sleep "$INTERVAL"
            done

        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          # The container's root filesystem can be read-only,
          # but the container must run as root to
          # be able to read the volume statuses.
          # A privileged container is not needed, though.
          readOnlyRootFilesystem: true
        volumeMounts:
          - mountPath: /mnt/host-pods
            name: host-pods
            readOnly: true
          - mountPath: /mnt/metrics
            name: metrics

      - name: exporter
        image: nginx:alpine
        ports:
          - containerPort: 80
            name: scrape
        volumeMounts:
          - mountPath: /usr/share/nginx/html
            name: metrics
            readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      volumes:
        - name: host-pods
          hostPath:
            path: /var/lib/kubelet/pods
        - name: metrics
          emptyDir: {}

It requires mounting the node's pods directory into the container as a hostPath volume, which df then picks up automatically. This works great on LKE clusters. I have not tested it on plain Linode machines with manually installed Kubernetes.
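
The parsing step can also be sketched in isolation, which makes it easy to test against captured df output. This is only an illustrative variant and not the script above: the function name `df_to_metrics` is hypothetical, it assumes input from `df -P -k` (sizes in 1024-byte blocks), and it assumes mount paths ending in `kubernetes.io~csi/<pv-name>/mount`.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P -k` output on stdin and print
# Prometheus-style samples for CSI volume mounts only.
# $1 is the node name to attach as a label.
df_to_metrics() {
  awk -v node="$1" '
    # Field 6 is the mount point; keep only CSI volume mounts.
    $6 ~ "kubernetes\\.io~csi/[^/]*/mount$" {
      n = split($6, parts, "/")
      pv = parts[n - 1]   # directory just above "mount" is the PV name
      labels = sprintf("{persistentvolume=\"%s\",node=\"%s\"}", pv, node)
      # Fields 2-4 are total, used, and available 1024-byte blocks.
      printf "linode_pv_total_bytes%s %.0f\n", labels, $2 * 1024
      printf "linode_pv_usage_bytes%s %.0f\n", labels, $3 * 1024
      printf "linode_pv_available_bytes%s %.0f\n", labels, $4 * 1024
    }'
}
```

Inside the scraper container this would be used as `df -P -k | df_to_metrics "$NODE_NAME"`. Filtering on the mount point field, rather than grepping the whole line, avoids matching a device name that happens to contain the same string.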


This is based on some code I wrote for the OpenEBS monitor-pv, which is just a glorified script that runs du to calculate local-pv disk usage: openebs-archive/monitor-pv#13

For Linode's volumes this approach is much faster, because df reads usage straight from the filesystem, whereas du has to walk every file.

However, it would be much nicer if the CSI driver exported its own metrics.

@2phost
Author

2phost commented Feb 18, 2023

Another option is to use a dedicated Prometheus exporter.
Just deploy https://github.com/kais271/pvc-exporter and you will have PVC metrics in your Prometheus instance. This is how I do it right now.
