snapraid-collector for prometheus metrics for snapraid smart, sync, and scrub commands

ljmerza/snapraid-collector

SnapRAID Metrics Collector

This script collects metrics from the SnapRAID sync, scrub, and smart commands and prints them in the text format read by the Prometheus Node Exporter textfile collector.

Grafana Dashboard

You can find this dashboard here

Prerequisites

  • SnapRAID installed and configured
  • Node Exporter with textfile collector enabled

Usage

To run the script, use the following command:

sudo ./snapraid_metrics_collector.sh [smart|scrub|sync]

You can specify one or more arguments to execute specific operations. For example:

sudo ./snapraid_metrics_collector.sh smart # to run the smart operation.
sudo ./snapraid_metrics_collector.sh scrub # to run the scrub operation.
sudo ./snapraid_metrics_collector.sh sync # to run the sync operation.
sudo ./snapraid_metrics_collector.sh smart sync # to run both smart and sync operations.

Integration with Prometheus Node Exporter

Place the script in a directory, e.g., /usr/local/bin.

Make it executable: chmod +x /usr/local/bin/snapraid_metrics_collector.sh.

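Configure cron jobs to run the script periodically and redirect its output to the textfile collector directory. The usage examples above run the script with sudo, so these entries belong in root's crontab (for example, via sudo crontab -e):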

# Run snapraid sync every day at 1 AM
0 1 * * * /usr/local/bin/snapraid_metrics_collector.sh sync > /var/lib/node_exporter/textfile_collector/snapraid_sync.prom
# Run snapraid scrub once a week on Sunday at 3 AM
0 3 * * Sun /usr/local/bin/snapraid_metrics_collector.sh scrub > /var/lib/node_exporter/textfile_collector/snapraid_scrub.prom
# Run snapraid smart every day at 5 AM
0 5 * * * /usr/local/bin/snapraid_metrics_collector.sh smart > /var/lib/node_exporter/textfile_collector/snapraid_smart.prom

Adjust the cron schedule according to your requirements.
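
Redirecting with > truncates the target file before the script has finished, so Node Exporter may scrape an empty or partial file while a long sync or scrub is running. One common workaround is to write to a temporary file in the same directory and rename it into place. The following is a minimal sketch, not part of this repository: write_prom_atomically is a hypothetical helper name, and it assumes the script prints its metrics to stdout as in the cron entries above.

```shell
# Hypothetical helper (not part of snapraid-collector).
# Usage: write_prom_atomically <target.prom> <command> [args...]
write_prom_atomically() {
    target="$1"
    shift
    # Create the temp file next to the target so that mv is a same-filesystem rename.
    tmp="$(mktemp "${target}.XXXXXX")" || return 1
    "$@" > "$tmp"
    status=$?
    if [ -s "$tmp" ]; then
        # mktemp creates the file with mode 0600; make it readable for Node Exporter.
        chmod 644 "$tmp"
        mv "$tmp" "$target"
    else
        # Nothing was written: keep the previous metrics file untouched.
        rm -f "$tmp"
    fi
    return "$status"
}
```

Placed in a small wrapper script, the sync entry would then call write_prom_atomically /var/lib/node_exporter/textfile_collector/snapraid_sync.prom /usr/local/bin/snapraid_metrics_collector.sh sync. The temporary file name does not end in .prom, so the textfile collector ignores it until the rename.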

Configure Node Exporter to read metrics from this directory. This is usually done by passing the --collector.textfile.directory flag to Node Exporter with the path to the directory. Modify the Node Exporter service file accordingly.

For example, if you are using a systemd service to manage Node Exporter, edit the service file (typically located at /etc/systemd/system/node_exporter.service or /lib/systemd/system/node_exporter.service) and add the flag to the ExecStart line:

ExecStart=/usr/local/bin/node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

After modifying the service file, reload the systemd configuration and restart the Node Exporter service:

sudo systemctl daemon-reload
sudo systemctl restart node_exporter
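
To confirm that Node Exporter is picking up the files, you can query its metrics endpoint and look for the snapraid_ series. The sketch below assumes Node Exporter listens on its default port 9100 on localhost; filter_snapraid_metrics is a hypothetical helper name. It also keeps the node_textfile_ series, which Node Exporter uses to report textfile modification times and parse errors.

```shell
# Hypothetical helper: keep only the SnapRAID series and the textfile collector's own series.
filter_snapraid_metrics() {
    grep -E '^(snapraid_|node_textfile_)'
}

# Example (assumes the default listen address):
#   curl -s http://localhost:9100/metrics | filter_snapraid_metrics
```

If no snapraid_ lines appear, check that the .prom files exist in the configured directory and are readable by the user Node Exporter runs as.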

Metrics

The script generates the following metrics:

| Metric Name | Description |
| --- | --- |
| snapraid_smart_exit_status | Exit status of the last SnapRAID smart run. |
| snapraid_smart_last_ran | Timestamp of the last SnapRAID smart run. |
| snapraid_smart_disk_temperature | Disk temperature in degrees Celsius. |
| snapraid_smart_disk_power_on_days | Number of days the disk has been powered on. |
| snapraid_smart_disk_error_count | Number of errors reported by the disk. |
| snapraid_smart_disk_fail_probability | Fail probability for individual disks within the next year based on SMART values calculated by SnapRAID. |
| snapraid_smart_total_fail_probability | Fail probability for any disk failing within the next year based on SMART values calculated by SnapRAID. |
| - | - |
| snapraid_scrub_exit_status | Exit status of the last SnapRAID scrub run. |
| snapraid_scrub_last_run | Timestamp of the last SnapRAID scrub run. |
| snapraid_scrub_scan_time_seconds | Scan time for each item during SnapRAID scrub operation, in seconds. |
| snapraid_scrub_file_errors | Number of file errors found during SnapRAID scrub. |
| snapraid_scrub_io_errors | Number of I/O errors found during SnapRAID scrub. |
| snapraid_scrub_data_errors | Number of data errors found during SnapRAID scrub. |
| snapraid_scrub_completion_percent | Completion percentage of the SnapRAID scrub operation. |
| snapraid_scrub_accessed_mb | Amount of data accessed during the SnapRAID scrub operation, in MB. |
| - | - |
| snapraid_sync_exit_status | Exit status of the last SnapRAID sync run. |
| snapraid_sync_last_run | Timestamp of the last SnapRAID sync run. |
| snapraid_sync_scan_time_seconds | Scan time for each item during SnapRAID sync operation, in seconds. |
| snapraid_sync_file_errors | Number of file errors found during SnapRAID sync. |
| snapraid_sync_io_errors | Number of I/O errors found during SnapRAID sync. |
| snapraid_sync_data_errors | Number of data errors found during SnapRAID sync. |
| snapraid_sync_completion_percent | Completion percentage of the SnapRAID sync operation. |
| snapraid_sync_accessed_mb | Amount of data accessed during the SnapRAID sync operation, in MB. |
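
Each metric reaches Node Exporter as plain text in the Prometheus exposition format: an optional HELP line, a TYPE line, and one sample per line. The sketch below illustrates that format only; emit_gauge is a hypothetical helper, and the script's real output (including any labels, such as a per-disk label) may differ.

```shell
# Hypothetical helper: print one unlabelled gauge in the text exposition format.
# Usage: emit_gauge <metric_name> <help text> <value>
emit_gauge() {
    printf '# HELP %s %s\n# TYPE %s gauge\n%s %s\n' "$1" "$2" "$1" "$1" "$3"
}

emit_gauge snapraid_sync_exit_status "Exit status of the last SnapRAID sync run." 0
```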

Alerts

- name: Disk Alerts
  rules:
    - alert: Snapraid Disk Failure Probability
      expr: snapraid_smart_disk_fail_probability > 15
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Snapraid Disk Failure on {{ $labels.instance }} - {{ $labels.job }}
        description: "Snapraid Disk Failure (current value: {{ $value }})"

    - alert: Snapraid Total Failure Probability
      expr: snapraid_smart_total_fail_probability > 40
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Snapraid Total Failure on {{ $labels.instance }} - {{ $labels.job }}
        description: "Snapraid Total Failure (current value: {{ $value }})"

Logging

The script logs the output of each SnapRAID command to a separate file in the same directory as the script: smart.log, scrub.log, and sync.log.
