BTRFS stats missing #2632

Closed · DaAwesomeP opened this issue Mar 10, 2023 · 15 comments · Fixed by #2634
@DaAwesomeP (Contributor) commented Mar 10, 2023

Host operating system: output of uname -a

5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.5.0 (branch: HEAD, revision: 1b48970ffcf5630534fb00bb0687d73c66d1c959)
  build user:       root@6e7732a7b81b
  build date:       20221129-18:59:09
  go version:       go1.19.3
  platform:         linux/amd64

node_exporter command line flags

--web.systemd-socket --collector.systemd --collector.textfile.directory=/usr/local/var/lib/prometheus-node-exporter/textfile

node_exporter log output

ts=2023-03-10T15:25:20.138Z caller=node_exporter.go:180 level=info msg="Starting node_exporter" version="(version=1.5.0, branch=HEAD, revision=1b48970ffcf5630534fb00bb0687d73c66d1c959)"
ts=2023-03-10T15:25:20.138Z caller=node_exporter.go:181 level=info msg="Build context" build_context="(go=go1.19.3, user=root@6e7732a7b81b, date=20221129-18:59:09)"
ts=2023-03-10T15:25:20.138Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2023-03-10T15:25:20.138Z caller=systemd_linux.go:152 level=info collector=systemd msg="Parsed flag --collector.systemd.unit-include" flag=.+
ts=2023-03-10T15:25:20.138Z caller=systemd_linux.go:154 level=info collector=systemd msg="Parsed flag --collector.systemd.unit-exclude" flag=.+\.(automount|device|mount|scope|slice)
ts=2023-03-10T15:25:20.139Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
ts=2023-03-10T15:25:20.139Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:110 level=info msg="Enabled collectors"
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=arp
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=bcache
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=bonding
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=btrfs
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=conntrack
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=cpu
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=cpufreq
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=diskstats
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=dmi
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=edac
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=entropy
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=fibrechannel
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=filefd
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=filesystem
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=hwmon
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=infiniband
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=ipvs
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=loadavg
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=mdadm
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=meminfo
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=netclass
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=netdev
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=netstat
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=nfs
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=nfsd
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=nvme
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=os
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=powersupplyclass
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=pressure
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=rapl
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=schedstat
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=selinux
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=sockstat
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=softnet
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=stat
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=systemd
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=tapestats
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=textfile
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=thermal_zone
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=time
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=timex
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=udp_queues
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=uname
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=vmstat
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=xfs
ts=2023-03-10T15:25:20.139Z caller=node_exporter.go:117 level=info collector=zfs
ts=2023-03-10T15:25:20.139Z caller=tls_config.go:207 level=info msg="Listening on systemd activated listeners instead of port listeners."
ts=2023-03-10T15:25:20.139Z caller=tls_config.go:232 level=info msg="Listening on" address=/run/prometheus-node-exporter/prometheus-node-exporter.sock
ts=2023-03-10T15:25:20.139Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=/run/prometheus-node-exporter/prometheus-node-exporter.sock

Are you running node_exporter in Docker?

No

What did you do that produced an error?

No error occurred, but the BTRFS device stats are missing.

What did you expect to see?

The stats added in #2193 are missing, such as:

node_btrfs_device_size_bytes
node_btrfs_device_unused_bytes
node_btrfs_device_errors_total{type="corruption"}
node_btrfs_device_errors_total{type="flush"}
node_btrfs_device_errors_total{type="generation"}
node_btrfs_device_errors_total{type="read"}
node_btrfs_device_errors_total{type="write"}

What did you see instead?

The following BTRFS metrics appear:

# HELP node_btrfs_allocation_ratio Data allocation ratio for a layout/data type
# TYPE node_btrfs_allocation_ratio gauge
node_btrfs_allocation_ratio{block_group_type="data",mode="raid1",uuid="REDACTED"} 2
node_btrfs_allocation_ratio{block_group_type="metadata",mode="raid1",uuid="REDACTED"} 2
node_btrfs_allocation_ratio{block_group_type="system",mode="raid1",uuid="REDACTED"} 2
# HELP node_btrfs_device_size_bytes Size of a device that is part of the filesystem.
# TYPE node_btrfs_device_size_bytes gauge
node_btrfs_device_size_bytes{device="nvme0n1p2",uuid="REDACTED"} 1.26472945664e+11
node_btrfs_device_size_bytes{device="nvme1n1p2",uuid="REDACTED"} 1.26472945664e+11
# HELP node_btrfs_global_rsv_size_bytes Size of global reserve.
# TYPE node_btrfs_global_rsv_size_bytes gauge
node_btrfs_global_rsv_size_bytes{uuid="REDACTED"} 7.6660736e+07
# HELP node_btrfs_info Filesystem information
# TYPE node_btrfs_info gauge
node_btrfs_info{label="",uuid="REDACTED"} 1
# HELP node_btrfs_reserved_bytes Amount of space reserved for a data type
# TYPE node_btrfs_reserved_bytes gauge
node_btrfs_reserved_bytes{block_group_type="data",uuid="REDACTED"} 0
node_btrfs_reserved_bytes{block_group_type="metadata",uuid="REDACTED"} 147456
node_btrfs_reserved_bytes{block_group_type="system",uuid="REDACTED"} 0
# HELP node_btrfs_size_bytes Amount of space allocated for a layout/data type
# TYPE node_btrfs_size_bytes gauge
node_btrfs_size_bytes{block_group_type="data",mode="raid1",uuid="REDACTED"} 1.24290859008e+11
node_btrfs_size_bytes{block_group_type="metadata",mode="raid1",uuid="REDACTED"} 2.147483648e+09
node_btrfs_size_bytes{block_group_type="system",mode="raid1",uuid="REDACTED"} 3.3554432e+07
# HELP node_btrfs_used_bytes Amount of used space by a layout/data type
# TYPE node_btrfs_used_bytes gauge
node_btrfs_used_bytes{block_group_type="data",mode="raid1",uuid="REDACTED"} 3.9303360512e+10
node_btrfs_used_bytes{block_group_type="metadata",mode="raid1",uuid="REDACTED"} 1.72376064e+08
node_btrfs_used_bytes{block_group_type="system",mode="raid1",uuid="REDACTED"} 49152
...
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
node_scrape_collector_duration_seconds{collector="btrfs"} 0.000405861
...
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="btrfs"} 1

I also do see the BTRFS volume in node_filesystem metrics:

# HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/nvme0n1p2",fstype="btrfs",mountpoint="/"} 8.4987498496e+10
# HELP node_filesystem_device_error Whether an error occurred while getting statistics for the given device.
# TYPE node_filesystem_device_error gauge
node_filesystem_device_error{device="/dev/nvme0n1p2",fstype="btrfs",mountpoint="/"} 0
# HELP node_filesystem_files Filesystem total file nodes.
# TYPE node_filesystem_files gauge
node_filesystem_files{device="/dev/nvme0n1p2",fstype="btrfs",mountpoint="/"} 0
# HELP node_filesystem_files_free Filesystem total free file nodes.
# TYPE node_filesystem_files_free gauge
node_filesystem_files_free{device="/dev/nvme0n1p2",fstype="btrfs",mountpoint="/"} 0
# HELP node_filesystem_free_bytes Filesystem free space in bytes.
# TYPE node_filesystem_free_bytes gauge
node_filesystem_free_bytes{device="/dev/nvme0n1p2",fstype="btrfs",mountpoint="/"} 8.69204992e+10
# HELP node_filesystem_readonly Filesystem read-only status.
# TYPE node_filesystem_readonly gauge
node_filesystem_readonly{device="/dev/nvme0n1p2",fstype="btrfs",mountpoint="/"} 0
# HELP node_filesystem_size_bytes Filesystem size in bytes.
# TYPE node_filesystem_size_bytes gauge
node_filesystem_size_bytes{device="/dev/nvme0n1p2",fstype="btrfs",mountpoint="/"} 1.26472945664e+11

Output of some BTRFS stats (not through node_exporter):

# btrfs device stats /
[/dev/nvme0n1p2].write_io_errs    0
[/dev/nvme0n1p2].read_io_errs     0
[/dev/nvme0n1p2].flush_io_errs    0
[/dev/nvme0n1p2].corruption_errs  0
[/dev/nvme0n1p2].generation_errs  0
[/dev/nvme1n1p2].write_io_errs    0
[/dev/nvme1n1p2].read_io_errs     0
[/dev/nvme1n1p2].flush_io_errs    0
[/dev/nvme1n1p2].corruption_errs  1
[/dev/nvme1n1p2].generation_errs  0

# btrfs device stats /dev/nvme0n1p2
[/dev/nvme0n1p2].write_io_errs    0
[/dev/nvme0n1p2].read_io_errs     0
[/dev/nvme0n1p2].flush_io_errs    0
[/dev/nvme0n1p2].corruption_errs  0
[/dev/nvme0n1p2].generation_errs  0

# btrfs device stats /dev/nvme1n1p2 
[/dev/nvme1n1p2].write_io_errs    0
[/dev/nvme1n1p2].read_io_errs     0
[/dev/nvme1n1p2].flush_io_errs    0
[/dev/nvme1n1p2].corruption_errs  1
[/dev/nvme1n1p2].generation_errs  0

Note that /dev/nvme0n1p2 and /dev/nvme1n1p2 form a BTRFS RAID1 array.

NOTE: Apologies for the initially confusing issue; I accidentally hit enter too early!

DaAwesomeP changed the title from "raid1" to "BTRFS stats missing" on Mar 10, 2023
@DaAwesomeP (Contributor, Author)

Ah, just found this comment: #2173 (comment)

Is the issue that I am on a kernel earlier than 5.14? If that is the issue, then I will open a pull request to add this exception to the docs/README.

Output of the sysfs directory:

# ls -al /sys/fs/btrfs/REDACTED/devinfo/1/
total 0
drwxr-xr-x 2 root root    0 Mar 10 12:02 .
drwxr-xr-x 4 root root    0 Mar 10 12:02 ..
-r--r--r-- 1 root root 4096 Mar 10 12:02 in_fs_metadata
-r--r--r-- 1 root root 4096 Mar 10 12:02 missing
-r--r--r-- 1 root root 4096 Mar 10 12:02 replace_target
-r--r--r-- 1 root root 4096 Mar 10 12:02 writeable

cc @leth

@leth (Contributor) commented Mar 10, 2023

The implementation gathers this data via syscalls rather than /sys/fs, as sysfs does not contain enough information to tell you which disk is which.

Can you try with debug logging enabled, please? You should be able to grep the output down to just the lines mentioning btrfs.

It'd also be helpful to know what user node_exporter is running as, the ownership and permissions on the mount points, and whether there's anything else relevant in play, like containers.
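
(For reference: debug logging is enabled by adding --log.level=debug to the node_exporter flags. Assuming the unit name used later in this thread and that logs land in the journal, something like this would filter the output:

journalctl -u prometheus-node-exporter | grep -i btrfs
)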

@DaAwesomeP (Contributor, Author)

Not running in a container at all, but it is running as a systemd service on Debian 11.

Here is the service file:

[Unit]
Description=Prometheus Node Exporter
Requires=prometheus-node-exporter.socket

[Service]
User=prom-node-exporter
Group=prom-node-exporter
EnvironmentFile=/etc/default/prometheus-node-exporter
ExecStart=/usr/local/bin/prometheus-node-exporter --web.systemd-socket $OPTIONS
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

It is running as a system user with its own group. Let me know if it needs to be in any other groups in order to access the stats:

# id prom-node-exporter 
uid=998(prom-node-exporter) gid=997(prom-node-exporter) groups=997(prom-node-exporter),992(prom-node-exptr-txt)

Relevant fstab line:

UUID=REDACTED /               btrfs   defaults,subvol=@rootfs,degraded 0       0

It's just the main root mountpoint /, so it is owned by root with permissions 755.

With debug logging enabled, this is probably the culprit line:

ts=2023-03-10T21:36:59.403Z caller=btrfs_linux.go:132 level=debug collector=btrfs msg="Error inspecting btrfs mountpoint" mountPoint=/ err="open /: operation not permitted"
ts=2023-03-10T21:36:59.403Z caller=collector.go:173 level=debug msg="collector succeeded" name=btrfs duration_seconds=0.004333785

I think this should maybe show up as an error and not a debug log line? Or maybe it could error if you explicitly enable the BTRFS node_exporter module? It also still says "collector succeeded".

Surely I don't need to run the exporter as root? I would rather not do that, although I could jail it in pretty well with systemd unit restrictions if I needed to.

Debug log (systemd collector lines removed):

ts=2023-03-10T21:36:51.786Z caller=node_exporter.go:180 level=info msg="Starting node_exporter" version="(version=1.5.0, branch=HEAD, revision=1b48970ffcf5630534fb00bb0687d73c66d1c959)"
ts=2023-03-10T21:36:51.786Z caller=node_exporter.go:181 level=info msg="Build context" build_context="(go=go1.19.3, user=root@6e7732a7b81b, date=20221129-18:59:09)"
ts=2023-03-10T21:36:51.786Z caller=node_exporter.go:186 level=debug msg="Go MAXPROCS" procs=1
ts=2023-03-10T21:36:51.786Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2023-03-10T21:36:51.787Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
ts=2023-03-10T21:36:51.787Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:110 level=info msg="Enabled collectors"
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=arp
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=bcache
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=bonding
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=btrfs
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=conntrack
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=cpu
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=cpufreq
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=diskstats
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=dmi
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=edac
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=entropy
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=fibrechannel
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=filefd
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=filesystem
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=hwmon
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=infiniband
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=ipvs
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=loadavg
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=mdadm
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=meminfo
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=netclass
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=netdev
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=netstat
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=nfs
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=nfsd
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=nvme
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=os
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=powersupplyclass
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=pressure
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=rapl
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=schedstat
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=selinux
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=sockstat
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=softnet
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=stat
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=tapestats
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=textfile
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=thermal_zone
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=time
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=timex
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=udp_queues
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=uname
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=vmstat
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=xfs
ts=2023-03-10T21:36:51.787Z caller=node_exporter.go:117 level=info collector=zfs
ts=2023-03-10T21:36:51.787Z caller=tls_config.go:207 level=info msg="Listening on systemd activated listeners instead of port listeners."
ts=2023-03-10T21:36:51.787Z caller=tls_config.go:232 level=info msg="Listening on" address=/run/prometheus-node-exporter/prometheus-node-exporter.sock
ts=2023-03-10T21:36:51.787Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=/run/prometheus-node-exporter/prometheus-node-exporter.sock
ts=2023-03-10T21:36:59.391Z caller=node_exporter.go:78 level=debug msg="collect query:" filters="unsupported value type"
ts=2023-03-10T21:36:59.393Z caller=ipvs_linux.go:146 level=debug collector=ipvs msg="ipvs collector metrics are not available for this system"
ts=2023-03-10T21:36:59.393Z caller=collector.go:167 level=debug msg="collector returned no data" name=ipvs duration_seconds=0.000100301 err="collector returned no data"
ts=2023-03-10T21:36:59.393Z caller=nfsd_linux.go:69 level=debug collector=nfsd msg="Not collecting NFSd metrics" err="open /proc/net/rpc/nfsd: no such file or directory"
ts=2023-03-10T21:36:59.393Z caller=collector.go:167 level=debug msg="collector returned no data" name=nfsd duration_seconds=5.9816e-05 err="collector returned no data"
ts=2023-03-10T21:36:59.393Z caller=collector.go:173 level=debug msg="collector succeeded" name=timex duration_seconds=2.6266e-05
ts=2023-03-10T21:36:59.393Z caller=collector.go:173 level=debug msg="collector succeeded" name=filefd duration_seconds=5.3814e-05
ts=2023-03-10T21:36:59.393Z caller=nfs_linux.go:103 level=debug collector=nfs msg="Not collecting NFS metrics" err="open /proc/net/rpc/nfs: no such file or directory"
ts=2023-03-10T21:36:59.393Z caller=collector.go:167 level=debug msg="collector returned no data" name=nfs duration_seconds=2.9491e-05 err="collector returned no data"
ts=2023-03-10T21:36:59.395Z caller=collector.go:173 level=debug msg="collector succeeded" name=thermal_zone duration_seconds=0.001614161
ts=2023-03-10T21:36:59.395Z caller=collector.go:173 level=debug msg="collector succeeded" name=dmi duration_seconds=3.5182e-05
ts=2023-03-10T21:36:59.395Z caller=mdadm_linux.go:115 level=debug collector=mdadm msg="Not collecting mdstat, file does not exist" file=/proc
ts=2023-03-10T21:36:59.395Z caller=collector.go:167 level=debug msg="collector returned no data" name=mdadm duration_seconds=5.597e-05 err="collector returned no data"
ts=2023-03-10T21:36:59.395Z caller=collector.go:173 level=debug msg="collector succeeded" name=xfs duration_seconds=1.8894e-05
ts=2023-03-10T21:36:59.395Z caller=collector.go:173 level=debug msg="collector succeeded" name=uname duration_seconds=1.2484e-05
ts=2023-03-10T21:36:59.396Z caller=collector.go:173 level=debug msg="collector succeeded" name=bcache duration_seconds=2.7829e-05
ts=2023-03-10T21:36:59.396Z caller=collector.go:173 level=debug msg="collector succeeded" name=conntrack duration_seconds=0.000163611
ts=2023-03-10T21:36:59.397Z caller=collector.go:173 level=debug msg="collector succeeded" name=cpu duration_seconds=0.000882561
ts=2023-03-10T21:36:59.398Z caller=collector.go:173 level=debug msg="collector succeeded" name=netclass duration_seconds=0.000926949
ts=2023-03-10T21:36:59.398Z caller=collector.go:173 level=debug msg="collector succeeded" name=netstat duration_seconds=0.000416898
ts=2023-03-10T21:36:59.398Z caller=collector.go:173 level=debug msg="collector succeeded" name=textfile duration_seconds=9.6511e-05
ts=2023-03-10T21:36:59.398Z caller=collector.go:173 level=debug msg="collector succeeded" name=udp_queues duration_seconds=5.0828e-05
ts=2023-03-10T21:36:59.398Z caller=os_release.go:146 level=debug collector=os msg="file modification time has changed" file=//etc/os-release old_value=0001-01-01T00:00:00Z new_value=2022-12-09T14:15:00-05:00
ts=2023-03-10T21:36:59.398Z caller=collector.go:173 level=debug msg="collector succeeded" name=os duration_seconds=6.0654e-05
ts=2023-03-10T21:36:59.399Z caller=collector.go:173 level=debug msg="collector succeeded" name=schedstat duration_seconds=7.2892e-05
ts=2023-03-10T21:36:59.399Z caller=collector.go:173 level=debug msg="collector succeeded" name=powersupplyclass duration_seconds=2.3676e-05
ts=2023-03-10T21:36:59.399Z caller=collector.go:173 level=debug msg="collector succeeded" name=arp duration_seconds=2.8102e-05
ts=2023-03-10T21:36:59.399Z caller=collector.go:173 level=debug msg="collector succeeded" name=entropy duration_seconds=3.1447e-05
ts=2023-03-10T21:36:59.399Z caller=fibrechannel_linux.go:113 level=debug collector=fibrechannel msg="fibrechannel statistics not found, skipping"
ts=2023-03-10T21:36:59.399Z caller=collector.go:167 level=debug msg="collector returned no data" name=fibrechannel duration_seconds=1.0451e-05 err="collector returned no data"
ts=2023-03-10T21:36:59.399Z caller=zfs_linux.go:53 level=debug collector=zfs msg="Cannot open file for reading" path=/proc/spl/kstat/zfs
ts=2023-03-10T21:36:59.399Z caller=zfs.go:73 level=debug collector=zfs err="ZFS / ZFS statistics are not available"
ts=2023-03-10T21:36:59.399Z caller=collector.go:167 level=debug msg="collector returned no data" name=zfs duration_seconds=1.5524e-05 err="collector returned no data"
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/proc
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/dev
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/dev/pts
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/kernel/security
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/dev/shm
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/fs/cgroup
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/fs/pstore
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/firmware/efi/efivars
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/fs/bpf
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/proc/sys/fs/binfmt_misc
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/kernel/debug
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/kernel/tracing
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/dev/mqueue
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/dev/hugepages
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/kernel/config
ts=2023-03-10T21:36:59.399Z caller=filesystem_linux.go:55 level=debug collector=filesystem msg="Ignoring mount point" mountpoint=/sys/fs/fuse/connections
ts=2023-03-10T21:36:59.399Z caller=collector.go:173 level=debug msg="collector succeeded" name=filesystem duration_seconds=0.000211585
ts=2023-03-10T21:36:59.399Z caller=tapestats_linux.go:129 level=debug collector=tapestats msg="scsi_tape stats not found, skipping"
ts=2023-03-10T21:36:59.399Z caller=collector.go:167 level=debug msg="collector returned no data" name=tapestats duration_seconds=9.915e-06 err="collector returned no data"
ts=2023-03-10T21:36:59.399Z caller=collector.go:173 level=debug msg="collector succeeded" name=softnet duration_seconds=3.4849e-05
ts=2023-03-10T21:36:59.399Z caller=infiniband_linux.go:119 level=debug collector=infiniband msg="infiniband statistics not found, skipping"
ts=2023-03-10T21:36:59.399Z caller=collector.go:167 level=debug msg="collector returned no data" name=infiniband duration_seconds=1.5324e-05 err="collector returned no data"
ts=2023-03-10T21:36:59.400Z caller=rapl_linux.go:91 level=debug collector=rapl msg="Can't access energy_uj file" zone="unsupported value type" err="open /sys/class/powercap/intel-rapl:0/energy_uj: permission denied"
ts=2023-03-10T21:36:59.400Z caller=collector.go:167 level=debug msg="collector returned no data" name=rapl duration_seconds=0.000108265 err="collector returned no data"
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=vmstat duration_seconds=7.6406e-05
ts=2023-03-10T21:36:59.400Z caller=pressure_linux.go:92 level=debug collector=pressure msg="collecting statistics for resource" resource=cpu
ts=2023-03-10T21:36:59.400Z caller=pressure_linux.go:92 level=debug collector=pressure msg="collecting statistics for resource" resource=io
ts=2023-03-10T21:36:59.400Z caller=pressure_linux.go:92 level=debug collector=pressure msg="collecting statistics for resource" resource=memory
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=pressure duration_seconds=7.122e-05
ts=2023-03-10T21:36:59.400Z caller=time.go:73 level=debug collector=time msg="Return time" now=1.6784842194002724e+09
ts=2023-03-10T21:36:59.400Z caller=time.go:75 level=debug collector=time msg="Zone offset" offset=-18000 time_zone=EST
ts=2023-03-10T21:36:59.400Z caller=time_linux.go:38 level=debug collector=time msg="in Update" clocksources="[{0 [tsc acpi_pm] tsc}]"
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=time duration_seconds=6.2705e-05
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=diskstats duration_seconds=0.000122817
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=nvme duration_seconds=6.3675e-05
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=edac duration_seconds=1.0164e-05
ts=2023-03-10T21:36:59.400Z caller=bonding_linux.go:64 level=debug collector=bonding msg="Not collecting bonding, file does not exist" file=/sys/class/net
ts=2023-03-10T21:36:59.400Z caller=collector.go:167 level=debug msg="collector returned no data" name=bonding duration_seconds=1.1271e-05 err="collector returned no data"
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=sockstat duration_seconds=6.1716e-05
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=stat duration_seconds=0.0001305
ts=2023-03-10T21:36:59.400Z caller=collector.go:173 level=debug msg="collector succeeded" name=selinux duration_seconds=2.104e-05
ts=2023-03-10T21:36:59.401Z caller=meminfo.go:54 level=debug collector=meminfo msg="Set node_mem" memInfo="unsupported value type"
ts=2023-03-10T21:36:59.401Z caller=collector.go:173 level=debug msg="collector succeeded" name=meminfo duration_seconds=0.000116117
ts=2023-03-10T21:36:59.402Z caller=loadavg.go:55 level=debug collector=loadavg msg="return load" index=0 load=1.1
ts=2023-03-10T21:36:59.402Z caller=loadavg.go:55 level=debug collector=loadavg msg="return load" index=1 load=0.61
ts=2023-03-10T21:36:59.402Z caller=loadavg.go:55 level=debug collector=loadavg msg="return load" index=2 load=0.53
ts=2023-03-10T21:36:59.402Z caller=collector.go:173 level=debug msg="collector succeeded" name=loadavg duration_seconds=0.008863287
ts=2023-03-10T21:36:59.403Z caller=btrfs_linux.go:132 level=debug collector=btrfs msg="Error inspecting btrfs mountpoint" mountPoint=/ err="open /: operation not permitted"
ts=2023-03-10T21:36:59.403Z caller=collector.go:173 level=debug msg="collector succeeded" name=btrfs duration_seconds=0.004333785
ts=2023-03-10T21:36:59.403Z caller=collector.go:173 level=debug msg="collector succeeded" name=netdev duration_seconds=0.004639919
ts=2023-03-10T21:36:59.419Z caller=collector.go:173 level=debug msg="collector succeeded" name=cpufreq duration_seconds=0.023979739
ts=2023-03-10T21:36:59.423Z caller=collector.go:173 level=debug msg="collector succeeded" name=systemd duration_seconds=0.024333609
ts=2023-03-10T21:36:59.522Z caller=collector.go:173 level=debug msg="collector succeeded" name=hwmon duration_seconds=0.127332658

@DaAwesomeP (Contributor, Author)

Hmm, at least running btrfs device stats seems to require root or a privileged capability (or some other group I am unaware of):

# btrfs device stats /
[/dev/nvme0n1p2].write_io_errs    0
[/dev/nvme0n1p2].read_io_errs     0
[/dev/nvme0n1p2].flush_io_errs    0
[/dev/nvme0n1p2].corruption_errs  0
[/dev/nvme0n1p2].generation_errs  0
[/dev/nvme1n1p2].write_io_errs    0
[/dev/nvme1n1p2].read_io_errs     0
[/dev/nvme1n1p2].flush_io_errs    0
[/dev/nvme1n1p2].corruption_errs  1
[/dev/nvme1n1p2].generation_errs  0

# btrfs device stats /dev/nvme0n1p2
[/dev/nvme0n1p2].write_io_errs    0
[/dev/nvme0n1p2].read_io_errs     0
[/dev/nvme0n1p2].flush_io_errs    0
[/dev/nvme0n1p2].corruption_errs  0
[/dev/nvme0n1p2].generation_errs  0

# btrfs device stats /dev/nvme1n1p2
[/dev/nvme1n1p2].write_io_errs    0
[/dev/nvme1n1p2].read_io_errs     0
[/dev/nvme1n1p2].flush_io_errs    0
[/dev/nvme1n1p2].corruption_errs  1
[/dev/nvme1n1p2].generation_errs  0

# sudo -u prom-node-exporter btrfs device stats /
ERROR: getting device info for / failed: Operation not permitted
# sudo -u prom-node-exporter btrfs device stats /dev/nvme0n1p2
ERROR: cannot open /dev/nvme0n1p2: Permission denied
ERROR: '/dev/nvme0n1p2' is not a mounted btrfs device
# sudo -u prom-node-exporter btrfs device stats /dev/nvme1n1p2
ERROR: cannot open /dev/nvme1n1p2: Permission denied
ERROR: '/dev/nvme1n1p2' is not a mounted btrfs device

@leth (Contributor) commented Mar 10, 2023

The implementation here makes the same syscalls as btrfs device stats, so it'll need the same privileges.

I was torn on what level to log at: the filesystem could be mounted in multiple places, so failing on one mount point might not be fatal; we fall back to the old metrics from /sys/fs anyway; and I didn't want to spam people's logs about a feature they might find difficult to make work.

I never got around to joining the btrfs mailing list to give them feedback about the data in /sys/fs.
It would be ideal if you could recreate btrfs device stats using it, but today you can't.
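
To make the "same syscalls" point concrete, here is a minimal sketch of roughly what both btrfs device stats and the collector's library do: open the mountpoint and issue the BTRFS_IOC_GET_DEV_STATS ioctl against the resulting descriptor. This is illustrative only, not the actual dennwc/btrfs code; the struct layout and request value are transcribed from linux/btrfs.h, so double-check them before relying on this:

package main

import (
	"fmt"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

// Mirrors struct btrfs_ioctl_get_dev_stats from linux/btrfs.h:
// devid, nr_items, flags, five counters, padding to 1032 bytes total.
type btrfsGetDevStats struct {
	DevID   uint64    // in: device ID within the filesystem
	NrItems uint64    // in/out: number of counters requested/returned
	Flags   uint64    // in/out
	Values  [5]uint64 // out: write/read/flush/corruption/generation errors
	Unused  [121]uint64
}

// _IOWR(0x94, 52, 1032) — the BTRFS_IOC_GET_DEV_STATS request number.
const btrfsIocGetDevStats = 0xc4089434

func devStats(mountPoint string, devID uint64) ([5]uint64, error) {
	// This open (with O_NOATIME in the real library) is where the
	// collector got "operation not permitted" in this issue.
	f, err := os.Open(mountPoint)
	if err != nil {
		return [5]uint64{}, err
	}
	defer f.Close()

	args := btrfsGetDevStats{DevID: devID, NrItems: 5}
	_, _, errno := unix.Syscall(unix.SYS_IOCTL, f.Fd(),
		btrfsIocGetDevStats, uintptr(unsafe.Pointer(&args)))
	if errno != 0 {
		return [5]uint64{}, errno
	}
	return args.Values, nil
}

func main() {
	stats, err := devStats("/", 1) // devid 1 is just an example
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("write/read/flush/corruption/generation:", stats)
}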

@DaAwesomeP (Contributor, Author)

> The implementation here makes the same syscalls as btrfs device stats, so it'll need the same privileges.

Are you able to use it without root? Or are you using a capability or additional group? What kernel version are you on (it may have been patched later)?

@leth (Contributor) commented Mar 11, 2023

The error you're seeing comes from here:

fs, err := dennwc.Open(mountPath, true)
if err != nil {
	// Failed to open this mount point, maybe we didn't have permission
	// maybe we'll find another mount point for this FS later.
	level.Debug(c.logger).Log(
		"msg", "Error inspecting btrfs mountpoint",
		"mountPoint", mountPath,
		"err", err)
	continue
}

The error is returned from call here:
https://github.com/dennwc/btrfs/blob/3097362dc07261a95e05ab5bd2494c343c9c0f38/btrfs.go#L31

OpenFile maps to the open syscall. I just had a quick look at the man page; perhaps the O_PATH flag might reduce the permissions required?

I'll see if I can try it out.

@leth (Contributor) commented Mar 11, 2023

> OpenFile maps to the open syscall. I just had a quick look at the man page; perhaps the O_PATH flag might reduce the permissions required?

No luck unfortunately! The ioctl calls simply fail with "bad file descriptor". (That matches the open(2) man page: a descriptor opened with O_PATH only supports a narrow set of operations such as close(2) and fstat(2); ioctl(2) on it fails with EBADF.)

@DaAwesomeP (Contributor, Author)

I see, so the issue is that ioctl() needs a real file descriptor, but for / we can effectively only get one as root or with the right capability. But that is strange, because I would think this would work if the user has read permissions.

@leth What happens if you remove the syscall.O_NOATIME?

It's strange that it passes 0644 as the permissions argument. This should probably be 0755, although in theory it shouldn't matter, since O_CREATE is not passed (the mode argument is only consulted when the call creates a file).

@DaAwesomeP (Contributor, Author) commented Mar 11, 2023

OK, yes I think that is probably it (I have not tried this myself).

Unrelated project with the same issue: https://gitlab.gnome.org/GNOME/glib/-/issues/874#note_262355 (see last comment)

So this means that (see the sketch after this list):

  1. If you have permission enough for O_NOATIME, open the directory with O_RDONLY|O_NOATIME. You are probably root or the owner in this case.
  2. If that open fails, fall back to the case below:
  3. If you do not have that permission, open it with O_RDONLY only.
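
A minimal Go sketch of that retry logic, assuming the fix lives in the library's open path (the package name, function name, and error handling here are made up for illustration, not the exact dennwc/btrfs patch):

package fsutil

import (
	"errors"
	"os"
	"syscall"
)

// openMaybeNoAtime tries O_NOATIME first (it avoids updating atime on
// every scrape), then retries a plain read-only open when the caller
// neither owns the file nor holds CAP_FOWNER, which makes O_NOATIME
// fail with EPERM.
func openMaybeNoAtime(path string) (*os.File, error) {
	// Mode 0 is fine here: the mode argument is only consulted when
	// O_CREATE actually creates a file, and we never pass O_CREATE.
	f, err := os.OpenFile(path, os.O_RDONLY|syscall.O_NOATIME, 0)
	if err == nil {
		return f, nil
	}
	if errors.Is(err, os.ErrPermission) {
		// Fall back: the same open without O_NOATIME.
		return os.OpenFile(path, os.O_RDONLY, 0)
	}
	return nil, err
}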

@DaAwesomeP (Contributor, Author)

Further info:

O_NOATIME (since Linux 2.6.8)
       Do not update the file last access time (st_atime in the
       inode) when the file is read(2).

       This flag can be employed only if one of the following
       conditions is true:

       *  The effective UID of the process matches the owner UID
          of the file.

       *  The calling process has the CAP_FOWNER capability in
          its user namespace and the owner UID of the file has a
          mapping in the namespace.

https://man7.org/linux/man-pages/man2/openat.2.html

So falling back to opening without O_NOATIME seems to be the best option.
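
(As an aside, for anyone stuck on an unpatched build: since the man page above says CAP_FOWNER also satisfies the O_NOATIME check, granting just that capability to the service via a systemd override might work as a stopgap instead of running as root. Untested here, so treat it as a hypothesis:

# /etc/systemd/system/prometheus-node-exporter.service.d/override.conf
[Service]
AmbientCapabilities=CAP_FOWNER
)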

@leth (Contributor) commented Mar 11, 2023 via email

@leth (Contributor) commented Mar 12, 2023

It works, thank you so much for the help! I'll raise the PR on https://github.com/dennwc/btrfs to get this updated :)

@DaAwesomeP (Contributor, Author)

@leth Amazing! We just need to update deps here and the issue is solved?

@leth (Contributor) commented Mar 12, 2023

Yeah, though you may have to wait for that PR to get merged, and for the next node_exporter release too (unless you're building from source); I don't really know how frequent those are!
