I've recently built my own cluster at home with Talos Linux (https://www.talos.dev/) and I'm trying to keep an eye on its power consumption.
I found Kepler and installed it, but the Kepler exporter does not start and does not export the metrics.
How can I get the metrics?
The logs show that Kepler cannot open /dev/cpu/0/msr. A Talos maintainer says that /dev/cpu/0/msr is disabled for security reasons.
The logs also show "Could not find any ACPI power meter path."
However, I found /sys/class/hwmon/hwmon{0,1,2,3}/temp1_input and I can read those files.
Could these be used as an alternative by some configuration?
In addition, Talos enables Pod Security Admission with the baseline policy by default. To get around it, I set pod-security.kubernetes.io/enforce: privileged, but that didn't fix the problem.
Here is some information about this problem.
➜ talosctl get cpu
NODE            NAMESPACE   TYPE        ID        VERSION   MANUFACTURER           MODEL                                      CORES   THREADS
192.168.11.32   hardware    Processor   LGA1151   1         Intel(R) Corporation   Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz   6       12
➜ talosctl get ram
NODE            NAMESPACE   TYPE           ID               VERSION   MANUFACTURER   MODEL              SIZEMIB
192.168.11.32   hardware    MemoryModule   ChannelA-DIMM1   1         Samsung        M378A5244CB0-CTD   4096
192.168.11.32   hardware    MemoryModule   ChannelB-DIMM1   1         Samsung        M378A5244CB0-CTD   4096
➜ talosctl ls /sys/class/hwmon/
NODE NAME
192.168.11.32 .
192.168.11.32 hwmon0
192.168.11.32 hwmon1
192.168.11.32 hwmon2
192.168.11.32 hwmon3
➜ talosctl read /sys/class/hwmon/hwmon0/temp1_input
27800
➜ talosctl read /sys/class/hwmon/hwmon1/temp1_input
24000
➜ talosctl read /sys/class/hwmon/hwmon2/temp1_input
21000
➜ talosctl read /sys/class/hwmon/hwmon3/temp1_input
31000
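For reference, the Linux hwmon sysfs interface reports temp*_input in millidegrees Celsius, so the four readings above correspond to 27.8, 24.0, 21.0 and 31.0 °C. These are temperatures, not power readings. The following is only a minimal sketch of reading and converting those files; the function name read_hwmon_temps and its root parameter are hypothetical, and it assumes Python is available somewhere that can see the node's /sys/class/hwmon.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {hwmon_dir_name: degrees_celsius} for every readable temp1_input.

    `root` is a hypothetical parameter so the host's sysfs can be mounted
    somewhere other than /sys (for example inside a debug container).
    """
    temps = {}
    for sensor in sorted(Path(root).glob("hwmon*/temp1_input")):
        try:
            raw = sensor.read_text().strip()
            # hwmon exposes temperatures in millidegrees Celsius.
            temps[sensor.parent.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Skip sensors that cannot be read or do not contain an integer.
            continue
    return temps
```

Calling read_hwmon_temps() returns a dictionary keyed by hwmon directory name (hwmon0, hwmon1, and so on) with each value in degrees Celsius.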
➜ kubectl -n obs-prometheus logs kepler-pn57x
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu0/online: no such file or directory
I0129 15:18:37.168036 1 exporter.go:103] Kepler running on version: v0.7.12-dirty
I0129 15:18:37.168126 1 config.go:293] using gCgroup ID in the BPF program: true
I0129 15:18:37.168159 1 config.go:295] kernel version: 6.12
I0129 15:18:37.168202 1 rapl_msr_util.go:129] failed to open path /dev/cpu/0/msr: no such file or directory
I0129 15:18:37.168537 1 power.go:78] Unable to obtain power, use estimate method
I0129 15:18:37.168551 1 redfish.go:169] failed to get redfish credential file path
I0129 15:18:37.169474 1 acpi.go:71] Could not find any ACPI power meter path. Is it a VM?
I0129 15:18:37.169480 1 power.go:79] using none to obtain power
E0129 15:18:37.169488 1 accelerator.go:154] [DUMMY] doesn't contain GPU
E0129 15:18:37.169506 1 exporter.go:154] failed to init GPU accelerators: no devices found
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu0/online: no such file or directory
I0129 15:18:37.171435 1 exporter.go:84] Number of CPUs: 12
W0129 15:18:37.390337 1 exporter.go:135] failed to attach tp/writeback/writeback_dirty_page: reading file "/sys/kernel/tracing/events/writeback/writeback_dirty_page/id": open /sys/kernel/tracing/events/writeback/writeback_dirty_page/id: no such file or directory. Kepler will not collect page cache write events. This will affect the DRAM power model estimation on VMs.
I0129 15:18:37.395382 1 watcher.go:83] Using in cluster k8s config
I0129 15:18:37.495932 1 watcher.go:229] k8s APIserver watcher was started
I0129 15:18:37.517744 1 process_energy.go:129] Using the Ratio Power Model to estimate PROCESS_TOTAL Power
I0129 15:18:37.517789 1 process_energy.go:130] Feature names: [bpf_cpu_time_ms]
I0129 15:18:37.522379 1 process_energy.go:129] Using the Ratio Power Model to estimate PROCESS_COMPONENTS Power
I0129 15:18:37.522419 1 process_energy.go:130] Feature names: [bpf_cpu_time_ms bpf_cpu_time_ms bpf_cpu_time_ms gpu_compute_util]
I0129 15:18:37.541930 1 regressor.go:276] Created predictor linear for trainer: "SGDRegressorTrainer"
I0129 15:18:37.541968 1 model.go:125] Requesting for Machine Spec: &{genuineintel intel_core_i7_8700 12 1 7 4600 2}
I0129 15:18:37.542008 1 node_platform_energy.go:53] Using the Regressor/AbsPower Power Model to estimate Node Platform Power
I0129 15:18:37.560806 1 regressor.go:276] Created predictor linear for trainer: "SGDRegressorTrainer"
I0129 15:18:37.560839 1 regressor.go:276] Created predictor linear for trainer: "SGDRegressorTrainer"
I0129 15:18:37.560856 1 regressor.go:276] Created predictor linear for trainer: "SGDRegressorTrainer"
I0129 15:18:37.560881 1 regressor.go:276] Created predictor linear for trainer: "SGDRegressorTrainer"
I0129 15:18:37.560900 1 model.go:125] Requesting for Machine Spec: &{genuineintel intel_core_i7_8700 12 1 7 4600 2}
I0129 15:18:37.560952 1 node_component_energy.go:57] Using the Regressor/AbsPower Power Model to estimate Node Component Power
I0129 15:18:37.561225 1 prometheus_collector.go:95] Registered Container Prometheus metrics
I0129 15:18:37.561356 1 prometheus_collector.go:100] Registered VM Prometheus metrics
I0129 15:18:37.561403 1 prometheus_collector.go:104] Registered Node Prometheus metrics
I0129 15:18:37.563529 1 exporter.go:194] starting to listen on 0.0.0.0:9102
I0129 15:18:37.563745 1 exporter.go:208] Started Kepler in 395.842121ms
E0129 15:18:37.650029 1 watcher.go:189] parsing pod kepler-pn57x obs-prometheus ContainerStatuses issue : container kepler-exporter did not start yet status, InitContainerStatuses issue :<nil>, EphemeralContainerStatuses issue :<nil>
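To narrow down which of the files named in the log above are actually missing from the environment Kepler runs in, a small check like the following could help. It is a sketch only: the path list is copied from the log lines above, the names KEPLER_PATHS and check_paths are hypothetical, and it assumes Python is available in a container with the same mounts as the Kepler pod.

```python
import os

# Paths copied from the Kepler log output above.
KEPLER_PATHS = [
    "/dev/cpu/0/msr",
    "/sys/devices/system/cpu/cpu0/online",
    "/sys/kernel/tracing/events/writeback/writeback_dirty_page/id",
]


def check_paths(paths=KEPLER_PATHS, root="/"):
    """Map each path to True if it exists under `root`, otherwise False.

    `root` is a hypothetical prefix for cases where the host filesystem
    is mounted somewhere other than / inside the container.
    """
    result = {}
    for path in paths:
        full = os.path.join(root, path.lstrip("/"))
        result[path] = os.path.exists(full)
    return result


if __name__ == "__main__":
    for path, present in check_paths().items():
        print(f"{'found  ' if present else 'missing'} {path}")
```

A path reported as missing here only confirms what the log already says for that environment; it does not by itself show whether the cause is the Talos kernel configuration or the pod's volume mounts.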
Any help would be greatly appreciated!
Thanks,