Raspberry Pi 5 8GB w/ BCM2712 (ARM Cortex-A76 CPU / VideoCore 7 GPU)
OS: Ubuntu 24.10
Kernel: 6.11.0-1004-raspi
Other: MicroK8s 1.31, containerd://1.6.28
I'm sure support for the RPi or Broadcom SoCs is not a priority, but I was curious what it would take. I'm able to run Kepler successfully, but it can't find any power data, so it's defaulting to estimates, which so far are off by +174%.
FYI - BTF is enabled in the newer kernel(s):
root@mk8s01:~# grep BTF /boot/config-6.11.0-1004-raspi
CONFIG_VIDEO_SONY_BTF_MPX=m
CONFIG_DEBUG_INFO_BTF=y
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
# CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set
CONFIG_PROBE_EVENTS_BTF_ARGS=y
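For reference, the same check can be done programmatically. This is a minimal sketch, not Kepler code: the helper name `btfEnabled` is made up for illustration, and it matches the exact line `CONFIG_DEBUG_INFO_BTF=y` so that unrelated options that merely contain "BTF" (such as `CONFIG_VIDEO_SONY_BTF_MPX` in the output above) are ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// btfEnabled reports whether the kernel config text contains the exact
// line CONFIG_DEBUG_INFO_BTF=y. Hypothetical helper, not part of Kepler.
func btfEnabled(config string) bool {
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "CONFIG_DEBUG_INFO_BTF=y" {
			return true
		}
	}
	return false
}

func main() {
	// Path taken from the grep command above; it differs per kernel version.
	data, err := os.ReadFile("/boot/config-6.11.0-1004-raspi")
	if err != nil {
		fmt.Println("could not read kernel config:", err)
		return
	}
	fmt.Println("BTF enabled:", btfEnabled(string(data)))
}
```

An exact match is used instead of a substring search because `grep BTF` also returns lines that say nothing about BTF debug info.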
Here is what the Pod logs show, and a breakdown below.
2024-10-29T14:56:52.499638365-07:00 stderr F I1029 21:56:52.490660 1 gpu.go:38] Trying to initialize GPU collector using dcgm
2024-10-29T14:56:52.50010469-07:00 stderr F W1029 21:56:52.490993 1 gpu_dcgm.go:104] There is no DCGM daemon running in the host: libdcgm.so not Found
2024-10-29T14:56:52.500135152-07:00 stderr F W1029 21:56:52.491042 1 gpu_dcgm.go:108] Could not start DCGM. Error: libdcgm.so not Found
2024-10-29T14:56:52.50014156-07:00 stderr F I1029 21:56:52.491051 1 gpu.go:45] Error initializing dcgm: not able to connect to DCGM: libdcgm.so not Found
2024-10-29T14:56:52.500145541-07:00 stderr F I1029 21:56:52.491058 1 gpu.go:38] Trying to initialize GPU collector using nvidia-nvml
2024-10-29T14:56:52.500150967-07:00 stderr F I1029 21:56:52.491186 1 gpu.go:45] Error initializing nvidia-nvml: failed to init nvml. ERROR_LIBRARY_NOT_FOUND
2024-10-29T14:56:52.500157226-07:00 stderr F I1029 21:56:52.491195 1 gpu.go:38] Trying to initialize GPU collector using dummy
2024-10-29T14:56:52.500162967-07:00 stderr F I1029 21:56:52.491202 1 gpu.go:42] Using dummy to obtain gpu power
2024-10-29T14:56:52.500167448-07:00 stderr F E1029 21:56:52.491984 1 utils.go:110] getCPUArch failure: open /sys/devices/cpu/caps/pmu_name: no such file or directory
2024-10-29T14:56:52.500171781-07:00 stderr F I1029 21:56:52.492257 1 exporter.go:100] Kepler running on version: v0.7.11
2024-10-29T14:56:52.500177615-07:00 stderr F I1029 21:56:52.492318 1 config.go:284] using gCgroup ID in the BPF program: true
2024-10-29T14:56:52.500181596-07:00 stderr F I1029 21:56:52.492363 1 config.go:286] kernel version: 6.11
2024-10-29T14:56:52.500186614-07:00 stderr F I1029 21:56:52.492411 1 config.go:311] The Idle power will be exposed. Are you running on Baremetal or using single VM per node?
2024-10-29T14:56:52.500200059-07:00 stderr F I1029 21:56:52.492458 1 rapl_msr_util.go:129] failed to open path /dev/cpu/0/msr: no such file or directory
2024-10-29T14:56:52.500205966-07:00 stderr F I1029 21:56:52.492652 1 power.go:72] Unable to obtain power, use estimate method
2024-10-29T14:56:52.500210151-07:00 stderr F I1029 21:56:52.492664 1 redfish.go:169] failed to get redfish credential file path
2024-10-29T14:56:52.500215447-07:00 stderr F I1029 21:56:52.492692 1 acpi.go:71] Could not find any ACPI power meter path. Is it a VM?
2024-10-29T14:56:52.50022004-07:00 stderr F I1029 21:56:52.492700 1 power.go:73] using none to obtain power
2024-10-29T14:56:52.50022541-07:00 stderr F I1029 21:56:52.494010 1 exporter.go:89] Number of CPUs: 4
2024-10-29T14:56:53.008268232-07:00 stderr F I1029 21:56:53.007971 1 exporter.go:147] Initializing the GPU collector
2024-10-29T14:56:53.009346917-07:00 stderr F I1029 21:56:53.009147 1 watcher.go:68] Using in cluster k8s config
2024-10-29T14:56:53.110801996-07:00 stderr F I1029 21:56:53.110593 1 watcher.go:140] k8s APIserver watcher was started
2024-10-29T14:56:53.110834995-07:00 stderr F I1029 21:56:53.110679 1 prometheus_collector.go:95] Registered Container Prometheus metrics
2024-10-29T14:56:53.110867958-07:00 stderr F I1029 21:56:53.110719 1 prometheus_collector.go:100] Registered VM Prometheus metrics
2024-10-29T14:56:53.11089342-07:00 stderr F I1029 21:56:53.110835 1 prometheus_collector.go:104] Registered Node Prometheus metrics
2024-10-29T14:56:53.111772016-07:00 stderr F I1029 21:56:53.111692 1 process_energy.go:114] Using the Ratio/DynPower Power Model to estimate Process Platform Power
2024-10-29T14:56:53.11221649-07:00 stderr F I1029 21:56:53.111772 1 process_energy.go:115] Process feature names: [bpf_cpu_time_ms]
2024-10-29T14:56:53.112227453-07:00 stderr F I1029 21:56:53.111818 1 process_energy.go:124] Using the Ratio/DynPower Power Model to estimate Process Component Power
2024-10-29T14:56:53.112232156-07:00 stderr F I1029 21:56:53.111830 1 process_energy.go:125] Process feature names: [bpf_cpu_time_ms bpf_cpu_time_ms bpf_cpu_time_ms gpu_compute_util]
2024-10-29T14:56:53.112339988-07:00 stderr F I1029 21:56:53.112273 1 node_platform_energy.go:52] Using the Regressor/AbsPower Power Model to estimate Node Platform Power
2024-10-29T14:56:53.112640927-07:00 stderr F I1029 21:56:53.112546 1 node_component_energy.go:56] Using the Regressor/AbsPower Power Model to estimate Node Component Power
2024-10-29T14:56:53.112835924-07:00 stderr F I1029 21:56:53.112735 1 exporter.go:201] starting to listen on 0.0.0.0:9102
2024-10-29T14:56:53.113477579-07:00 stderr F I1029 21:56:53.113218 1 exporter.go:215] Started Kepler in 621.004183ms
RPi has a built-in tool called vcgencmd that reports voltages (not power directly) from the VideoCore GPU:
root@mk8s01:~# for id in core sdram_c sdram_i sdram_p; do \
echo -e "$id:\t$(vcgencmd measure_volts $id)"; \
done
core: volt=0.8347V
sdram_c: volt=0.6000V
sdram_i: volt=0.6000V
sdram_p: volt=1.1000V
Since BTF is enabled, this isn't a repeat of #1597.
Breakdown of the failures in the Pod logs:
getCPUArch failure

- node.go: `cpuArch()` is called, which calls `getCPUArchitecture()`.
- `cpuArchOverride` is not set, so `getCPUArchitecture()` looks at `runtime.GOARCH`; it doesn't match "amd64" or "s390x", so it calls `cpuPmuName()`.
- `cpuPmuName()` fails because `/sys/devices/cpu/caps/pmu_name` doesn't exist.
- That error returns to `cpuArch()`, which fails with the "getCPUArch failure" error.

I was able to bypass this via `CPU_ARCH_OVERRIDE: "arm64"` in the deployment configuration.

failed to open path /dev/cpu/0/msr

- rapl_msr.go: `IsSystemCollectionSupported()` calls `InitUnits()`.
- rapl_msr_util.go: `InitUnits()` calls `OpenAllMSR()`.
- `OpenAllMSR()` references `msrPath` (`/dev/cpu/%d/msr`), which fails to open.

RAPL MSR is an x86 thing (Intel/AMD) ... so that won't work here. I checked `modprobe` just to be sure, and the `msr` module is not available in the 6.11.0-1004-raspi kernel.

failed to get redfish credential file path
No Redfish support; it needs a BMC (e.g. Dell iDRAC), which the RPi doesn't have.
Could not find any ACPI power meter path
ACPI is not used on the RPi; it uses the device tree instead.
So at this point, there is no power data available to Kepler.
As shown earlier, RPi's built-in `vcgencmd` tool can read voltages from the VideoCore GPU.
There is also:
There may be other ways to pull power info.
Is this worth the time/effort?