Question: Support for Raspberry Pi 5 / BCM2712 (ARM Cortex-A76 CPU / VideoCore 7 GPU)? #1831

Open
skpaz opened this issue Oct 31, 2024 · 0 comments

| Key | Value |
| --- | --- |
| Host | Raspberry Pi 5 8GB w/ BCM2712 (ARM Cortex-A76 CPU / VideoCore 7 GPU) |
| OS | Ubuntu 24.10 |
| Kernel | 6.11.0-1004-raspi |
| Other | MicroK8s 1.31, containerd://1.6.28 |

I'm sure support for the RPi or Broadcom SoCs is not a priority, but I was curious what it would take. I'm able to run Kepler successfully, but it can't find any power data, so it falls back to estimates, which are off (so far) by +174%.

FYI - BTF is enabled in the newer kernel(s):

root@mk8s01:~# grep BTF /boot/config-6.11.0-1004-raspi 
CONFIG_VIDEO_SONY_BTF_MPX=m
CONFIG_DEBUG_INFO_BTF=y
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
# CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set
CONFIG_PROBE_EVENTS_BTF_ARGS=y

... so this isn't a repeat of #1597.

Here is what the Pod logs show, with a breakdown of each failure below.

2024-10-29T14:56:52.499638365-07:00 stderr F I1029 21:56:52.490660    1 gpu.go:38] Trying to initialize GPU collector using dcgm
2024-10-29T14:56:52.50010469-07:00 stderr F W1029 21:56:52.490993     1 gpu_dcgm.go:104] There is no DCGM daemon running in the host: libdcgm.so not Found
2024-10-29T14:56:52.500135152-07:00 stderr F W1029 21:56:52.491042    1 gpu_dcgm.go:108] Could not start DCGM. Error: libdcgm.so not Found
2024-10-29T14:56:52.50014156-07:00 stderr F I1029 21:56:52.491051     1 gpu.go:45] Error initializing dcgm: not able to connect to DCGM: libdcgm.so not Found
2024-10-29T14:56:52.500145541-07:00 stderr F I1029 21:56:52.491058    1 gpu.go:38] Trying to initialize GPU collector using nvidia-nvml
2024-10-29T14:56:52.500150967-07:00 stderr F I1029 21:56:52.491186    1 gpu.go:45] Error initializing nvidia-nvml: failed to init nvml. ERROR_LIBRARY_NOT_FOUND
2024-10-29T14:56:52.500157226-07:00 stderr F I1029 21:56:52.491195    1 gpu.go:38] Trying to initialize GPU collector using dummy
2024-10-29T14:56:52.500162967-07:00 stderr F I1029 21:56:52.491202    1 gpu.go:42] Using dummy to obtain gpu power
2024-10-29T14:56:52.500167448-07:00 stderr F E1029 21:56:52.491984    1 utils.go:110] getCPUArch failure: open /sys/devices/cpu/caps/pmu_name: no such file or directory
2024-10-29T14:56:52.500171781-07:00 stderr F I1029 21:56:52.492257    1 exporter.go:100] Kepler running on version: v0.7.11
2024-10-29T14:56:52.500177615-07:00 stderr F I1029 21:56:52.492318    1 config.go:284] using gCgroup ID in the BPF program: true
2024-10-29T14:56:52.500181596-07:00 stderr F I1029 21:56:52.492363    1 config.go:286] kernel version: 6.11
2024-10-29T14:56:52.500186614-07:00 stderr F I1029 21:56:52.492411    1 config.go:311] The Idle power will be exposed. Are you running on Baremetal or using single VM per node?
2024-10-29T14:56:52.500200059-07:00 stderr F I1029 21:56:52.492458    1 rapl_msr_util.go:129] failed to open path /dev/cpu/0/msr: no such file or directory
2024-10-29T14:56:52.500205966-07:00 stderr F I1029 21:56:52.492652    1 power.go:72] Unable to obtain power, use estimate method
2024-10-29T14:56:52.500210151-07:00 stderr F I1029 21:56:52.492664    1 redfish.go:169] failed to get redfish credential file path
2024-10-29T14:56:52.500215447-07:00 stderr F I1029 21:56:52.492692    1 acpi.go:71] Could not find any ACPI power meter path. Is it a VM?
2024-10-29T14:56:52.50022004-07:00 stderr F I1029 21:56:52.492700     1 power.go:73] using none to obtain power
2024-10-29T14:56:52.50022541-07:00 stderr F I1029 21:56:52.494010     1 exporter.go:89] Number of CPUs: 4
2024-10-29T14:56:53.008268232-07:00 stderr F I1029 21:56:53.007971    1 exporter.go:147] Initializing the GPU collector
2024-10-29T14:56:53.009346917-07:00 stderr F I1029 21:56:53.009147    1 watcher.go:68] Using in cluster k8s config
2024-10-29T14:56:53.110801996-07:00 stderr F I1029 21:56:53.110593    1 watcher.go:140] k8s APIserver watcher was started
2024-10-29T14:56:53.110834995-07:00 stderr F I1029 21:56:53.110679    1 prometheus_collector.go:95] Registered Container Prometheus metrics
2024-10-29T14:56:53.110867958-07:00 stderr F I1029 21:56:53.110719    1 prometheus_collector.go:100] Registered VM Prometheus metrics
2024-10-29T14:56:53.11089342-07:00 stderr F I1029 21:56:53.110835     1 prometheus_collector.go:104] Registered Node Prometheus metrics
2024-10-29T14:56:53.111772016-07:00 stderr F I1029 21:56:53.111692    1 process_energy.go:114] Using the Ratio/DynPower Power Model to estimate Process Platform Power
2024-10-29T14:56:53.11221649-07:00 stderr F I1029 21:56:53.111772     1 process_energy.go:115] Process feature names: [bpf_cpu_time_ms]
2024-10-29T14:56:53.112227453-07:00 stderr F I1029 21:56:53.111818    1 process_energy.go:124] Using the Ratio/DynPower Power Model to estimate Process Component Power
2024-10-29T14:56:53.112232156-07:00 stderr F I1029 21:56:53.111830    1 process_energy.go:125] Process feature names: [bpf_cpu_time_ms bpf_cpu_time_ms bpf_cpu_time_ms   gpu_compute_util]
2024-10-29T14:56:53.112339988-07:00 stderr F I1029 21:56:53.112273    1 node_platform_energy.go:52] Using the Regressor/AbsPower Power Model to estimate Node Platform Power
2024-10-29T14:56:53.112640927-07:00 stderr F I1029 21:56:53.112546    1 node_component_energy.go:56] Using the Regressor/AbsPower Power Model to estimate Node Component Power
2024-10-29T14:56:53.112835924-07:00 stderr F I1029 21:56:53.112735    1 exporter.go:201] starting to listen on 0.0.0.0:9102
2024-10-29T14:56:53.113477579-07:00 stderr F I1029 21:56:53.113218    1 exporter.go:215] Started Kepler in 621.004183ms

getCPUArch failure

node.go:

  1. cpuArch() is called, which calls getCPUArchitecture().
  2. If cpuArchOverride is not set, getCPUArchitecture() checks runtime.GOARCH; it doesn't match "amd64" or "s390x", so it calls cpuPmuName().
  3. cpuPmuName() fails because /sys/devices/cpu/caps/pmu_name doesn't exist.
  4. Control returns to cpuArch(), which fails with the "getCPUArch failure" error.

I was able to bypass this via CPU_ARCH_OVERRIDE: "arm64" in the deployment configuration.
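The flow above can be sketched roughly as follows. This is only an illustration of the steps as I read them, not Kepler's actual code: detectArch and readPMUName are hypothetical names, reading the override from an environment variable is an assumption, and the amd64/s390x branch is a stand-in because I didn't trace what those paths do.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// pmuNamePath is the sysfs file named in the error log.
const pmuNamePath = "/sys/devices/cpu/caps/pmu_name"

// readPMUName returns the trimmed contents of the pmu_name file at path.
func readPMUName(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

// detectArch is a hypothetical helper that mirrors the steps listed above:
// an explicit override wins, amd64/s390x take their own branch, and every
// other GOARCH falls through to the pmu_name lookup.
func detectArch(override, goarch, pmuPath string) (string, error) {
	if override != "" {
		return override, nil
	}
	switch goarch {
	case "amd64", "s390x":
		// Stand-in only: the real per-arch detection is not sketched here.
		return goarch, nil
	}
	name, err := readPMUName(pmuPath)
	if err != nil {
		return "", fmt.Errorf("getCPUArch failure: %w", err)
	}
	return name, nil
}

func main() {
	// Assumption: the override arrives as an environment variable.
	arch, err := detectArch(os.Getenv("CPU_ARCH_OVERRIDE"), runtime.GOARCH, pmuNamePath)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("arch:", arch)
}
```

On the RPi 5 without the override this takes the error path, because the pmu_name file is missing; with the override set to "arm64" the lookup is never attempted.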

failed to open path /dev/cpu/0/msr

rapl_msr.go:

  1. IsSystemCollectionSupported() calls InitUnits().

rapl_msr_util.go:

  1. InitUnits() calls OpenAllMSR().
  2. OpenAllMSR() references msrPath (/dev/cpu/%d/msr), which fails to open.

RAPL via MSR is an x86 (Intel) mechanism, so that won't work here. I checked modprobe just to be sure, and the msr module is not available in the 6.11.0-1004-raspi kernel.

failed to get redfish credential file path

No Redfish support. Redfish needs a BMC to talk to (e.g. Dell iDRAC), and the RPi doesn't have one.

Could not find any ACPI power meter path

ACPI is not used on the RPi; it uses the device tree instead.


So at this point, there is no power data available to Kepler.

RPi has a built-in tool called vcgencmd that can report voltages (not power directly) from the VideoCore firmware:

root@mk8s01:~# for id in core sdram_c sdram_i sdram_p; do \
  echo -e "$id:\t$(vcgencmd measure_volts $id)"; \
done
core:	volt=0.8347V
sdram_c:	volt=0.6000V
sdram_i:	volt=0.6000V
sdram_p:	volt=1.1000V
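If a collector were to consume these readings, each line would first need to be turned into a number. A minimal sketch, assuming the output always has the `volt=<number>V` shape shown above; parseVolts is a hypothetical helper, and the samples are copied from the output above rather than read from the tool.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVolts converts one measure_volts line such as "volt=0.8347V"
// into a float64 number of volts.
func parseVolts(line string) (float64, error) {
	s := strings.TrimSpace(line)
	if !strings.HasPrefix(s, "volt=") || !strings.HasSuffix(s, "V") {
		return 0, fmt.Errorf("unexpected measure_volts output: %q", line)
	}
	// Strip the "volt=" prefix and the trailing unit.
	return strconv.ParseFloat(s[len("volt="):len(s)-1], 64)
}

func main() {
	// Sample lines copied from the output above.
	samples := []string{"volt=0.8347V", "volt=0.6000V", "volt=1.1000V"}
	for _, line := range samples {
		v, err := parseVolts(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %.4f V\n", line, v)
	}
}
```

Voltage alone doesn't give power; it would have to be combined with a current reading for the same rail.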

There is also:

root@mk8s01:~# vcgencmd pmic_read_adc
 3V7_WL_SW_A current(0)=0.00000000A
   3V3_SYS_A current(1)=0.17371550A
   1V8_SYS_A current(2)=0.24886210A
  DDR_VDD2_A current(3)=0.00487965A
  DDR_VDDQ_A current(4)=0.00000000A
   1V1_SYS_A current(5)=0.34840700A
    0V8_SW_A current(6)=0.44892780A
  VDD_CORE_A current(7)=1.04481000A
   3V3_DAC_A current(17)=0.00006105A
   3V3_ADC_A current(18)=0.00024420A
   0V8_AON_A current(16)=0.00451770A
      HDMI_A current(22)=0.01221000A
 3V7_WL_SW_V volt(8)=3.70441600V
   3V3_SYS_V volt(9)=3.30260900V
   1V8_SYS_V volt(10)=1.79780000V
  DDR_VDD2_V volt(11)=1.10183000V
  DDR_VDDQ_V volt(12)=0.59890050V
   1V1_SYS_V volt(13)=1.10256300V
    0V8_SW_V volt(14)=0.79816780V
  VDD_CORE_V volt(15)=0.83513960V
   3V3_DAC_V volt(20)=3.31043600V
   3V3_ADC_V volt(21)=3.30677300V
   0V8_AON_V volt(19)=0.79970620V
      HDMI_V volt(23)=5.13890000V
     EXT5V_V volt(24)=5.14828000V
      BATT_V volt(25)=0.00000000V
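Since this output reports both a current and a voltage for most rails, a per-rail power figure could be derived as P = V × I. Here is a rough sketch of that idea, assuming the `_A` / `_V` suffixes pair up as they appear to above; railPower and the regular expression are my own hypothetical names, not anything in Kepler or vcgencmd. Rails that report only a voltage (EXT5V, BATT) are skipped, so the sum is rail power at the PMIC, not necessarily total board power.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// lineRE matches lines such as "VDD_CORE_A current(7)=1.04481000A"
// or "VDD_CORE_V volt(15)=0.83513960V".
var lineRE = regexp.MustCompile(`^\s*(\S+)_([AV])\s+(?:current|volt)\(\d+\)=([0-9.]+)[AV]\s*$`)

// rail holds the readings seen so far for one named rail.
type rail struct {
	amps, volts float64
	hasA, hasV  bool
}

// railPower parses pmic_read_adc output and returns watts per rail,
// only for rails that report both a current and a voltage.
func railPower(output string) (map[string]float64, error) {
	rails := map[string]*rail{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		m := lineRE.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // ignore lines that don't look like a reading
		}
		val, err := strconv.ParseFloat(m[3], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", sc.Text(), err)
		}
		r := rails[m[1]]
		if r == nil {
			r = &rail{}
			rails[m[1]] = r
		}
		if m[2] == "A" {
			r.amps, r.hasA = val, true
		} else {
			r.volts, r.hasV = val, true
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	watts := map[string]float64{}
	for name, r := range rails {
		if r.hasA && r.hasV {
			watts[name] = r.amps * r.volts
		}
	}
	return watts, nil
}

func main() {
	// A few lines copied from the output above; EXT5V has no current reading.
	sample := `
   3V3_SYS_A current(1)=0.17371550A
  VDD_CORE_A current(7)=1.04481000A
   3V3_SYS_V volt(9)=3.30260900V
  VDD_CORE_V volt(15)=0.83513960V
     EXT5V_V volt(24)=5.14828000V
`
	watts, err := railPower(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	total := 0.0
	for name, w := range watts {
		fmt.Printf("%-10s %.4f W\n", name, w)
		total += w
	}
	fmt.Printf("total      %.4f W\n", total)
}
```

In a real collector the input would come from running `vcgencmd pmic_read_adc` on the host, which would also need to be reachable from inside the Kepler container.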

There may be other ways to pull power info.

Is this worth the time/effort?

@skpaz skpaz closed this as completed Oct 31, 2024
@skpaz skpaz closed this as not planned Oct 31, 2024
@skpaz skpaz reopened this Nov 5, 2024