update absolute power metrics when cannot determine idle/dynamic power #1653

Open
vimalk78 opened this issue Jul 31, 2024 · 0 comments
Labels
kind/bug report bug issue

Comments

@vimalk78
Collaborator

What happened?

Suppose a node exposes power readings through ACPI, so we have data for node platform power, but Kepler cannot access RAPL, so there is no component power. This means we do not have the resource usage needed to divide the node power read from ACPI into node platform idle and dynamic power.
As a result, kepler_node_platform_joules_total is zero for both the idle and dynamic modes.

In this case, can we expose a metric for absolute power, i.e. kepler_node_platform_joules_total{mode="absolute"}?
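To make the request concrete, here is a minimal sketch of the proposed fallback. It is not Kepler's actual code: the `PlatformEnergy` type, the `splitPlatformEnergy` helper, and the `canSplit` flag are hypothetical names introduced only for illustration, and the idle estimate is assumed to be supplied by the caller.

```go
package main

import "fmt"

// PlatformEnergy is a hypothetical type: one node platform energy sample
// tagged with the mode label it would be exported under.
type PlatformEnergy struct {
	Mode   string // "idle", "dynamic", or the proposed "absolute"
	Joules float64
}

// splitPlatformEnergy is a hypothetical helper, not part of Kepler.
// totalJoules is the energy read from the platform source (ACPI or Redfish).
// idleJoules is the caller's idle estimate, used only when canSplit is true.
// canSplit is assumed to be false when there is no component power or
// resource usage available to attribute the total to idle and dynamic.
func splitPlatformEnergy(totalJoules, idleJoules float64, canSplit bool) []PlatformEnergy {
	if !canSplit {
		// Report the undivided reading instead of zero idle and dynamic values.
		return []PlatformEnergy{{Mode: "absolute", Joules: totalJoules}}
	}
	if idleJoules > totalJoules {
		// Clamp so the dynamic portion never goes negative.
		idleJoules = totalJoules
	}
	return []PlatformEnergy{
		{Mode: "idle", Joules: idleJoules},
		{Mode: "dynamic", Joules: totalJoules - idleJoules},
	}
}

func main() {
	// The ACPI reading is available, but RAPL is not, so no split is possible.
	for _, e := range splitPlatformEnergy(120.0, 0, false) {
		fmt.Printf("kepler_node_platform_joules_total{mode=%q} %.1f\n", e.Mode, e.Joules)
	}
}
```

With this shape, a node that has platform readings but no RAPL access would still report a non-zero series under the proposed `absolute` mode, while nodes that can split the reading keep the existing idle and dynamic series.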

What did you expect to happen?

Kepler should expose a metric for absolute power, i.e. kepler_node_platform_joules_total{mode="absolute"}.

How can we reproduce it (as minimally and precisely as possible)?

Run Kepler on a node that reads node power from either ACPI or Redfish and has no access to RAPL, then check the kepler_node_platform_joules_total metric.

The initial kepler logs should be similar to:

I0729 14:56:17.673625    8260 rapl_msr_util.go:129] failed to open path /dev/cpu/0/msr: no such file or directory
I0729 14:56:17.674109    8260 power.go:72] Unable to obtain power, use estimate method
I0729 14:56:17.673568    8260 config.go:156] EXPOSE_ESTIMATED_IDLE_POWER_METRICS: false. This only impacts when the power is estimated using pre-prained models. Estimated idle power is meaningful only when Kepler is running on bare-metal or with a single virtual machine (VM) on the node.
W0729 14:56:18.037859    8260 exporter.go:299] Failed to open perf event for CPU cycles: failed to open bpf perf event on cpu 0: no such file or directory

Anything else we need to know?

No response

Kepler image tag

7.11

Kubernetes version

$ kubectl version
# paste output here

Use baremetal machine

Cloud provider or bare metal

Any

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Kepler deployment config

For on kubernetes:

$ KEPLER_NAMESPACE=kepler

# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE}
# paste output here

# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE}

For standalone:

put your Kepler command argument here

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@vimalk78 vimalk78 added the kind/bug report bug issue label Jul 31, 2024