Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

system.cpu.utilization values are always 0 in first export from PeriodicExportingMetricReader #2797

Open
alexmojaki opened this issue Aug 14, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@alexmojaki
Copy link
Contributor

alexmojaki commented Aug 14, 2024

What happened?

Combining PeriodicExportingMetricReader and SystemMetricsInstrumentor produces only zero values for system.cpu.utilization in the first export, which may be the only export if the process doesn't last long.

Steps to Reproduce

from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
import time

exporter = ConsoleMetricExporter()
reader = PeriodicExportingMetricReader(exporter)
meter_provider = MeterProvider(metric_readers=[reader])
config = {"system.cpu.utilization": ["idle", "user", "system"]}
instrumentor = SystemMetricsInstrumentor(config=config)
instrumentor.instrument(meter_provider=meter_provider)

# Wait a bit to get realistic measurements
time.sleep(1)

Output:

{
  "resource_metrics": [
    {
      "resource": {
        "attributes": {
          "telemetry.sdk.language": "python",
          "telemetry.sdk.name": "opentelemetry",
          "telemetry.sdk.version": "1.26.0.dev0",
          "service.name": "unknown_service"
        },
        "schema_url": ""
      },
      "scope_metrics": [
        {
          "scope": {
            "name": "opentelemetry.instrumentation.system_metrics",
            "version": "0.48b0.dev",
            "schema_url": "https://opentelemetry.io/schemas/1.11.0"
          },
          "metrics": [
            {
              "name": "system.cpu.utilization",
              "description": "System CPU utilization",
              "unit": "1",
              "data": {
                "data_points": [
                  {
                    "attributes": {
                      "state": "idle",
                      "cpu": 1
                    },
                    "start_time_unix_nano": 0,
                    "time_unix_nano": 1723642134243694000,
                    "value": 0.0
                  },
                  {
                    "attributes": {
                      "state": "user",
                      "cpu": 1
                    },
                    "start_time_unix_nano": 0,
                    "time_unix_nano": 1723642134243694000,
                    "value": 0.0
                  },
                  {
                    "attributes": {
                      "state": "system",
                      "cpu": 1
                    },
                    "start_time_unix_nano": 0,
                    "time_unix_nano": 1723642134243694000,
                    "value": 0.0
                  },
                  {
                    "attributes": {
                      "state": "idle",
                      "cpu": 2
                    },
                    "start_time_unix_nano": 0,
                    "time_unix_nano": 1723642134243694000,
                    "value": 0.0
                  },
                  {
                    "attributes": {
                      "state": "user",
                      "cpu": 2
                    },
                    "start_time_unix_nano": 0,
                    "time_unix_nano": 1723642134243694000,
                    "value": 0.0
                  },
                  {
                    "attributes": {
                      "state": "system",
                      "cpu": 2
                    },
                    "start_time_unix_nano": 0,
                    "time_unix_nano": 1723642134243694000,
                    "value": 0.0
                  },
                  "... etc for every core ..."
                ]
              }
            }
          ],
          "schema_url": "https://opentelemetry.io/schemas/1.11.0"
        }
      ],
      "schema_url": ""
    }
  ]
}

Expected Result

Some values greater than 0.

Actual Result

"value": 0.0 for every data point.

Additional context

In https://psutil.readthedocs.io/en/latest/#psutil.cpu_times_percent it says:

Warning: the first time this function is called with interval = 0.0 or None it will return a meaningless 0.0 value which you are supposed to ignore.

which is where the problem comes from. However, the first time a user calls cpu_times_percent in the main thread, it actually returns a nonzero value. This is because psutil has this at the top level of the module:

_last_cpu_times = {threading.current_thread().ident: cpu_times()}

This means that when metrics are read on the main thread (e.g. by adding meter_provider.force_flush() to the script above, or by using InMemoryMetricReader) then the CPU values are nonzero.

The problem is that PeriodicExportingMetricReader naturally has to perform readings on a separate thread. From the same psutil docs:

Internally this function maintains a global map (a dict) where each key is the ID of the calling thread (threading.get_ident). This means it can be called from different threads, at different intervals, and still return meaningful and independent results.

Adding the following code to the script above anywhere between reader = ... and time.sleep seems to fix the problem:

import psutil

cpu_times = psutil.cpu_times_percent(percpu=True)
psutil._last_per_cpu_times_2[reader._daemon_thread.ident] = cpu_times

Actually implementing this fix in a sensible and maintainable way seems more complicated.

Would you like to implement a fix?

No

@alexmojaki alexmojaki added the bug Something isn't working label Aug 14, 2024
@alexmojaki
Copy link
Contributor Author

The same problem exists for process.runtime.cpu.utilization, but it's more easily fixable:

import time

from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

exporter = ConsoleMetricExporter()
reader = PeriodicExportingMetricReader(exporter)
meter_provider = MeterProvider(metric_readers=[reader])
config = {'process.runtime.cpu.utilization': None}
instrumentor = SystemMetricsInstrumentor(config=config)
instrumentor.instrument(meter_provider=meter_provider)

# Uncomment to get nonzero values
# instrumentor._proc.cpu_percent()

# Use some CPU
sum(list(range(int(1e7))))

# Wait a bit to get realistic measurements
time.sleep(1)

Adding self._proc.cpu_percent() to the place where the instrument is created should fix it, because these values aren't tracked per thread. Maybe that'll change one day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant