Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix instrument types in system metrics #2865

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bjrara
Copy link

@bjrara bjrara commented Sep 11, 2024

Description

This PR fixes the instrument types when creating the system metrics.

Metrics that are impacted:

  • system.network.connections:
    Collected from psutil.net_connections is a snapshot of the current state of network connections, thus should be delta and is not monotonic.
  • process.runtime.cpython.gc_count
    Collected from python gc is the current collection counts, thus should be delta and is not monotonic.
  • process.runtime.cpython.memory
    Collected from psutil.Process.memory_info is a snapshot of the process’s memory usage at the time of the call, thus should be delta and is not monotonic.
  • process.runtime.cpython.thread_count
    Collected from psutil.Process.num_threads is the number of threads currently used by this process (non cumulative).
  • process.open_file_descriptor.count
    Collected from psutil.Process.num_fds is the number of file descriptors currently opened by this process (non cumulative).

Change the instrument types for the metrics above to observable_gauge for correction.

Fixes #2861

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Tested locally using console exporter configured with delta temporality for all the metric types.

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@xrmx
Copy link
Contributor

xrmx commented Sep 12, 2024

Don't have much clue about metrics but the generated semantic conventions package provides some helper functions to creates these metrics, and they return UpDownCounter, e.g. create_process_open_file_descriptor_count

Copy link
Member

@emdneto emdneto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From specs:

I want to measure something (by reporting an absolute value):

If the measurement values are additive:

So I think the UpDownCounter is correct no?

@bjrara
Copy link
Author

bjrara commented Sep 12, 2024

@xrmx @emdneto Thanks for taking a look at the PR.

The issue is not with UpDownCounter, but the temporality used by ObservableUpDownCounter.

From the definition, ObservableUpDownCounter is cumulative, while these metrics are delta.

if isinstance(instrument, ObservableUpDownCounter):
        return _SumAggregation(
            attributes,
            instrument_is_monotonic=False,
            instrument_aggregation_temporality=(
                AggregationTemporality.CUMULATIVE
            ),
            start_time_unix_nano=start_time_unix_nano,
        )

Changing them from ObservableUpDownCounter to UpDownCounter should also fix the issue. I agree with both of you that using UpDownCounter is more accurate from the semantic convention. However, create_up_down_counter isn't asynchronous, and doesn't accept callbacks, so I choose to use ObservableGauge. If you have better suggestions on how to fix the issue properly with UpDownCounter, please let me know.

I actually want to take back my words on using UpDownCounter. By checking the specification of UpDownCounter, it's more appropriate to use when the metrics are emitted using add(n) or add(-n). But in our case, we collect absolute values from snapshots.

I understand it deviates from the semantic conventions, but I also want to ask for a second thought based on the actual collection process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple system metrics return negative values when using DELTA temporality.
3 participants