Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add namespace for process metric attributes #3431

Closed
wants to merge 9 commits into from

Conversation

trask
Copy link
Member

@trask trask commented Apr 21, 2023

Part of #3131

Note: I'm not sure what the best approach is here, so took a stab at it to see what this would look like.

(also note that my ulterior motive for pushing this forward is because we have similar questions in JVM runtime metrics, and so want to get community alignment on this issue)

Changes

Adds namespace for process metric attributes:

  • process.cpu.time and process.cpu.utilization metrics
    • state renamed to cpu.state
  • process.disk.io metric
    • direction renamed to disk.direction
  • process.network.io metric
    • direction renamed to network.direction
  • process.context_switches
    • type renamed to context_switch.type
  • process.paging.faults (should we consider renaming to process.memory.page_faults?)
    • type renamed to memory.page_fault.type

(I don't think schema transformations support this, since e.g. we don't want to rename the metric attribute state across all metrics, only the one for the process.cpu.time metric)

Summary of benefits

Metrics can be derived from logs.

raw measurements can be emitted via logs.

@trask trask marked this pull request as ready for review April 21, 2023 22:57
@trask trask requested review from a team April 21, 2023 22:57
CHANGELOG.md Outdated Show resolved Hide resolved
([#3390](https://github.com/open-telemetry/opentelemetry-specification/pull/3390))
- Rename `process.cpu.time` metric attribute `state` to `cpu.state`,
rename `process.cpu.utilization` metric attribute `state` to `cpu.state`,
rename `process.disk.io` metric attribute `direction` to `disk.direction`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not targeting this PR, I'll restate that disk direction is very weird, most operating systems model disks with read/write/control.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand wanting the consistency. I, personally, view disk as "yet another type of network operation", thanks to years of SAN and DFS.

I think we SHOULD get together a semconv group to really look at these at some point, but unfortunately we have a ton of users right now.

Simple renames and namespacing, I think we can afford to do, especially given what this could provide us w/ Log -> Metric or Trace -> Metric generation.

Anything more semantically different, we need to be cautious about.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm supportive of this direction and I think the spirit is also well aligned with https://github.com/open-telemetry/oteps/blob/6b62348b280b82036e897781d49ac4f2c3a9d95e/text/0199-support-elastic-common-schema-in-opentelemetry.md.

Not a blocker for now - I think many of these attributes will be useful for other areas (e.g. for a different metrics stream, for logs and traces) so eventually they should be extracted to a common place.

@reyang
Copy link
Member

reyang commented Apr 21, 2023

@AlexanderWert I want to get your input regarding metrics attributes (dimensions).

@tigrannajaryan
Copy link
Member

This breaks hostmetrics receiver in the Collector, which is declared Beta and is likely one of the most used receivers in the Collector. FYI, @open-telemetry/collector-contrib-approvers @open-telemetry/collector-approvers

What is the justification for this change?

(I don't think schema transformations support this, since e.g. we don't want to rename the metric attribute state across all metrics, only the one for the process.cpu.time metric)

It does. You can use apply_to_metrics setting to limit attribute renaming to specific metrics.

@trask
Copy link
Member Author

trask commented Apr 24, 2023

What is the justification for this change?

being able to share the same attributes across traces, metrics and logs

see @jsuereth's #3342 (review):

We want to share attributes between metrics + other signals. This is to aide in telemetry correlation, one of OTEL's core missions. Why the disparity?

@tigrannajaryan
Copy link
Member

What is the justification for this change?

being able to share the same attributes across traces, metrics and logs

see @jsuereth's #3342 (review):

We want to share attributes between metrics + other signals. This is to aide in telemetry correlation, one of OTEL's core missions. Why the disparity?

Good point. I think we need to put some more thought into this this.

The purpose of namespacing is to avoid name conflicts.

Attributes that we define for only a particular metric are already in their own namespace, the name of the metric (e.g. state attribute of process.cpu.time metric). These attributes don't need any other help to ensure there are no conflicts. These attributes are also only used for metrics and never for other signals.

Generic attributes (e.g. http.status_code) that can be recorded on a span, on a log or on a metric need such namespacing.

I think we need to make it a rule that metric-only attributes defined for specific metrics don't need to be in a namespace. So, I would make this change only for attributes which we plan to also record for traces and logs or which are defined as general-purpose attributes which can be recorded on any metrics (and thus have higher probability of clashes with other such attributes). At a glance cpu.state does not make sense to me as a span or log attribute, but perhaps there is a use case for that?

@jsuereth thoughts?

@jack-berg
Copy link
Member

I think we need to make it a rule that metric-only attributes defined for specific metrics don't need to be in a namespace. I would make this change only for attributes which we plan to also record for traces and logs or which are defined as general-purpose attributes which can be recorded on any metrics (and thus have higher probability of clashes with other such attributes)

This seems like an appealing way to think about it. I wonder how this would work with code generation. Right now code generation produces one big class containing all the attributes for traces, metrics, and logs. If an attribute like cpu state were to be scoped to specific metrics process.cpu.time and process.cpu.utilization, we'd either need multiple copies of state, or some sort of documentation describing the different contexts in which it is used.

Something like:

public static final AttributeKey<String> PROCESS_CPU_TIME_STATE = stringKey("state");

public static final AttributeKey<String> PROCESS_CPU_UTILIZATION_STATE = stringKey("state");

Or:

/**
 * Used in multiple contexts:
 * <ul>
 *   <li>process.cpu.time: <description of meaning when used with process.cpu.time>
 *   <li>process.cpu.utilization:  <description of meaning when used with process.cpu.utilization>
 */
public static final AttributeKey<String> STATE = stringKey("state");

@trask
Copy link
Member Author

trask commented Apr 25, 2023

I agree that lots of metric attributes don't have a use on spans, but it seems like they could potentially be useful on logs, e.g.

@trask
Copy link
Member Author

trask commented Apr 25, 2023

(I don't think schema transformations support this, since e.g. we don't want to rename the metric attribute state across all metrics, only the one for the process.cpu.time metric)

It does. You can use apply_to_metrics setting to limit attribute renaming to specific metrics.

nice, I added this, thx

@AlexanderWert
Copy link
Member

From the metrics attributes mentioned above, the only one I see a valuable correlation between metrics and logs is direction for process.network.io (-> network.direction). So, it would make sense to add a namespace for that attribute.

However, if we want to align this with ECS' network.direction we need to make sure the recommended / potential values are the same.

In OTel direction SHOULD be one of: receive, transmit.

In ECS the valid values for network.direction are:

  • ingress (corresponds to OTel's receive)
  • egress (corresponds to OTel's transmit)
  • inbound
  • outbound
  • internal
  • external
  • unknown

With the following description:

Direction of the network traffic.

When mapping events from a host-based monitoring context, populate this field from the host’s point of view, using the values "ingress" or "egress".

When mapping events from a network or perimeter-based monitoring context, populate this field from the point of view of the network perimeter, using the values "inbound", "outbound", "internal" or "external".

Note that "internal" is not crossing perimeter boundaries, and is meant to describe communication between two hosts within the perimeter. Note also that "external" is meant to describe traffic between two hosts that are external to the perimeter. This could for example be useful for ISPs or VPN service providers.

Alternatively, we could think about adding receive/transmit to the list of ECS' values as values to be used in a process-based context

([#3390](https://github.com/open-telemetry/opentelemetry-specification/pull/3390))
- Rename `process.cpu.time` metric attribute `state` to `cpu.state`,
rename `process.cpu.utilization` metric attribute `state` to `cpu.state`,
rename `process.disk.io` metric attribute `direction` to `disk.direction`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand wanting the consistency. I, personally, view disk as "yet another type of network operation", thanks to years of SAN and DFS.

I think we SHOULD get together a semconv group to really look at these at some point, but unfortunately we have a ton of users right now.

Simple renames and namespacing, I think we can afford to do, especially given what this could provide us w/ Log -> Metric or Trace -> Metric generation.

Anything more semantically different, we need to be cautious about.

@bertysentry
Copy link
Contributor

TBH I don't like these changes.

As @tigrannajaryan mentioned, there is no possible conflict between attributes using the same name but for different metrics. Specifying cpu.state instead of just state for a process.cpu.usage metric doesn't add any information. Same goes for disk.direction vs direction for process.disk.io: we already know it's about the disk.

Also, metrics don't have the same hierarchical nature as logs and traces with inherited attributes and context. This justifies having different rules for metrics attributes.

Now, I hear the argument about correlation between spans, logs, and metrics. The goal is noble but adding too much constraints on attributes naming between traces and metrics is going to create a lot of discussion and tension between spec writers. There are cases where a span attribute will require a very long namespace, and a small rare use case where we may want to correlate that attribute with a metric, and then the metric should use the long namespace for the attribute name, causing migration issues, etc.

While consistency is absolutely necessary, correlation between metrics and traces cannot be ensured by semantic conventions. Correlation engines will always require some mapping mechanisms, and in the age of AIOps, enforcing identical attribute names everywhere doesn't sound necessary.

Sorry for the long rant, I just mean: can't we keep it as simple as possible, while ensuring consistency in terms, and rely on implementation for correlation and other niceties?

@reyang reyang added the area:semantic-conventions Related to semantic conventions label May 9, 2023
@reyang
Copy link
Member

reyang commented May 9, 2023

@trask heads up - most likely this PR will be closed, and we'll ask you to resubmit the PR in a new repo, please refer to #3474 (comment).

@arminru arminru added the spec:metrics Related to the specification/metrics directory label May 9, 2023
Copy link
Member

@jack-berg jack-berg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to indicate my support, even though I understand this PR will need to be re-opened against the new repo.

My reasons:

  • Its very convenient to have a global set of attribute keys which mean the same thing in all contexts. This isn't strictly necessary, since we can say metric attributes are scoped to the metric name, but its a nicety I'm reluctant to give up.
  • It will be hard to anticipate when an attribute will only be used in a metric (and not need a namespace) vs be used across signals (and need a namespace). Telemetry schemas would allow for renames if an attribute is initially limited to metrics and later is exposed in logs or traces, but IMO its better to not have to use telemetry schemas even if they could be used.
  • Common rules for attributes in all signals is simpler than having an exception just for metrics. I don't share the view that using non-namespaced attributes for metrics is simpler. Consistency is simplicity.

@PeterF778
Copy link

Since there are good arguments for namespacing metric attributes as well as against, how about adding this requirement:

For each metric, its attribute names must be unique even after stripping the namespacing prefix.

This will allow some flexibility with handling metric attributes by backends, and specifically UIs.

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@trask
Copy link
Member Author

trask commented May 24, 2023

hey all, the https://github.com/orgs/open-telemetry/teams/technical-committee is reviewing this issue, and asked to create a summary of pros/cons. I have created open-telemetry/semantic-conventions#51 based on the comments here, please comment further over there if I missed anything, thx!

@trask trask closed this May 24, 2023
@trask trask deleted the process-metric-attrs branch May 24, 2023 03:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.