
[connector/count] Break down counts by custom dimensions #19369

Closed
hannahchan opened this issue Mar 7, 2023 · 11 comments
@hannahchan

Component(s)

connector/count

Is your feature request related to a problem? Please describe.

The current implementation of the count connector doesn't allow us to break down a count metric by our own custom dimensions. Breaking down the count metrics would allow us to understand more about our telemetry.

I want to be able to specify a dimension such as a resource attribute named environment and then have the count connector break out, for example, trace.span.count by the values of the environment resource attribute. Below is an example of the time series I would expect to be generated when I do this.

trace.span.count{service.name="myService" environment="prod"}
trace.span.count{service.name="myService" environment="staging"}
trace.span.count{service.name="myService" environment="dev"}

Some custom dimensions we want to specify include:

  • Environment
  • Receiver Type / Protocol / Format
  • Instrumentation Library

We would also like to see byte counts for logs, metrics and traces broken down by these dimensions.

Describe the solution you'd like

I would like to be able to specify custom dimensions for the count connector in my collector's config and then have the count connector break down the counts according to those dimensions. I would also like to be able to enable byte count metrics in my configuration.
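To make the idea concrete, here is a sketch of how this might look wired into a collector config. The pipeline wiring (a connector acting as an exporter for one pipeline and a receiver for another) follows the collector's existing connector pattern; the `dimensions` key is purely hypothetical and stands in for whatever configuration shape is ultimately chosen:

```yaml
# Hypothetical sketch only: "dimensions" is the requested (not yet
# existing) configuration for breaking counts out by attributes.
connectors:
  count:
    spans:
      trace.span.count:
        dimensions:
          - environment

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [count]
    metrics:
      receivers: [count]
      exporters: [prometheus]
```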

Describe alternatives you've considered

We've considered building a processor that does this and emits the metrics to the collector's /metrics endpoint. Because you can order processors and have multiple instances of the same processor in a pipeline, you can count the logs, metrics and traces at different stages of a pipeline.

Additional context

We have an established multi-region SaaS application that we are uplifting to use OpenTelemetry automatic instrumentation.

We know from experimentation in a test environment that the automatic instrumentation emits more telemetry than our previous instrumentation. However, this doesn't help us estimate or understand upfront the volume of telemetry that our application will emit in production without incurring the full cost of transporting and storing that telemetry.

Our current thinking is to deploy and configure the OpenTelemetry collector in our production environment as a measuring device that counts the logs, metrics and traces our application emits and then either discards that telemetry or passes it through to the next stage.

@hannahchan hannahchan added enhancement New feature or request needs triage New item requiring triage labels Mar 7, 2023
@github-actions
Contributor

github-actions bot commented Mar 7, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

This makes sense to me to an extent, but it's important that we keep in mind the extent to which data is already being dimensioned. Specifically, counts are already separated based on resource and scope. Therefore, some of the examples you've provided would not require any additional code.

service.name is typically a resource attribute, so we should already expect to see counts that are separated by this dimension. Similarly, Instrumentation library should be a scope-level attribute, so again I would expect counts to be already dimensioned accordingly.

We are not automatically dimensioning by lower-level attributes (specifically span, span event, data point, and log record attributes), so it makes sense to me that we could support these. What I am wondering is whether we need to support anything other than attributes.

Here's how I'm imagining the config - does this look right to you?

count:
  spans:
    my.span.count:
      conditions:
        - 'resource.attributes["environment"] != nil'
      attributes: # span attributes to use as dimensions
        - environment
  spanevents:
    my.spanevent.count:
      conditions:
        - 'resource.attributes["environment"] != nil'
      attributes: # span event attributes to use as dimensions
        - environment
  logs:
    my.log.count:
      conditions:
        - 'resource.attributes["environment"] != nil'
      attributes: # log record attributes to use as dimensions
        - environment
  datapoints:
    my.datapoint.count:
      conditions:
        - 'resource.attributes["environment"] != nil'
      attributes: # data point attributes to use as dimensions
        - environment

  # metrics do not have attributes
  metrics:
    my.metric.count:

@djaglowski
Member

djaglowski commented Mar 7, 2023

What is the behavior when an attribute that we are counting by does not exist? I can think of two options:

  1. Don't increment any count
  2. An autogenerated default value e.g. counts["_none"]++

@atoulme atoulme removed the needs triage New item requiring triage label Mar 7, 2023
@hannahchan
Author

An autogenerated default value e.g. counts["_none"]++

My default thought around this is to emit the count without that dimension.

@hannahchan
Author

What I am wondering is if we need to support anything other than attributes.

In my mind, it's only whatever is accessible in attributes. Can't think of a use case of anything otherwise. Your proposed configuration structure looks fine to me.

@djaglowski
Member

An autogenerated default value e.g. counts["_none"]++

My default thought around this is to emit the count without that dimension.

I'm not sure this is in the spirit of our metrics data model. As I understand it, a data point should represent a "total" value, except where dimensions are specified. If we are emitting a count that omits an attribute, this implies that it is the total count observed regardless of the attribute value.

There is more conversation about this here. Although the spec issue is not resolved, it seems to me there is consensus that a single emitter of telemetry should not emit metrics with multiple sets of attributes.

{ "name": "my.log.count", "value": 2, "attributes": { "direction": "up" } }
{ "name": "my.log.count", "value": 3, "attributes": { "direction": "down" } }

// equivalent to above
{ "name": "my.log.count", "value": 5, "attributes": { } }

// not equivalent to above, but generated from logs without "direction" attribute
{ "name": "my.log.count", "value": 1, "attributes": { } }
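A small illustrative sketch of the collision (the values are hypothetical, taken from the example above):

```python
from collections import Counter

# Counts emitted with the "direction" dimension.
dimensioned = Counter({"up": 2, "down": 3})

# Dropping the dimension and summing gives the total over all values.
total = sum(dimensioned.values())
print(total)  # 5

# A count built only from logs that *lacked* "direction" is a different
# quantity, yet both would be emitted with empty attributes and would
# therefore collide as the same time series.
missing_only = 1
print(total == missing_only)  # False
```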

@djaglowski
Member

What is the behavior when an attribute that we are counting by does not exist? I can think of two options:

  1. Don't increment any count
  2. An autogenerated default value e.g. counts["_none"]++

Having thought about this more, I think the correct behavior is neither of these options. I suggest that we should provide an optional setting in the attributes configuration:

logs:
  my.log.count:
    conditions:
      - 'resource.attributes["environment"] != nil'
    attributes:
      - key: environment
        default: other # logs observed that did not contain the "environment" attribute

If the default is not specified, we should ignore data that does not contain the attribute. On the other hand, when it is specified, we can count and assign to an unambiguous value.

It could be argued that conditions can be used to filter the data appropriately to obtain a count for "other", e.g. generate a metric for resource.attributes["environment"] == nil. However, this would require that the count be associated with a different metric. It also requires a more complex configuration compared to the above:

logs:
  my.log.count:
    conditions:
      - 'resource.attributes["environment"] != nil'
    attributes:
      - environment
  my.log.missing_environment:
    conditions:
      - 'resource.attributes["environment"] == nil'

@djaglowski
Member

I'm going to assign this to myself and will try to have a PR soon.

@djaglowski djaglowski self-assigned this Mar 8, 2023
@github-actions
Contributor

github-actions bot commented May 8, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 8, 2023
@hannahchan
Author

This is still relevant to me. I've been on leave and haven't kept up to date with development on this feature, but I can see some changes in the code. I'll try to find time to explore this again for myself.

@djaglowski
Member

I believe this is effectively resolved by #19432. @hannahchan, thanks for taking an interest in this. Please let me know if you still need additional enhancements.
