Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connector/spanmetrics] Discard counter span metric exemplars after flushing #32210

Merged

Conversation

swar8080
Copy link
Contributor

@swar8080 swar8080 commented Apr 6, 2024

Description:
Discard counter span metric exemplars after flushing to avoid unbounded memory growth when exemplars are enabled.

This is needed because #28671 added exemplars to counter span metrics, but they are not removed after each flush interval like they are for histogram span metrics.

Note: this may change behaviour if using the undocumented exemplars.max_per_data_point configuration option, since exemplars would no longer be accumulated up until that count. However, i'm unclear on the value of that feature since there's no mechanism to replace old exemplars with newer ones once the maximum is reached. Maybe a follow-up enhancement is only discarding exemplars once the maximum is reached, or using a circular buffer to replace them. That could be useful for pull-based exporters like prometheusexporter, as retaining exemplars for longer would decrease the chance of them getting discarded before being scraped.

Link to tracking Issue:

Closes #31683

Testing:

  • Unit tests
  • Running the collector and setting a breakpoint to verify the exemplars are being cleared in-between flushes. Before the change I could see the exemplar count continually growing

Documentation:
Updated the documentation to mention that exemplars are added to all span metrics. Also mentioned when they are discarded

@swar8080 swar8080 requested review from a team and evan-bradley April 6, 2024 19:21
@github-actions github-actions bot requested review from Frapschen and portertech April 6, 2024 19:22
@swar8080 swar8080 closed this Apr 6, 2024
@swar8080 swar8080 reopened this Apr 6, 2024
Copy link

codecov bot commented Apr 6, 2024

Codecov Report

Attention: Patch coverage is 44.44444% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 81.99%. Comparing base (191c2c9) to head (212cbf3).

Files Patch % Lines
...r/spanmetricsconnector/internal/metrics/metrics.go 0.00% 10 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #32210   +/-   ##
=======================================
  Coverage   81.98%   81.99%           
=======================================
  Files        1912     1912           
  Lines      173444   173443    -1     
=======================================
+ Hits       142197   142209   +12     
+ Misses      26920    26906   -14     
- Partials     4327     4328    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@swar8080 swar8080 force-pushed the span-metric-exemplar-memory-leak branch from 212cbf3 to a695ee0 Compare April 6, 2024 21:18
@swar8080 swar8080 changed the title Discard counter span metric exemplars after flushing [connector/spanmetrics] Discard counter span metric exemplars after flushing Apr 6, 2024
@Frapschen
Copy link
Contributor

I also agree with that:

only discarding exemplars once the maximum is reached

Copy link
Contributor

@Frapschen Frapschen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thank you @swar8080

}

m.metrics = make(map[Key]*exponentialHistogram)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No situation where we need to retain this functionality? This is the only bit that stands out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep this was dead code

Reset(onlyExemplars bool) was only called with true as the parameter, so the function would always return early before reaching the code path I deleted

@evan-bradley evan-bradley merged commit 39bdda7 into open-telemetry:main Apr 16, 2024
142 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leaks for Sum metric exemplars
5 participants