
Basic protection against the unintended cardinality explosion or similar problems #2700

Open
srikanthccv opened this issue May 18, 2022 · 6 comments · May be fixed by nstawski/ns-opentelemetry-python#2 or #3486
Labels
feature-request, metrics, sdk (Affects the SDK package.)

Comments

@srikanthccv
Member

We should have some basic checks in place to guard against unintended programming mistakes that might lead to high memory usage and affect the performance of the main application. We probably need to introduce a limit on the number of resulting metric streams, as well as a limit on how long measurements are kept in memory before we give up and drop them with a warning message for pull exporters. Related: #874
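
To make the idea concrete, here is a rough sketch of the kind of guard this describes: cap the number of distinct attribute sets kept per instrument, and fold measurements beyond that cap into a single overflow series instead of letting memory grow without bound. The names (`MAX_STREAMS`, `OVERFLOW_ATTRIBUTES`, `record`) and the limit value are purely illustrative, not the SDK's API.

```python
# Illustrative sketch only -- the names and structure are hypothetical,
# not the opentelemetry-python SDK implementation.
from collections import defaultdict

MAX_STREAMS = 2000  # example cap on distinct attribute sets per instrument
OVERFLOW_ATTRIBUTES = frozenset({("otel.metric.overflow", True)})

_points = defaultdict(float)  # attribute set -> running sum


def record(value, attributes):
    key = frozenset(attributes.items())
    if key not in _points and len(_points) >= MAX_STREAMS:
        # A new attribute set past the limit: drop its labels and
        # aggregate the measurement into a single overflow data point.
        key = OVERFLOW_ATTRIBUTES
    _points[key] += value
```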

@srikanthccv added the sdk and metrics labels on Sep 9, 2022
@nstawski
Contributor

I would like to take this issue.

@aabmass
Member

aabmass commented Jun 29, 2023

This has been added to the spec (experimentally)! https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits

@nstawski have you had a chance to look into this at all?

@lzchen
Contributor

lzchen commented Jun 29, 2023

I have implemented this feature for OT Rust.
I would suggest splitting this PR into two parts:

  1. (higher priority) Implement the default behavior with 2000 as the stream limit. As discussed in the Python SIG on 6/29, we decided to implement this by dropping the labels of data points that exceed the limit and aggregating them into an overflow data point.
  2. Implement the ability to configure the stream limit at the MeterProvider level and the View level (see the sketch after this list for one way the precedence could resolve).
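
To illustrate (2), here is a minimal sketch of how the effective limit might be resolved, assuming a View-level setting overrides a MeterProvider-level one, which in turn falls back to the default of 2000. These names are hypothetical and not part of the opentelemetry-python API:

```python
# Hypothetical precedence resolution for the cardinality limit; the names
# are illustrative, not the opentelemetry-python API.
DEFAULT_CARDINALITY_LIMIT = 2000


def resolve_cardinality_limit(view_limit=None, provider_limit=None):
    """Return the most specific configured limit, else the default."""
    if view_limit is not None:
        return view_limit
    if provider_limit is not None:
        return provider_limit
    return DEFAULT_CARDINALITY_LIMIT
```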

Just as a note, (1) is a potentially behavior-breaking change. Users who create more data points per timeseries than the limit allows will automatically have their labels dropped. In addition, the spec does not define a specific algorithm for determining which labels to drop (one might be defined later). Therefore we should probably have some safeguards in place before we release this. We can:

  1. Inform users that this is an experimental feature (similar to what we do with other experimental features that have stable signals), so the behavior stays open to whatever algorithm is defined in the future.
  2. Hide the default behavior behind a feature flag or allow a configuration option for the algorithm (not ideal, I don't like the idea of additional configuration, and we also do not have anything behind feature flags currently).
  3. Possibly just release this and handle issues that come up proactively.

@nstawski
Contributor

@aabmass @lzchen thank you, will start working on it today.

@lzchen
Contributor

lzchen commented Aug 17, 2023

@nstawski

Any updates on this?

@nstawski
Contributor

@lzchen I was a bit stuck, but I reached out to Srikanth and recently got a response from him. Working on it; I will ping you / create a pull request soon.
