
Basic protection against the unintended cardinality explosion or similar problems #2700

Open
srikanthccv opened this issue May 18, 2022 · 6 comments · May be fixed by nstawski/ns-opentelemetry-python#2 or #3486
Labels
feature-request, metrics, sdk (Affects the SDK package.)

Comments

@srikanthccv
Member

We should have some basic checks in place to guard against unintended programming mistakes that might lead to high memory usage and affect the performance of the main application. We probably need to introduce a limit on the number of resulting metric streams, as well as a limit on how long measurements are kept in memory before we give up and drop them with a warning message for pull exporters. Related: #874
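
To make the idea concrete, here is a rough sketch of the kind of guard this describes: cap the number of distinct attribute sets kept per instrument, and fold measurements beyond that cap into a single overflow series instead of letting memory grow without bound. The names (`MAX_STREAMS`, `OVERFLOW_ATTRIBUTES`, `record`) and the limit value are purely illustrative, not the SDK's API.

```python
# Illustrative sketch only -- the names and structure are hypothetical,
# not the opentelemetry-python SDK implementation.
from collections import defaultdict

MAX_STREAMS = 2000  # example cap on distinct attribute sets per instrument
OVERFLOW_ATTRIBUTES = frozenset({("otel.metric.overflow", True)})

_points = defaultdict(float)  # attribute set -> running sum


def record(value, attributes):
    key = frozenset(attributes.items())
    if key not in _points and len(_points) >= MAX_STREAMS:
        # A new attribute set past the limit: drop its labels and
        # aggregate the measurement into a single overflow data point.
        key = OVERFLOW_ATTRIBUTES
    _points[key] += value
```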

@srikanthccv added the sdk and metrics labels on Sep 9, 2022
@nstawski
Contributor

I would like to take this issue.

@aabmass
Member

aabmass commented Jun 29, 2023

This has been added to the spec (experimentally)! https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits

@nstawski have you had a chance to look into this at all?

@lzchen
Contributor

lzchen commented Jun 29, 2023

I have implemented this feature for OT Rust.
I would suggest splitting this PR into two parts:

  1. (higher priority) Implement the default behavior with 2000 as the stream limit. As discussed in the Python SIG on 6/29, we decided to implement this by dropping the labels of data points that exceed the limit and aggregating them into an overflow data point.
  2. Implement the ability to configure the stream limit at the MeterProvider level and the View level (see the sketch after this list for one way the precedence could resolve).
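
To illustrate (2), here is a minimal sketch of how the effective limit might be resolved, assuming a View-level setting overrides a MeterProvider-level one, which in turn falls back to the default of 2000. These names are hypothetical and not part of the opentelemetry-python API:

```python
# Hypothetical precedence resolution for the cardinality limit; the names
# are illustrative, not the opentelemetry-python API.
DEFAULT_CARDINALITY_LIMIT = 2000


def resolve_cardinality_limit(view_limit=None, provider_limit=None):
    """Return the most specific configured limit, else the default."""
    if view_limit is not None:
        return view_limit
    if provider_limit is not None:
        return provider_limit
    return DEFAULT_CARDINALITY_LIMIT
```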

Just as a note, (1) is a potentially behavior-breaking change. Users who create more data points per timeseries than the limit allows will automatically have their labels dropped. In addition, the spec does not define a specific algorithm for determining which labels to drop (one might be defined later). Therefore we should probably have some safeguards in place before we release this. We can:

  1. Inform users that this is an experimental feature (similar to what we do with other experimental features that have stable signals), so the behavior stays open to whatever algorithm is defined in the future.
  2. Hide the default behavior behind a feature flag or allow a configuration option for the algorithm (not ideal, I don't like the idea of additional configuration, and we also do not have anything behind feature flags currently).
  3. Possibly just release this and handle issues that come up proactively.

@nstawski
Contributor

@aabmass @lzchen thank you, will start working on it today.

@lzchen
Contributor

lzchen commented Aug 17, 2023

@nstawski

Any updates on this?

@nstawski
Contributor

@lzchen I was a bit stuck, but I reached out to Srikanth and recently got a response from him. Working on it; I will ping you / create a pull request soon.
