-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable dim cap to Metrics API #1244
Comments
@macrogreg @SergeyKanzhelev @Dmitry-Matveev @reyang @Dawgfan Please share comments/concerns. |
so basically you are adding two settings: |
@SergeyKanzhelev No. seriesCountLimit and valuesPerDimensionLimit are existing. There exists overload to take separate caps per name. Only new thing to be added per this proposal is |
After offline discussion I think it is a good solution to implement #1233 |
I believe you are underestimating the E2E-experience problem around the wrongness of Other data. It is a big mistake to offer this API as public. Above you are saying that the proposed approach is our only option, but it is not correct. None of our competitors that I know of use this kind of approach of dim-capping in custom metrics APIs – for good reason. I strongly urge you not to add any public APIs for this and solve the problem for standard metrics first, before committing to a back-compat burden that we are not sure about. I have previously shared this PR for this: #1243 Please consider using it as is or simplifying it further. |
@macrogreg See the below statement copied from above: This proposal is :
Pasting below is the comment for the
I am not proposing to change this. I am suggested to add another option to those customers who want to accept loss of dimension, and want to keep overall metric value accurate. Again - this is opt-in behavior. Returning false from the API is not okay for me, as by the time I know about this issue, I already lost my metric value. The proposed API is for those customer who prefer to keep tracking the metric value, while accepting loss of dimension. (There could be other dimension which can still be used).
|
Hey @cijothomas , The remark about the underestimation was just because I read the text as "yes, it's a problem, but we still wanna go ahead". So I only referred to to our opinions in regard to how critical that complication is, have no doubt that we all are clear on what it is about. Sorry if I did not express that well enough :) My point is mainly this: It is so bad that we should advise customer NOT to use this particular approach, rather than making it easy.
Please also note the challenge of taking a lock inside a pool in your particular approach.
For that, simply expose an overload of GetDataSeries that returns the details of the MultidimensionalPointResult, like in this simple PR: This approach follows the Core Fx design guidelines: |
Option1. Monitor the return value, and do something about it. There is no global guidance what to do. I can track another metric (as in your above example.). I could throw error?. I could log trace message. I could collapse operation name into "other" myself. (this is not really possible now, as metric already hit limit, and it wont take 'other' now. And I cannot pre-reserve "other" as well, as I don't know the values of other dimension ahead of time) Or I could do a lot of other things. I may do something different from my peer team. so there is no consistency. Option2. This will continue to track my metric. I lose visibility into operation name. I understand that. But my metric queries like "avg(Duration)", avg(Duration)where(RoleName-"WebRoleA"), avg(Duration)where(success=false)" - are still accurate. This is simple to reason about. I fully understand that if "avg(Duration) where(OperationName=="Other") exists for any time period, I should not use avg(Duration) where (OperationName =="Get values") etc. as this is incorrect data. My overall metrics are not affected just because of one dimension went rogue. ("Other" or "DimensionCapped" or "somethingelse" - it can be set by user themselves, as opposed to Metric system providing a constant. I am not against either.) I prefer option 2. There may be customers who prefer option 1. This is precisely why we offer options, and let customer pick what they feel comfortable. I myself hit the problem when fixing standard metrics. It can affect any customer. There are no public docs explaining the Metrics API, so I don't think we have many customers actively using this. But nonetheless, this is a problem. |
Allow me to explain differently. The following is my need: I want to ensure that my metric (DependencyDuration) is always accurate. I also want to ensure that my metric (DependencyDuration) with Success dimension is also always accurate. I would like to split/filter by my DependencyTarget in most cases. My application is not expected to go above 25 always, but it can at certain short periods of time. I am okay to lose visibility into DependencyTarget on those occasions. Options I have:
In short, I want to track a single metric with multiple dimensions, and ensure values are always accurate for certain important dimensions. Some dimension may go rogue, but I don't want that to affect my overall metric/other dimensions. The proposal is to address this in SDK itself, so everyone gets to use same solution, accepting the consequences which are explained. |
@cijothomas It's difficult for me to follow the low-level details in this thread. At a high level, as a long-time Application Insights customer, I believe that the currently behavior of I also think that |
Thanks all for sharing your suggestion. |
Closing this as DimensionCapping is enabled as an internal API, and is used by standard metric extractor. Will open a new issue for public facing API for the same |
Problem Statement
The Metrics API currently does not offer a setting to apply dimension capping. Instead, when dimension values reach the limit, TrackValue/TryGetSeries returns false, and the value is not tracked. The expectation is that the caller is responsible for handling this situation.
Consider the following example with a metric "AnimalsSold" with 2 dimensions "Dimension1" and "Dimension". The config sets a limit of 2 to each dimensions.
There only recommended action for clients to do when the cap is hit is to revisit the dimension value and ensure their cardinality is limited. This prevents future issues, but the tracked values are lost, affecting Metric charts etc.
Because of this, the recommended approach of tracking metric involve checking the return value and doing something (like sending a telemetry) to indicate caps are hit.
Instead of losing data when a cap is hit, some users may prefer to instead keep tracking the value, but with a special dimension value to which all the dimension values encountered after cap is hit is rolled into. In the above case, the dimension value "Dim2Value3" could be replaced with something like "Other". This approach may sound an easy solution, but it severely affects data quality, as SDK Metric System is only aware of dimension values from its current execution. Thus values rolled into "Other" will vary across application instances, across service instances or even across restarts of the same instance. In the above example, in one instance "Dim2Value3" may be rolled into "Other", but in another instance, "Dim2Value2" could be the one rolled into "Other".
When querying for this metric, it is impossible to determine what constitutes "Other". In short, if any dimension has "Other", then the metric value should not be filtered/split by that dimension.
Proposal:
Even though rolling of dimension values into a special value "Other" also has its own limitations, this may be the only approach many users want to take as it is simple to use and reason about. With proper usage of dimension values, the probability of ever hitting this is low.
Because of this, we are adding a new feature to the current metric API to do this dimension capping. At 1st stage, this will be exposed as a boolean field on
MetricConfiguration
, namedApplyDimensionCapping
, and will befalse
by default, to keep current behavior. If this flag is turned on, then calls toTrackValue
orTryGetDataSeries
does not return false when any dimension exceeds its cap. It continues to track the values to the metric. But all the values encountered after the cap is hit will be rolled into a special value "DIMENSION_CAPPED". When querying for metric, the value "DIMENSION_CAPPED" will be like every other dimension value and its up to user to interpret it. (and ignore it)ApplyDimensionCapping
should be used only as a 'graceful' safeguard when some dimension accidentally goes out of limits. While overall metric value will be accurate, values involving any dimension which has "DIMENSION_CAPPED" should be ignored. (i.e No filtering or splitting involving this dimension)example usage:
The text was updated successfully, but these errors were encountered: