-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metric Math Alarms do not adhere to minMetricSamplesToAlarm #403
Comments
For anyone else facing this same issue I was able to hack together this solution. import { Alarm, CfnAlarm, MathExpression } from 'aws-cdk-lib/aws-cloudwatch';
import { MonitoringFacade } from 'cdk-monitoring-constructs';
// Hack the underlying CFN types to be mutable so we can modify them.
type MetricDataQueryProperty = {
-readonly [K in keyof CfnAlarm.MetricDataQueryProperty]: CfnAlarm.MetricDataQueryProperty[K];
};
type MetricStatProperty = {
-readonly [K in keyof CfnAlarm.MetricStatProperty]: CfnAlarm.MetricStatProperty[K];
};
export function fixMetricMathMinSamples(monitoringFacade: MonitoringFacade) {
// Find all minimum samples metric math alarms
const alarms = monitoringFacade.node.children
.filter((c) => c instanceof Alarm)
.map((alarm) => alarm as Alarm)
.filter((alarm) => alarm.node.id.includes('NoSamples') && alarm.metric instanceof MathExpression);
alarms.forEach((alarm) => {
const metrics = (alarm.node.defaultChild! as CfnAlarm).metrics! as MetricDataQueryProperty[];
// Convert Metric Math to use the MAX Sample Count
metrics.filter((m) => m.expression).forEach((m) => (m.expression = 'MAX(METRICS())'));
// Convert Metrics to using SampleCount
metrics.filter((m) => m.metricStat).forEach((m) => ((m.metricStat as MetricStatProperty).stat = 'SampleCount'));
});
} Would love to see the logic fixed in the library itself. |
This is where We create an additional alarm based on a sample count variation of the provided metric that's then used in a composite alarm. I'm not entirely sure how to best solve for this. Potentially the most straightforward naive solution would be to create a On the other hand,
Considering you provided the above hacky solution, did you have any opinion @akiesler ? |
I think the devil is in the details. Do we know how the function behaves differently and went what extra data is requested?
My own experience has been that using the |
Fixes #452 Currently, when using `minMetricSamplesToAlarm` the number of samples is evaluated for a different period than the main alarm. This makes monitoring sensitive to false positives as not every breaching datapoint must have sufficient number of samples (see #452 for more details). Moreover, the current approach for adjusting alarms to respect `minMetricSamplesToAlarm` is to create 2 extra alarms - one for `NoSamples` and one for a top-level composite. Each of these monitors incurs extra costs ($0.10 for `NoSamples` monitor and $0.50 for the Composite, see https://aws.amazon.com/cloudwatch/pricing/ for reference). This means that using `minMetricSamplesToAlarm` increases the cost from $0.10 per alarm to $0.70 per alarm ($0.60 of overhead!). It's possible to use Math Expression instead. Instead of adding separate alarm for `NoSamples`, we can model it a Sample Count metric, and instead of the Composite, we can use the MathExpression that conditionally emits the data based on the number of samples. The charge for Math Expression-based alarms is per metric in the Math Expression, so that comes down to $0.20 per alarm. That's a 70% cost improvement. Additionally, it reduces the overall number of alarms, effectively making it easier to fit your alarming in the CloudWatch quota and decluttering the UI. To avoid breaking any customers that rely on `minMetricSamplesToAlarm` generating alarms (e.g. #403), deprecating it and adding `minSampleCountToEvaluateDatapoint` with updated behaviour next to it. --- _By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_
Version
5.3.4
Steps and/or minimal code example to reproduce
Expected behavior
I would expect the corresponding "NoSamples" alarm to either check each metrics has the necessary number of samples or wrap the entire expression in
DATAPOINT_COUNT
to check for the correct sample count.Actual behavior
The same metric alarm is created because there is no statistic prop to update.
Other details
No response
The text was updated successfully, but these errors were encountered: