Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cloudwatch): support anomaly detection alarms #19956

Closed

Conversation

brandondahler
Copy link

  • Allows customers to create Alarms with anomaly detection enabled.
  • Supports both direct metrics and math expressions.

Closes #10540

Implementation Notes

  • TreatMissingData has to be moved from alarm.ts to metric.ts as it not longer is specific to only Alarm.
  • Deprecated anomaly-based comparison operators from Alarm's ComparisonOperator enum.
    • Created a new AnomalyDetectionComparisonOperator enum to contain the anomaly-based comparison operators.
  • AnomalyDetectionAlarm is mostly copied from Alarm, main change required was to update how renderMetric works.
    • Removed horizontalAnnotation behavior as I don't believe there's a direct equivalent for anomaly detection at the moment.
    • The copying did duplicate some utility functions, wasn't sure if it was worth moving into the private/metric-util file or not.

All Submissions:

Adding new Unconventional Dependencies:

  • This PR adds new unconventional dependencies following the process described here

New Features

  • Have you added the new feature to an integration test?
    • Did you use yarn integ to deploy the infrastructure and generate the snapshot (i.e. yarn integ without --dry-run)?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Apr 18, 2022

@aws-cdk-automation aws-cdk-automation requested a review from a team April 18, 2022 15:13
@github-actions github-actions bot added bug This issue is a bug. effort/medium Medium work item – several days of effort p1 labels Apr 18, 2022
@kaizencc kaizencc added feature-request A feature should be added or improved. and removed bug This issue is a bug. labels Apr 20, 2022
Copy link
Contributor

@kaizencc kaizencc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brandondahler! Thanks for picking this up. The README looks excellent, and I have a few preliminary comments regarding the implementation.

packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
return actionArn;
}

private renderMetric(metric: IMetric, threshold: number) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: how much of this is copied directly from Alarm.renderMetric and how much is original?

Copy link
Author

@brandondahler brandondahler Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It originally was copied directly from Alarm.renderMetric.

I then:

  1. Added code to assign/generate an id for the anomaly band expression
  2. Added code to resolve the target metric's id, generating it earlier than originally if it isn't assigned in the case of a MathExpression
  3. Updated the returns of the callbacks to include the thresholdMetricId value that was assigned/generated in the output.
  4. Updated the returns of the callbacks for dispatchMetric to include the extra metric:
             {
               expression: `ANOMALY_DETECTION_BAND(${metricId}, ${threshold})`,
               id: thresholdMetricId,
               label: 'Expected',
               returnData: true,
             },
    

@brandondahler
Copy link
Author

I'll work through those suggestions soon.

FYSA - In general most of this code was copied from Alarm, so the naming and documentation comments are likely also applicable there. I'm fine with making them for this new construct but that will make this construct diverge from how the normal Alarm construct is named / documented.

@brandondahler brandondahler force-pushed the features/AnomalyDetectionAlarm branch from 4d5f00c to f0a98bd Compare April 22, 2022 18:51
@brandondahler
Copy link
Author

Review comments addressed, rebased on latest master branch.

@mergify mergify bot dismissed kaizencc’s stale review April 22, 2022 18:52

Pull request has been modified.

@brandondahler
Copy link
Author

brandondahler commented Apr 22, 2022

Looks like its just under the wire on branch coverage, I'll review what branch coverage I'm missing soon.

@brandondahler brandondahler force-pushed the features/AnomalyDetectionAlarm branch from f0a98bd to 0526376 Compare April 22, 2022 22:03
@brandondahler
Copy link
Author

Improved unit test coverage.

@brandondahler brandondahler force-pushed the features/AnomalyDetectionAlarm branch from 0526376 to a8edcff Compare April 22, 2022 23:26
@brandondahler
Copy link
Author

Cleaned up some nits around documentation that I found after self-review.

@pflorek
Copy link
Contributor

pflorek commented Apr 28, 2022

How is the usage of the AnomalyDetectionAlarm?

const metric = new Metric(...);
const anomalyDetector = new CfnAnomalyDetector(this, "AnomalyDetector", {
      metricName: invocationsMetric.metricName,
      namespace: invocationsMetric.namespace,
      stat: "Sum",
    });
const alarm = new AnomalyDetectionAlarm(this, "Alarm", {
    metric: metric,
    ...
});

An idea, does it make sense to construct CfnAnomalyDetector inside AnomalyDetectionAlarm for the given metric?

@brandondahler
Copy link
Author

My understanding from the documentation and testing is that CfnAnomalyDetector isn’t necessary for alarm usage.

That simplifies usage down to the same as a normal alarm.

const alarm = new AnomalyDetectionAlarm(this, "Alarm", {
    metric: metric,
    ...
});

The only significant difference is that threshold represents the number of standard deviations the band should use and the comparison operator determines whether it should use the upper, lower, or both thresholds.

@brandondahler
Copy link
Author

An idea, does it make sense to construct CfnAnomalyDetector inside AnomalyDetectionAlarm for the given metric?

I've looked a lot closer on how the documentation says things work around anomaly detection alarms and how that interacts with CfnAnomalyDetector (i.e. PutAnomalyDetector/DeleteAnomalyDetector).

I believe that effectively how CloudWatch with an anomaly detection alarm works is that each time it tries to evaluate the upper and lower bound, if the anomaly detector doesn't already exist (or was deleted) it'll call PutAnomalyDetector with the minimal information (i.e. the appropriate metric info, no extra configuration). Because of the auto-creation the usage of CfnAnomalyDetector isn't technically required.

Since the creation isn't technically required, the question becomes "Should we create/delete it because it makes things easier for customers in some manner?"

The documentation makes reference to deleting unused anomaly detectors due to them having a recurring cost; however, the pricing page makes no mention of ongoing cost of anomaly detector resources existing (only the +2 metric cost associated with running an alarm). I believe the documentation that references that cost is just out of date. Due to the lack of extra cost, I don't think it is necessary to manage creating and deleting the anomaly detectors that are otherwise auto-created by CloudWatch.

The only other consideration I found was that explicitly creating the anomaly detector allowed customers to exclude anomalous time ranges and configure the detector for a specific timezone (for DST purposes). The ability to configure the anomaly detector is an important feature that helps ensure that the anomaly detector doesn't train on bad data; however, I don't think configuring it within the AnomalyDetectionAlarm makes sense. My reasoning is that the AnomalyDetectionAlarm isn't explicitly tied to any anomaly detector instance -- the anomaly detector instance is a singleton per-metric/statistic. If we were to create/manage the anomaly detector instance within the AnomalyDetectionAlarm, separate alarms that both target the same metric+stat (e.g. with different standard deviation thresholds) may either fight between each other on which one's settings are the effective settings used or even worse the removal of one would silently wipe out the configuration from the other.

TL;DR - I don't think that AnomalyDetectionAlarm is the right place to manage CfnAnomalyDetector instances and instead we should separately make an AnomalyDetector L2 construct in a different PR (to allow customers to configure the detector as they see fit).

@brandondahler brandondahler force-pushed the features/AnomalyDetectionAlarm branch from a8edcff to 52d3f7e Compare April 29, 2022 20:31
@TheRealAmazonKendra TheRealAmazonKendra changed the base branch from v1-main to main July 13, 2022 00:06
@github-actions github-actions bot added bug This issue is a bug. and removed feature-request A feature should be added or improved. labels Jul 13, 2022
Copy link
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like our branch change to v2 caused a branch conflict. Can you please take a look at getting that resolved?

@brandondahler
Copy link
Author

Copy, I'll get it fixed when I get some time -- hopefully in the next week or two.

Allows customers to create Alarms with anomaly detection enabled.  Supports both direct metrics and math expressions.

Closes aws#10540
@brandondahler brandondahler force-pushed the features/AnomalyDetectionAlarm branch from 52d3f7e to 94d1368 Compare August 1, 2022 13:11
@brandondahler
Copy link
Author

Rebased onto latest main

@mergify mergify bot dismissed TheRealAmazonKendra’s stale review August 1, 2022 13:13

Pull request has been modified.

Copy link
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is just an initial passthrough, I didn't get too into the weeds at this point. Please address the inline comments and then I'll provide a more thorough review.

Also, thank you for the extensive unit testing here!

@@ -51,18 +51,24 @@ export enum ComparisonOperator {
/**
* Specified statistic is lower than or greater than the anomaly model band.
* Used only for alarms based on anomaly detection models
*
* @deprecated Use AnomalyDetectionAlarm instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to do this. Can you explain why you want to deprecate these when they're used more widely than just in this context?

/**
* Specify how missing data points are treated during alarm evaluation
*/
export enum TreatMissingData {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this move makes sense. This is directly related to creating an alarm.

*
* @resource AWS::CloudWatch::Alarm
*/
export class AnomalyDetectionAlarm extends AlarmBase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a lot of this is duplicating what's in the Alarm class. We should abstract the duplications out of both into a class that these can extend.

@TheRealAmazonKendra
Copy link
Contributor

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Aug 1, 2022

update

✅ Branch has been successfully updated

@TheRealAmazonKendra TheRealAmazonKendra changed the title feat(cloudwatch) create AnomalyDetectionAlarm L2 construct feat(cloudwatch): support anomaly detection alarms Aug 1, 2022
@TheRealAmazonKendra
Copy link
Contributor

If/when you rebase without a code change, please use the Mergify commands instead of merging in other ways. It will keep my review active so I'll be able to more easily see when you've made updates to PR.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 9f08889
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@TheRealAmazonKendra
Copy link
Contributor

Our autoclose/autowarn action doesn't seem to be picking this PR up. It would normally warn after 21 days and then close after 28. Since this has been dormant this long, I'm going to go ahead and close this. If you would like to continue working on it, we certainly welcome that. Please open a new PR addressing the last round of feed back and we'll re-review.

@Lykr
Copy link

Lykr commented Mar 3, 2023

Hi, is there any update to this issue? Do we plan to provide user a more convenient way to create anomaly detection alarms?

@IhebMarnaoui
Copy link

Do we already have some clear way on how to create anomaly detection alarms now or has this never been picked up ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. effort/medium Medium work item – several days of effort p1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[aws-cloudwatch] Missing ThresholdMetricId
7 participants