Commit
Merge branch 'main' into NR-89144-Apache-Hadoop
Showing 249 changed files with 25,399 additions and 6,377 deletions.
31 changes: 31 additions & 0 deletions
alert-policies/active-directory/active-directory-replication-failures.yml
@@ -0,0 +1,31 @@
name: Active Directory Replication Failures
description: |+
  This alert is triggered when the Attempt timestamp != the Success timestamp, indicating a failure in replication between domain controllers.
type: STATIC
nrql:
  query: "FROM activeDirectoryReplicationPartners SELECT count(*) FACET server, partner WHERE lastReplicationSuccess != lastReplicationAttempt"

valueFunction: SINGLE_VALUE
terms:
  - priority: CRITICAL
    operator: ABOVE
    threshold: 0
    thresholdDuration: 120
    thresholdOccurrences: ALL

expiration:
  closeViolationsOnExpiration: false
  openViolationOnExpiration: false
  expirationDuration: null

signal:
  aggregationDelay: 120
  aggregationMethod: EVENT_FLOW
  aggregationTimer: null
  aggregationWindow: 60
  fillOption: NONE
  fillValue: null
  slideBy: null

violationTimeLimitSeconds: 86400
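For review purposes, the logic of the query above can be sketched in plain Python. This is only an illustration of what the `WHERE ... FACET server, partner` clause counts, not how New Relic evaluates it; the function name `count_replication_failures` and the sample records are hypothetical.

```python
from collections import Counter


def count_replication_failures(samples):
    """Count samples per (server, partner) whose last attempt was not a success.

    Mirrors the WHERE clause lastReplicationSuccess != lastReplicationAttempt
    and the FACET on server and partner.
    """
    failures = Counter()
    for sample in samples:
        if sample["lastReplicationSuccess"] != sample["lastReplicationAttempt"]:
            failures[(sample["server"], sample["partner"])] += 1
    return failures


# Hypothetical samples; timestamps are arbitrary integers.
samples = [
    {"server": "dc01", "partner": "dc02",
     "lastReplicationSuccess": 1000, "lastReplicationAttempt": 1000},
    {"server": "dc01", "partner": "dc03",
     "lastReplicationSuccess": 900, "lastReplicationAttempt": 1000},
    {"server": "dc01", "partner": "dc03",
     "lastReplicationSuccess": 900, "lastReplicationAttempt": 1060},
]
failures = count_replication_failures(samples)
```

With `threshold: 0` and `operator: ABOVE`, any facet with a non-zero count is a candidate for a violation, so only the `("dc01", "dc03")` pair would matter in this sample.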
32 changes: 32 additions & 0 deletions
alert-policies/active-directory/active-directory-windows-services.yml
@@ -0,0 +1,32 @@
name: Active Directory Windows Services
description: |+
  This alert is triggered when any of the targeted Windows Services are in a state other than "running".
  The scope of this alert is Windows Services using the 'label.primary_app = active_directory' decoration.
type: STATIC
nrql:
  query: "FROM Metric SELECT count(*) FACET hostname, entity.name WHERE metricName = 'windows_service_state' AND state != 'running' AND label.primary_app = 'active_directory'"

valueFunction: SINGLE_VALUE
terms:
  - priority: CRITICAL
    operator: ABOVE
    threshold: 0
    thresholdDuration: 300
    thresholdOccurrences: ALL

expiration:
  closeViolationsOnExpiration: false
  openViolationOnExpiration: false
  expirationDuration: null

signal:
  aggregationDelay: 120
  aggregationMethod: EVENT_FLOW
  aggregationTimer: null
  aggregationWindow: 60
  fillOption: NONE
  fillValue: null
  slideBy: null

violationTimeLimitSeconds: 86400
27 changes: 27 additions & 0 deletions
alert-policies/amazon-timestream/MagneticStoreRejectedUploadSystemFailures.yml
@@ -0,0 +1,27 @@
name: High MagneticStoreRejectedUploadSystemFailures

description: |+
  This alert is triggered when the MagneticStoreRejectedUploadSystemFailures is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.timestream.MagneticStoreRejectedUploadSystemFailures`) as 'Query' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
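The comments in this file state allowed ranges for `thresholdDuration` (120 - 3600) and `violationTimeLimitSeconds` (300 - 2592000). A reviewer could check those ranges with a small script along these lines. This is a sketch only: `validate_condition` is a hypothetical helper, the condition is assumed to be already parsed into a dict, and the ranges are taken from the comments above rather than from any official schema.

```python
# Ranges copied from the comments in the alert file (assumed authoritative here).
THRESHOLD_DURATION_RANGE = (120, 3600)
VIOLATION_TIME_LIMIT_RANGE = (300, 2592000)
DEFAULT_VIOLATION_TIME_LIMIT = 86400


def validate_condition(condition):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    low, high = THRESHOLD_DURATION_RANGE
    for index, term in enumerate(condition.get("terms", [])):
        duration = term.get("thresholdDuration")
        if duration is None or not low <= duration <= high:
            problems.append(
                f"terms[{index}].thresholdDuration={duration} is outside {low}-{high}"
            )

    low, high = VIOLATION_TIME_LIMIT_RANGE
    limit = condition.get("violationTimeLimitSeconds", DEFAULT_VIOLATION_TIME_LIMIT)
    if not low <= limit <= high:
        problems.append(f"violationTimeLimitSeconds={limit} is outside {low}-{high}")

    return problems


# The condition from this file, written out as a dict.
condition = {
    "name": "High MagneticStoreRejectedUploadSystemFailures",
    "terms": [
        {"priority": "CRITICAL", "operator": "ABOVE", "threshold": 100,
         "thresholdDuration": 600, "thresholdOccurrences": "ALL"},
    ],
    "violationTimeLimitSeconds": 86400,
}
problems = validate_condition(condition)
```

For the values in this file the list of problems is empty, since 600 and 86400 both fall inside the documented ranges.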
27 changes: 27 additions & 0 deletions
alert-policies/amazon-timestream/MagneticStoreRejectedUploadUserFailures.yml
@@ -0,0 +1,27 @@
name: High MagneticStoreRejectedUploadUserFailures

description: |+
  This alert is triggered when the MagneticStoreRejectedUploadUserFailures is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.timestream.MagneticStoreRejectedUploadUserFailures`) as 'Query' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
@@ -0,0 +1,27 @@
name: High SystemErrors

description: |+
  This alert is triggered when the number of system errors is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.timestream.SystemErrors`) as 'Query' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
@@ -0,0 +1,27 @@
name: High UserErrors

description: |+
  This alert is triggered when the number of user errors is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.timestream.UserErrors`) as 'Query' FROM Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
@@ -0,0 +1,40 @@
# Name of the alert
name: High Runtime System Errors

# Description and details
description: |+
  This alert is triggered when the number of system errors is more than 10 in 300 seconds.
# Type of alert
type: STATIC

# NRQL query
nrql:
  query: "SELECT count(aws.lex.RuntimeSystemErrors) from Metric"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation; float value
    threshold: 10
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  # Adding a Warning threshold is optional
  - priority: WARNING
    operator: ABOVE
    threshold: 8
    thresholdDuration: 300
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
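To make the interplay of the CRITICAL and WARNING terms concrete, here is a rough Python sketch of how `thresholdOccurrences: ALL` might be read: every data point within `thresholdDuration` has to be above the threshold. It is an approximation for discussion, not New Relic's evaluation engine. The helper names are hypothetical, only the `ABOVE` operator is handled, and one data point per 60 seconds is an assumption (this file does not set an aggregation window).

```python
def term_violated(values, term, window_seconds=60):
    """True if every data point in the most recent thresholdDuration is above the threshold.

    `values` holds one aggregated data point per window, oldest first.
    The 60-second window is an assumption, not taken from this file.
    """
    needed = term["thresholdDuration"] // window_seconds
    recent = values[-needed:]
    if len(recent) < needed:
        return False  # not enough data yet to cover the full duration
    return all(value > term["threshold"] for value in recent)


def highest_priority(values, terms, window_seconds=60):
    """Return "CRITICAL", "WARNING", or None, checking the most severe priority first."""
    for priority in ("CRITICAL", "WARNING"):
        for term in terms:
            if term["priority"] == priority and term_violated(values, term, window_seconds):
                return priority
    return None


# Terms copied from the file above.
terms = [
    {"priority": "CRITICAL", "threshold": 10, "thresholdDuration": 300},
    {"priority": "WARNING", "threshold": 8, "thresholdDuration": 300},
]

# Hypothetical per-minute error counts.
steady_nine = [9, 9, 9, 9, 9]
result = highest_priority(steady_nine, terms)
```

Under these assumptions, five consecutive minutes at 9 errors would only reach the WARNING term, because 9 is above 8 but not above 10; a single data point at or below the threshold within the 300 seconds resets the evaluation.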
@@ -0,0 +1,43 @@
# Name of the alert
name: Latency In Response

# Description and details
description: |+
  The latency for successful requests between the time that the request was made and the response was passed back
# Type of alert
type: STATIC

# NRQL query
nrql:
  query: "SELECT average(aws.lex.RuntimeSuccessfulRequestLatency) from Metric "

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation; float value
    threshold: 0.9
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  # Adding a Warning threshold is optional
  - priority: WARNING
    operator: ABOVE
    threshold: 0.8
    thresholdDuration: 300
    thresholdOccurrences: ALL


# OPTIONAL: URL of runbook to be sent with notification
runbookUrl:

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
@@ -0,0 +1,27 @@
name: High AsyncServerErrorCount

description: |+
  This alert is triggered when the Async Server Error Count is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.transcribe.AsyncServerErrorCount`) as 'Query' FROM Metric WHERE aws.Namespace = 'AWS/Transcribe'"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
@@ -0,0 +1,27 @@
name: High AsyncUserErrorCount

description: |+
  This alert is triggered when the Async User Error Count is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.transcribe.AsyncUserErrorCount`) as 'Query' FROM Metric WHERE aws.Namespace = 'AWS/Transcribe'"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
@@ -0,0 +1,27 @@
name: High SyncServerErrorCount

description: |+
  This alert is triggered when the Sync Server Error Count is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.transcribe.SyncServerErrorCount`) as 'Query' FROM Metric WHERE aws.Namespace = 'AWS/Transcribe'"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
@@ -0,0 +1,27 @@
name: High SyncUserErrorCount

description: |+
  This alert is triggered when the Sync User Error Count is above 100 in 10 minutes.
type: STATIC
nrql:
  query: "SELECT count(`aws.transcribe.SyncUserErrorCount`) as 'Query' FROM Metric WHERE aws.Namespace = 'AWS/Transcribe'"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 100
    # Time in seconds; 120 - 3600
    thresholdDuration: 600
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400