-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'master' into sharing-saved-objects-phase-3.5
- Loading branch information
Showing
35 changed files
with
1,235 additions
and
254 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
253 changes: 253 additions & 0 deletions
253
docs/user/alerting/troubleshooting/alerting-common-issues.asciidoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,253 @@ | ||
[role="xpack"] | ||
[[alerting-common-issues]] | ||
=== Common Issues | ||
|
||
This page describes how to resolve common problems you might encounter with Alerting. | ||
|
||
[float] | ||
[[rules-small-check-interval-run-late]] | ||
==== Rules with small check intervals run late | ||
|
||
*Problem* | ||
|
||
Rules with a small check interval, such as every two seconds, run later than scheduled. | ||
|
||
*Solution* | ||
|
||
Rules run as background tasks at a cadence defined by their *check interval*. | ||
When a Rule *check interval* is smaller than the Task Manager <<task-manager-settings,`poll_interval`>>, the rule will run late. | ||
|
||
Either tweak the <<task-manager-settings,{kib} Task Manager settings>> or increase the *check interval* of the rules in question. | ||
|
||
For more details, see <<task-manager-health-scheduled-tasks-small-schedule-interval-run-late>>. | ||
|
||
|
||
[float] | ||
[[scheduled-rules-run-late]] | ||
==== Rules with the inconsistent cadence | ||
|
||
*Problem* | ||
|
||
Scheduled rules run at an inconsistent cadence, often running late. | ||
|
||
Actions run long after the status of a rule changes, sending a notification of the change too late. | ||
|
||
*Solution* | ||
|
||
Rules and actions run as background tasks by each {kib} instance at a default rate of ten tasks every three seconds. | ||
When diagnosing issues related to Alerting, focus on the tasks that begin with `alerting:` and `actions:`. | ||
|
||
Alerting tasks always begin with `alerting:`. For example, the `alerting:.index-threshold` tasks back the <<rule-type-index-threshold, index threshold stack rule>>. | ||
Action tasks always begin with `actions:`. For example, the `actions:.index` tasks back the <<index-action-type, index action>>. | ||
|
||
For more details on monitoring and diagnosing task execution in Task Manager, see <<task-manager-health-monitoring>>. | ||
|
||
[float] | ||
[[connector-tls-settings]] | ||
==== Connectors have TLS errors when executing actions | ||
|
||
*Problem* | ||
|
||
When executing actions, a connector gets a TLS socket error when connecting to | ||
the server. | ||
|
||
*Solution* | ||
|
||
Configuration options are available to specialize connections to TLS servers, | ||
including ignoring server certificate validation, and providing certificate | ||
authority data to verify servers using custom certificates. For more details, | ||
see <<action-settings,Action settings>>. | ||
|
||
[float] | ||
[[rules-long-execution-time]] | ||
==== Rules take a long time to run | ||
|
||
*Problem* | ||
|
||
Rules are taking a long time to execute and are impacting the overall health of your deployment. | ||
|
||
[IMPORTANT] | ||
============================================== | ||
By default, only users with a `superuser` role can query the {kib} event log because it is a system index. To enable additional users to execute this query, assign `read` privileges to the `.kibana-event-log*` index. | ||
============================================== | ||
|
||
*Solution* | ||
|
||
Query for a list of rule ids, bucketed by their execution times: | ||
|
||
[source,console] | ||
-------------------------------------------------- | ||
GET /.kibana-event-log*/_search | ||
{ | ||
"size": 0, | ||
"query": { | ||
"bool": { | ||
"filter": [ | ||
{ | ||
"range": { | ||
"@timestamp": { | ||
"gte": "now-1d", <1> | ||
"lte": "now" | ||
} | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"event.action": { | ||
"value": "execute" | ||
} | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"event.provider": { | ||
"value": "alerting" <2> | ||
} | ||
} | ||
} | ||
] | ||
} | ||
}, | ||
"runtime_mappings": { <3> | ||
"event.duration_in_seconds": { | ||
"type": "double", | ||
"script": { | ||
"source": "emit(doc['event.duration'].value / 1E9)" | ||
} | ||
} | ||
}, | ||
"aggs": { | ||
"ruleIdsByExecutionDuration": { | ||
"histogram": { | ||
"field": "event.duration_in_seconds", | ||
"min_doc_count": 1, | ||
"interval": 1 <4> | ||
}, | ||
"aggs": { | ||
"ruleId": { | ||
"nested": { | ||
"path": "kibana.saved_objects" | ||
}, | ||
"aggs": { | ||
"ruleId": { | ||
"terms": { | ||
"field": "kibana.saved_objects.id", | ||
"size": 10 <5> | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// TEST | ||
|
||
<1> This queries for rules executed in the last day. Update the values of `lte` and `gte` to query over a different time range. | ||
<2> Use `event.provider: actions` to query for long-running action executions. | ||
<3> Execution durations are stored as nanoseconds. This adds a runtime field to convert that duration into seconds. | ||
<4> This interval buckets the `event.duration_in_seconds` runtime field into 1 second intervals. Update this value to change the granularity of the buckets. If you are unable to use runtime fields, make sure this aggregation targets `event.duration` and use nanoseconds for the interval. | ||
<5> This retrieves the top 10 rule ids for this duration interval. Update this value to retrieve more rule ids. | ||
|
||
This query returns the following: | ||
|
||
[source,json] | ||
-------------------------------------------------- | ||
{ | ||
"took" : 322, | ||
"timed_out" : false, | ||
"_shards" : { | ||
"total" : 1, | ||
"successful" : 1, | ||
"skipped" : 0, | ||
"failed" : 0 | ||
}, | ||
"hits" : { | ||
"total" : { | ||
"value" : 326, | ||
"relation" : "eq" | ||
}, | ||
"max_score" : null, | ||
"hits" : [ ] | ||
}, | ||
"aggregations" : { | ||
"ruleIdsByExecutionDuration" : { | ||
"buckets" : [ | ||
{ | ||
"key" : 0.0, <1> | ||
"doc_count" : 320, | ||
"ruleId" : { | ||
"doc_count" : 320, | ||
"ruleId" : { | ||
"doc_count_error_upper_bound" : 0, | ||
"sum_other_doc_count" : 0, | ||
"buckets" : [ | ||
{ | ||
"key" : "1923ada0-a8f3-11eb-a04b-13d723cdfdc5", | ||
"doc_count" : 140 | ||
}, | ||
{ | ||
"key" : "15415ecf-cdb0-4fef-950a-f824bd277fe4", | ||
"doc_count" : 130 | ||
}, | ||
{ | ||
"key" : "dceeb5d0-6b41-11eb-802b-85b0c1bc8ba2", | ||
"doc_count" : 50 | ||
} | ||
] | ||
} | ||
} | ||
}, | ||
{ | ||
"key" : 30.0, <2> | ||
"doc_count" : 6, | ||
"ruleId" : { | ||
"doc_count" : 6, | ||
"ruleId" : { | ||
"doc_count_error_upper_bound" : 0, | ||
"sum_other_doc_count" : 0, | ||
"buckets" : [ | ||
{ | ||
"key" : "41893910-6bca-11eb-9e0d-85d233e3ee35", | ||
"doc_count" : 6 | ||
} | ||
] | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
<1> Most rule execution durations fall within the first bucket (0 - 1 seconds). | ||
<2> A single rule with id `41893910-6bca-11eb-9e0d-85d233e3ee35` took between 30 and 31 seconds to execute. | ||
|
||
Use the <<get-rule-api,Get Rule API>> to retrieve additional information about rules that take a long time to execute. | ||
|
||
[float] | ||
[[rule-cannot-decrypt-api-key]] | ||
=== Rule cannot decrypt apiKey | ||
|
||
*Problem*: | ||
|
||
The rule fails to execute and has an `Unable to decrypt attribute "apiKey"` error. | ||
|
||
*Solution*: | ||
|
||
This error happens when the `xpack.encryptedSavedObjects.encryptionKey` value used to create the rule does not match the value used during rule execution. Depending on the scenario, there are different ways to solve this problem: | ||
|
||
[cols="2*<"] | ||
|=== | ||
|
||
| If the value in `xpack.encryptedSavedObjects.encryptionKey` was manually changed, and the previous encryption key is still known. | ||
| Ensure any previous encryption key is included in the keys used for <<xpack-encryptedSavedObjects-keyRotation-decryptionOnlyKeys, decryption only>>. | ||
|
||
| If another {kib} instance with a different encryption key connects to the cluster. | ||
| The other {kib} instance might be trying to run the rule using a different encryption key than what the rule was created with. Ensure the encryption keys among all the {kib} instances are the same, and setting <<xpack-encryptedSavedObjects-keyRotation-decryptionOnlyKeys, decryption only keys>> for previously used encryption keys. | ||
|
||
| If other scenarios don't apply. | ||
| Generate a new API key for the rule by disabling then enabling the rule. | ||
|
||
|=== |
Oops, something went wrong.