[Alerting][Docs] Adds Alerting & Task Manager Scalability Guidance & Health Monitoring #91171

Merged
merged 113 commits into from
Mar 4, 2021
c923f15
added troubleshooting to alerts prod considerations
gmmorris Feb 11, 2021
42f3aad
fixed links
gmmorris Feb 11, 2021
bf70427
fixed tm link
gmmorris Feb 11, 2021
c7db172
fixed tm link
gmmorris Feb 15, 2021
be4d424
added tm health api initial docs and linked fro malerting
gmmorris Feb 15, 2021
ab5379c
added tm long running task diagnosis
gmmorris Feb 17, 2021
ecd2f33
fixed headers
gmmorris Feb 17, 2021
61d7256
Merge branch 'master' into task-manager/docs-monitoring
gmmorris Feb 17, 2021
c6549c8
fixed merge conflict
gmmorris Feb 17, 2021
3d92701
split tm docs
gmmorris Feb 17, 2021
cc2f8be
added workload
gmmorris Feb 17, 2021
da1834c
move TM into own section
gmmorris Feb 17, 2021
a3d577e
formating
gmmorris Feb 17, 2021
239384b
remvoetm from alerting
gmmorris Feb 17, 2021
ceca142
added tm section
gmmorris Feb 17, 2021
2bc26aa
tm getting started page
gmmorris Feb 18, 2021
5c94c32
tm prod cons
gmmorris Feb 18, 2021
2f487dc
scaling
gmmorris Feb 18, 2021
385d305
tweaked tm
gmmorris Feb 18, 2021
78cec2f
added scaling
gmmorris Feb 18, 2021
b243248
typo
gmmorris Feb 18, 2021
2d03471
tweaked alerting troiubleshooting
gmmorris Feb 19, 2021
2113afe
fixed grammer and typos
gmmorris Feb 19, 2021
d56f4c9
capitalise
gmmorris Feb 19, 2021
7c990df
removed duplication from developer documentaion
gmmorris Feb 19, 2021
94ffbeb
typos and broken down troubleshooting
gmmorris Feb 19, 2021
fa807e9
Merge branch 'master' into task-manager/docs-monitoring
gmmorris Feb 19, 2021
7245f88
added api docs for tm
gmmorris Feb 19, 2021
979a362
broken troubleshooting down further
gmmorris Feb 19, 2021
c102c9e
added toc to troubleshooting
gmmorris Feb 19, 2021
bd39cab
Update docs/api/task-manager/health.asciidoc
gmmorris Feb 22, 2021
c30d2a8
Update docs/api/task-manager/health.asciidoc
gmmorris Feb 22, 2021
3621bf9
Update docs/developer/plugin-list.asciidoc
gmmorris Feb 22, 2021
da26e51
Merge branch 'master' into task-manager/docs-monitoring
kibanamachine Feb 22, 2021
24ccf37
typo
gmmorris Feb 22, 2021
f3d2e70
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Feb 22, 2021
c037175
plugin api
gmmorris Feb 22, 2021
e8aba63
ensure health monitoring api is always refered to in the same manner
gmmorris Feb 22, 2021
ad4fde6
added notes on delay
gmmorris Feb 23, 2021
759d1a6
Merge branch 'master' into task-manager/docs-monitoring
kibanamachine Feb 24, 2021
5b00ce1
improved scaling strategy step
gmmorris Feb 25, 2021
26bdc8d
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Feb 25, 2021
be72ebd
Update docs/developer/plugin-list.asciidoc
gmmorris Feb 25, 2021
ae46a41
Update docs/settings/task-manager-settings.asciidoc
gmmorris Feb 25, 2021
47d61cb
Update docs/user/alerting/alerting-production-considerations.asciidoc
gmmorris Feb 25, 2021
e2a4de4
Update docs/user/alerting/alerting-production-considerations.asciidoc
gmmorris Feb 25, 2021
43681a7
Update docs/user/alerting/alerting-production-considerations.asciidoc
gmmorris Feb 25, 2021
351e320
Update docs/user/alerting/alerting-production-considerations.asciidoc
gmmorris Feb 25, 2021
6d4112d
Update docs/user/alerting/alerting-production-considerations.asciidoc
gmmorris Feb 25, 2021
3aa5d14
Apply suggestions from code review
gmmorris Feb 25, 2021
36c4fd3
updated troubleshooting intro
gmmorris Feb 25, 2021
226deae
Merge branch 'master' into task-manager/docs-monitoring
gmmorris Feb 25, 2021
86fd368
intorduce top level prod considerations
gmmorris Feb 25, 2021
bb0a88f
fixed links
gmmorris Feb 25, 2021
b15ab3b
avoid refreshing by using useCallback
gmmorris Feb 25, 2021
ec6cf81
plugin docs
gmmorris Feb 25, 2021
8fcd898
moved production into user
gmmorris Feb 25, 2021
3b2a941
fixed topics
gmmorris Feb 25, 2021
e2ff3e7
moved prod after conf sec
gmmorris Feb 25, 2021
dc48758
Update docs/user/production-considerations/alerting-production-consid…
gmmorris Feb 26, 2021
86893f1
Update docs/user/production-considerations/alerting-production-consid…
gmmorris Feb 26, 2021
49fd8dc
Update docs/user/production-considerations/alerting-production-consid…
gmmorris Feb 26, 2021
5747fc1
Update docs/user/production-considerations/alerting-production-consid…
gmmorris Feb 26, 2021
ca154d1
Apply suggestions from code review
gmmorris Feb 26, 2021
f6101fc
grammer
gmmorris Feb 26, 2021
61bd58c
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Feb 26, 2021
e00e6ba
rephrased
gmmorris Feb 26, 2021
b6cc900
include TM in titles
gmmorris Feb 26, 2021
f20d0c2
removed a few links
gmmorris Feb 26, 2021
efda54b
fixed formatting
gmmorris Feb 26, 2021
2a37fdb
move TM articles under TM
gmmorris Feb 26, 2021
576e704
Apply suggestions from code review
gmmorris Feb 26, 2021
e0109af
merge sentances into a single paragraph
gmmorris Feb 26, 2021
7fef967
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Feb 26, 2021
14607b4
fixed broken TM link
gmmorris Feb 26, 2021
202530d
fixed link
gmmorris Feb 26, 2021
957f0cb
Merge branch 'master' into task-manager/docs-monitoring
kibanamachine Feb 26, 2021
f0695bb
removed docs link
gmmorris Feb 26, 2021
b56cc35
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Feb 26, 2021
78fa97e
Merge branch 'master' into task-manager/docs-monitoring
gmmorris Mar 1, 2021
764293c
updated plugin list
gmmorris Mar 1, 2021
ae3e373
fixed TM link
gmmorris Mar 1, 2021
adedebb
fixed TM link again
gmmorris Mar 1, 2021
2d6724a
improved prod con link from alerting
gmmorris Mar 1, 2021
6b21146
Merge branch 'master' into task-manager/docs-monitoring
gmmorris Mar 2, 2021
1626e82
Merge branch 'master' into task-manager/docs-monitoring
kibanamachine Mar 2, 2021
96fd086
Merge branch 'master' into task-manager/docs-monitoring
kibanamachine Mar 2, 2021
e264533
typo
gmmorris Mar 2, 2021
0a85947
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Mar 2, 2021
05a7882
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
33d9147
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
8b93230
Apply suggestions from code review
gmmorris Mar 3, 2021
81dc625
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
b808512
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
531002f
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
301523f
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
22acd67
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
cc3d776
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
6a169fc
Update docs/user/production-considerations/task-manager-troubleshooti…
gmmorris Mar 3, 2021
467ba46
applied PR review comments
gmmorris Mar 3, 2021
a324f66
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Mar 3, 2021
4ea2c9e
Apply suggestions from code review
gmmorris Mar 3, 2021
3ea0a86
applied PR review formatting comments
gmmorris Mar 3, 2021
deb8160
Merge branch 'task-manager/docs-monitoring' of github.com:gmmorris/ki…
gmmorris Mar 3, 2021
b0c15d5
Merge branch 'master' into task-manager/docs-monitoring
gmmorris Mar 3, 2021
f96cebe
boken down to bulltets
gmmorris Mar 3, 2021
92fe188
change question to statement
gmmorris Mar 3, 2021
bde118c
indent troubleshooting
gmmorris Mar 3, 2021
d934c8c
removed unneeded newline
gmmorris Mar 3, 2021
dc7ce41
Apply suggestions from code review
gmmorris Mar 4, 2021
f1b92ef
fixed space
gmmorris Mar 4, 2021
8fe9986
:wq!rge branch 'task-manager/docs-monitoring' of github.com:gmmorris/…
gmmorris Mar 4, 2021
6a3b8b3
Merge branch 'master' into task-manager/docs-monitoring
gmmorris Mar 4, 2021
133 changes: 133 additions & 0 deletions docs/api/task-manager/health.asciidoc
@@ -0,0 +1,133 @@
[[task-manager-api-health]]
=== Get Task Manager health API
++++
<titleabbrev>Get Task Manager health</titleabbrev>
++++

Retrieve the health status of the {kib} Task Manager.

[[task-manager-api-health-request]]
==== Request

`GET <kibana host>:<port>/api/task_manager/_health`

[[task-manager-api-health-codes]]
==== Response code

`200`::
Indicates a successful call.

[[task-manager-api-health-example]]
==== Example

Retrieve the health status of the {kib} Task Manager:

[source,sh]
--------------------------------------------------
$ curl -X GET api/task_manager/_health
--------------------------------------------------
// KIBANA

The API returns the following:

[source,json]
--------------------------------------------------
{
  "id": "15415ecf-cdb0-4fef-950a-f824bd277fe4",
  "timestamp": "2021-02-16T11:38:10.077Z",
  "status": "OK",
  "last_update": "2021-02-16T11:38:09.934Z",
  "stats": {
    "configuration": {
      "timestamp": "2021-02-16T11:29:05.055Z",
      "value": {
        "request_capacity": 1000,
        "max_poll_inactivity_cycles": 10,
        "monitored_aggregated_stats_refresh_rate": 60000,
        "monitored_stats_running_average_window": 50,
        "monitored_task_execution_thresholds": {
          "default": {
            "error_threshold": 90,
            "warn_threshold": 80
          },
          "custom": {}
        },
        "poll_interval": 3000,
        "max_workers": 10
      },
      "status": "OK"
    },
    "runtime": {
      "timestamp": "2021-02-16T11:38:09.934Z",
      "value": {
        "polling": {
          "last_successful_poll": "2021-02-16T11:38:09.934Z",
          "last_polling_delay": "2021-02-16T11:29:05.053Z",
          "duration": {
            "p50": 0,
            "p90": 0,
            "p95": 0,
            "p99": 0
          },
          "claim_conflicts": {
            "p50": 0,
            "p90": 0,
            "p95": 0,
            "p99": 0
          },
          "claim_mismatches": {
            "p50": 0,
            "p90": 0,
            "p95": 0,
            "p99": 0
          },
          "result_frequency_percent_as_number": {
            "Failed": 0,
            "NoAvailableWorkers": 0,
            "NoTasksClaimed": 0,
            "RanOutOfCapacity": 0,
            "RunningAtCapacity": 0,
            "PoolFilled": 0
          }
        },
        "drift": {
          "p50": 0,
          "p90": 0,
          "p95": 0,
          "p99": 0
        },
        "load": {
          "p50": 0,
          "p90": 0,
          "p95": 0,
          "p99": 0
        },
        "execution": {
          "duration": {},
          "result_frequency_percent_as_number": {}
        }
      },
      "status": "OK"
    },
    "workload": {
      "timestamp": "2021-02-16T11:38:05.826Z",
      "value": {
        "count": 26,
        "task_types": {},
        "schedule": [],
        "overdue": 0,
        "estimated_schedule_density": []
      },
      "status": "OK"
    }
  }
}
--------------------------------------------------

The health API response is described in <<making-sense-of-task-manager-health-stats>>.

The health monitoring API exposes three sections:

* `configuration` is described in detail under <<task-manager-health-evaluate-the-configuration>>
* `workload` is described in detail under <<task-manager-health-evaluate-the-workload>>
* `runtime` is described in detail under <<task-manager-health-evaluate-the-runtime>>
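
As a rough illustration of how a client might consume this response, the Python sketch below parses a trimmed copy of the example above and reports the `status` of each section under `stats`. The function name `summarize_health` and the trimmed sample are assumptions made for illustration; they are not part of the {kib} API.

```python
import json

# Trimmed copy of the example response above; only the fields read below are kept.
SAMPLE_RESPONSE = """
{
  "status": "OK",
  "stats": {
    "configuration": {"status": "OK"},
    "runtime": {"status": "OK"},
    "workload": {"status": "OK", "value": {"count": 26, "overdue": 0}}
  }
}
"""


def summarize_health(raw: str) -> dict:
    """Return the overall status plus the status of each section under `stats`."""
    body = json.loads(raw)
    summary = {"overall": body["status"]}
    for section, details in body["stats"].items():
        summary[section] = details["status"]
    return summary


if __name__ == "__main__":
    print(summarize_health(SAMPLE_RESPONSE))
```

In practice the raw text would come from a `GET` request to `/api/task_manager/_health` rather than from an embedded string.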
1 change: 1 addition & 0 deletions docs/developer/plugin-list.asciidoc
@@ -525,6 +525,7 @@ routes, etc.

|{kib-repo}blob/{branch}/x-pack/plugins/task_manager/README.md[taskManager]
|The task manager is a generic system for running background tasks.
Documentation: https://www.elastic.co/guide/en/kibana/master/task-manager.html


|{kib-repo}blob/{branch}/x-pack/plugins/telemetry_collection_xpack/README.md[telemetryCollectionXpack]
14 changes: 13 additions & 1 deletion docs/settings/task-manager-settings.asciidoc
@@ -27,6 +27,18 @@ Task Manager runs background tasks by polling for work on an interval. You can

| `xpack.task_manager.max_workers`
| The maximum number of tasks that this Kibana instance will run simultaneously. Defaults to 10.

|===

[float]
[[task-manager-health-settings]]
==== Task Manager Health settings

Settings that configure the <<task-manager-health-monitoring>> endpoint.

[cols="2*<"]
|===
| `xpack.task_manager.`
`monitored_task_execution_thresholds`
| Configures the threshold of failed task executions at which point the `warn` or `error` health status will be set under each task type execution status (under `stats.runtime.value.execution.result_frequency_percent_as_number[${task type}].status`). This setting allows configuration of both the default level and a custom task type specific level. By default, this setting is configured to mark the health of every task type as `warning` when it exceeds 80% failed executions, and as `error` at 90%. Custom configurations allow you to reduce this threshold to catch failures sooner for task types you might consider critical, such as alerting tasks. This value can be set to any number between 0 and 100, and a threshold is hit when the value *exceeds* this number. This means that you can avoid setting the status to `error` by setting the threshold at 100, or hit `error` the moment any task fails by setting the threshold to 0 (as it will exceed 0 once a single failure occurs).

|===
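
The "exceeds" semantics of `monitored_task_execution_thresholds` can be sketched in Python as follows. This is a minimal illustration, not the Task Manager implementation: the function name `execution_status` and the returned status labels are assumptions, while the defaults of 80 and 90 come from the description above.

```python
def execution_status(failed_percent: float,
                     warn_threshold: float = 80,
                     error_threshold: float = 90) -> str:
    """Map a failed-execution percentage to an illustrative health status.

    A threshold is hit only when the value *exceeds* it, so a percentage
    equal to the threshold does not change the status.
    """
    if failed_percent > error_threshold:
        return "error"
    if failed_percent > warn_threshold:
        return "warn"
    return "OK"


if __name__ == "__main__":
    # Default thresholds: exactly 80% failures is still OK, 85% is a warning.
    print(execution_status(80), execution_status(85), execution_status(91))
    # Thresholds of 0: any failure at all is reported as an error.
    print(execution_status(1, warn_threshold=0, error_threshold=0))
```

With `error_threshold` set to 100 the function can never return `error`, which mirrors the way the setting lets you suppress that status.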
2 changes: 1 addition & 1 deletion docs/setup/settings.asciidoc
@@ -683,5 +683,5 @@ include::{kib-repo-dir}/settings/reporting-settings.asciidoc[]
include::secure-settings.asciidoc[]
include::{kib-repo-dir}/settings/security-settings.asciidoc[]
include::{kib-repo-dir}/settings/spaces-settings.asciidoc[]
include::{kib-repo-dir}/settings/telemetry-settings.asciidoc[]
include::{kib-repo-dir}/settings/task-manager-settings.asciidoc[]
include::{kib-repo-dir}/settings/telemetry-settings.asciidoc[]
43 changes: 27 additions & 16 deletions docs/user/alerting/alerting-production-considerations.asciidoc
@@ -2,34 +2,45 @@
[[alerting-production-considerations]]
== Production considerations

{kib} alerting run both alert checks and actions as persistent background tasks managed by the Kibana Task Manager. This has two major benefits:
{kib} Alerting runs both alert checks and actions as persistent background tasks managed by the {kib} Task Manager.

* *Persistence*: all task state and scheduling is stored in {es}, so if {kib} is restarted, alerts and actions will pick up where they left off. Task definitions for alerts and actions are stored in the index specified by `xpack.task_manager.index` (defaults to `.kibana_task_manager`). It is important to have at least 1 replica of this index for production deployments, since if you lose this index all scheduled alerts and actions are also lost.
* *Scaling*: multiple {kib} instances can read from and update the same task queue in {es}, allowing the alerting and action load to be distributed across instances. In cases where a {kib} instance no longer has capacity to run alert checks or actions, capacity can be increased by adding additional {kib} instances.
When relying on Alerts and Actions as mission-critical services, make sure you follow the {kib} Task Manager <<task-manager-production-considerations>>.

[float]
[[alerting-background-tasks]]
=== Running background alert checks and actions

{kib} background tasks are managed by:
{kib} uses <<task-manager>> to run Alerts and Actions as background tasks, distributed across all {kib} instances in the cluster.

* Polling an {es} task index for overdue tasks at 3 second intervals. This interval can be changed using the `xpack.task_manager.poll_interval` setting.
* Tasks are then claiming them by updating them in the {es} index, using optimistic concurrency control to prevent conflicts. Each {kib} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval.
* Tasks are run on the {kib} server.
* In the case of alerts which are recurring background checks, upon completion the task is scheduled again according to the <<defining-alerts-general-details, check interval>>.
By default, each {kib} instance polls for work at 3 second intervals, and can run a maximum of 10 concurrent tasks.
These tasks are then run on the {kib} server.

In the case of alerts which are recurring background checks, upon completion the task is scheduled again according to the <<defining-alerts-general-details, check interval>>.

For more details on Task Manager, see <<task-manager-background-tasks>>.

[IMPORTANT]
==============================================
Because by default tasks are polled at 3 second intervals and only 10 tasks can run concurrently per {kib} instance, it is possible for alert and action tasks to be run late. This can happen if:
It is possible for alert and action tasks to be run late or at an inconsistent schedule.
This is usually a symptom of the specific usage of the cluster in question.

* Alerts use a small *check interval*. The lowest interval possible is 3 seconds, though intervals of 30 seconds or higher are recommended.
* Many alerts or actions must be *run at once*. In this case pending tasks will queue in {es}, and be pulled 10 at a time from the queue at 3 second intervals.
* *Long running tasks* occupy slots for an extended time, leaving fewer slots for other tasks.

For details on the settings that can influence the performance and throughput of Task Manager, see <<task-manager-settings,`Task Manager Settings`>>.
Such issues can be addressed by tweaking the {kib} <<task-manager>> or scaling the deployment to better suit your use case.

For detailed guidance, see <<alerting-troubleshooting,`Alerting Troubleshooting`>>.
==============================================

[float]
=== Deployment considerations
[[alerting-scaling-guidance]]
=== Scaling Guidance

As Alerts and Actions leverage <<task-manager>> to perform the majority of their work, scaling {kib} Alerting is possible by following the <<task-manager-scaling-guidance,Task Manager Scaling Guidance>>.

When estimating the required task throughput, keep the following in mind:

* Each Alert utilizes a single recurring task which is scheduled to run at the cadence defined by its <<defining-alerts-general-details, check interval>>.
* Each Action utilizes a single task, but since <<alerting-concepts-suppressing-duplicate-notifications, actions are taken per instance>>, alerts can end up generating a large number of non-recurring tasks.

It is difficult to predict how much throughput is needed to ensure all Alerts and Actions are executed at consistent schedules.
By counting Alerts as recurring tasks and Actions as non-recurring tasks, a rough throughput <<task-manager-rough-throughput-estimation,can be estimated>> as a _tasks per minute_ measurement.

{es} and {kib} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, you should synchronize the clocks of all nodes in the cluster using a time service such as http://www.ntp.org/[Network Time Protocol].
Predicting the buffer required to account for Actions depends heavily on the Alert Types you use, the number of Alert Instances they might detect, and the number of actions you might choose to assign to action groups when defining your Alerts. With that in mind, we recommend regularly <<task-manager-health-monitoring,monitoring the health>> of your {kib} Task Manager instances.
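
The _tasks per minute_ arithmetic can be sketched in Python as below. This is a back-of-the-envelope illustration under stated assumptions, not the estimation method from the Task Manager guidance: it assumes the default `max_workers` of 10 and `poll_interval` of 3000 ms, assumes every task finishes within a single poll interval, and uses hypothetical function names and a hypothetical mix of Alerts.

```python
import math


def capacity_per_minute(max_workers: int = 10, poll_interval_ms: int = 3000) -> float:
    """Upper bound on the tasks one Kibana instance can start per minute."""
    polls_per_minute = 60_000 / poll_interval_ms
    return max_workers * polls_per_minute


def recurring_tasks_per_minute(check_intervals_seconds: list) -> float:
    """Each Alert is one recurring task that runs 60 / interval times per minute."""
    return sum(60 / interval for interval in check_intervals_seconds)


def instances_needed(required_per_minute: float, per_instance: float) -> int:
    """Smallest number of instances whose combined capacity covers the demand."""
    return max(1, math.ceil(required_per_minute / per_instance))


if __name__ == "__main__":
    # Hypothetical deployment: 100 Alerts checking every 60s, 50 every 30s.
    alerts = [60] * 100 + [30] * 50
    recurring = recurring_tasks_per_minute(alerts)
    # Hypothetical buffer of 100 non-recurring Action tasks per minute.
    required = recurring + 100
    print(capacity_per_minute(), recurring, instances_needed(required, capacity_per_minute()))
```

The buffer for Actions is the hard part to predict, which is why the figure used here is only a placeholder.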
56 changes: 56 additions & 0 deletions docs/user/alerting/alerting-troubleshooting.asciidoc
@@ -0,0 +1,56 @@
[role="xpack"]
[[alerting-troubleshooting]]
== Alerting Troubleshooting

Use the information in this section to troubleshoot common problems and find answers for frequently asked questions.

For issues that you cannot fix yourself … we’re here to help.
If you are an existing Elastic customer with a support contract, please create a ticket in the
https://support.elastic.co/customers/s/login/[Elastic Support portal].
Or post in the https://discuss.elastic.co/[Elastic forum].


[float]
[[alerts-small-check-interval-run-late]]
=== Alerts with Small Check Intervals Run Late

*Symptoms*:

Alerts are scheduled to run every 2 seconds, but they seem to be running too late.

*Resolution*:

As described under <<alerting-background-tasks>>, Alerts are run as background tasks at a cadence defined by their *check interval*.
When an Alert *check interval* is smaller than the Task Manager <<task-manager-settings,`poll_interval`>>, the alert will inevitably run late.

This issue can be addressed by either tweaking the <<task-manager-settings,{kib} Task Manager settings>> or increasing the *check interval* of the Alerts in question.

For details on resolving this issue please see <<task-manager-health-scheduled-tasks-small-schedule-interval-run-late>>.


[float]
[[scheduled-alerts-run-late]]
=== Alerts Run Late

*Symptoms*:

Scheduled Alerts run at an inconsistent cadence, often running late.

Actions run long after the status of an Alert changes, notifying us of the change too late.

*Resolution*:

As described under <<alerting-background-tasks>>, Alerts and Actions are run as background tasks by each {kib} instance at a default rate of 10 tasks every 3 seconds.

This means that if many alerts or actions have been scheduled to run at the same time, pending tasks will queue in {es}. Each {kib} instance then polls for pending tasks at a rate of up to 10 tasks at a time, at 3 second intervals. As Alerts and Actions are backed by tasks, it is possible for pending tasks in the queue to exceed this capacity and run late as a result.
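
To make the queueing effect concrete, the Python sketch below estimates how long the last of a batch of simultaneously due tasks waits before it is claimed. It is a simplified model, not a description of Task Manager internals: it assumes the first poll happens at time 0, that every claimed task finishes before the next poll, and that load is spread evenly across instances. The function name and parameters are hypothetical.

```python
import math


def worst_case_claim_delay_seconds(pending_tasks: int,
                                   instances: int = 1,
                                   max_workers: int = 10,
                                   poll_interval_seconds: float = 3.0) -> float:
    """Seconds until the last of `pending_tasks` simultaneously due tasks is claimed."""
    claimed_per_cycle = instances * max_workers
    cycles = math.ceil(pending_tasks / claimed_per_cycle)
    # The first cycle runs immediately, so only the remaining cycles add delay.
    return max(0, cycles - 1) * poll_interval_seconds


if __name__ == "__main__":
    # 100 tasks due at once on a single instance with default settings.
    print(worst_case_claim_delay_seconds(100))
    # The same batch spread over two instances.
    print(worst_case_claim_delay_seconds(100, instances=2))
```

Long-running tasks make the real delay worse than this model suggests, because they hold workers across several polling cycles.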

For details on diagnosing the underlying causes of such delays, please see <<task-manager-health-tasks-run-late>>.

Alerting and Action tasks can be identified amongst the variety of tasks in {kib} by their type.

* Alert tasks always begin with `alerting:`. For example, the `alerting:.index-threshold` tasks back the <<alert-type-index-threshold, Index Threshold Stack Alert>>.
* Action tasks always begin with `actions:`. For example, the `actions:.index` tasks back the <<index-action-type>>.
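
The prefix convention above lends itself to a simple filter. The Python sketch below picks the Alerting and Action entries out of a mapping keyed by task type, such as the `task_types` object in the health response. The function name, the `hypothetical_other_task` entry, and the per-type values are placeholders invented for illustration; only the `alerting:` and `actions:` prefixes and the two example type names come from the text.

```python
def alerting_related(task_types: dict) -> dict:
    """Keep only the task types that back Alerts (`alerting:`) or Actions (`actions:`)."""
    prefixes = ("alerting:", "actions:")
    return {
        name: stats
        for name, stats in task_types.items()
        if name.startswith(prefixes)
    }


if __name__ == "__main__":
    # Placeholder values; the real per-type statistics are not shown in this document.
    sample = {
        "alerting:.index-threshold": {"count": 2},
        "actions:.index": {"count": 1},
        "hypothetical_other_task": {"count": 5},
    }
    print(sorted(alerting_related(sample)))
```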

When diagnosing Alerting-related issues, this subset of the tasks in the system may be of particular interest.

For broader details on monitoring and diagnosing task execution in Task Manager, see <<task-manager-health-monitoring>>.
1 change: 1 addition & 0 deletions docs/user/alerting/index.asciidoc
@@ -4,3 +4,4 @@ include::action-types.asciidoc[]
include::alert-types.asciidoc[]
include::geo-alert-types.asciidoc[]
include::alerting-production-considerations.asciidoc[]
include::alerting-troubleshooting.asciidoc[]
2 changes: 2 additions & 0 deletions docs/user/index.asciidoc
@@ -43,6 +43,8 @@ include::reporting/index.asciidoc[]

include::alerting/index.asciidoc[]

include::task-manager/index.asciidoc[]

include::api.asciidoc[]

include::plugins.asciidoc[]
4 changes: 4 additions & 0 deletions docs/user/task-manager/index.asciidoc
@@ -0,0 +1,4 @@
include::task-manager.asciidoc[]
include::task-manager-production-considerations.asciidoc[]
include::task-manager-health-monitoring.asciidoc[]
include::task-manager-troubleshooting.asciidoc[]