Add documentation for rule-based anomaly detection and imputation #8202

Merged Sep 13, 2024 (39 commits)
Commits:
942c7d7 Add documentation for rule-based anomaly detection and imputation (kaituo, Sep 9, 2024)
b2af679 Doc review (vagimeli, Sep 10, 2024)
e2c656e Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
fe79e71 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
dcbce5a Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
23bcea3 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
2754b3b Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
1a5120d Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
6c3326d Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
614b660 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
3ab815f Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
596adfa Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
bee1f4c Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
f8ee3d9 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
28a6b77 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
c318ece Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
4189083 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
199fbc3 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
595b45a Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
66c48c4 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
d6913fb Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
45443b9 Update _observing-your-data/ad/index.md (vagimeli, Sep 11, 2024)
cfc3709 Update _observing-your-data/ad/result-mapping.md (vagimeli, Sep 11, 2024)
8a3b25d Update _observing-your-data/ad/index.md (vagimeli, Sep 12, 2024)
bc9488a Update _observing-your-data/ad/index.md (vagimeli, Sep 12, 2024)
50eff8b Update _observing-your-data/ad/index.md (vagimeli, Sep 12, 2024)
2c2e06c Update _observing-your-data/ad/index.md (vagimeli, Sep 12, 2024)
894efee Update index.md (vagimeli, Sep 13, 2024)
5738739 Update result-mapping.md (vagimeli, Sep 13, 2024)
14dc454 Update _observing-your-data/ad/index.md (vagimeli, Sep 13, 2024)
4ad9e02 Update _observing-your-data/ad/index.md (vagimeli, Sep 13, 2024)
4d7f738 Merge branch 'main' into 2.17 (vagimeli, Sep 13, 2024)
9afca30 Fix links (vagimeli, Sep 13, 2024)
0067b5d Fix links (vagimeli, Sep 13, 2024)
a99969b Address editorial feedback (vagimeli, Sep 13, 2024)
7ea3d63 Address editorial feedback (vagimeli, Sep 13, 2024)
4b42bc2 Merge branch 'main' into 2.17 (vagimeli, Sep 13, 2024)
f9434ec Merge branch 'main' into 2.17 (vagimeli, Sep 13, 2024)
ca49c0c Update _observing-your-data/ad/index.md (vagimeli, Sep 13, 2024)
97 changes: 74 additions & 23 deletions _observing-your-data/ad/index.md
@@ -10,30 +10,36 @@

# Anomaly detection

An _anomaly_ in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you identify early signs of a system failure.

It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and is not adaptive to data that exhibits organic growth or seasonal behavior.

Anomaly detection automatically detects anomalies in your OpenSearch data in near real time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and a `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9).

You can pair the Anomaly Detection plugin with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.

## Using OpenSearch Dashboards anomaly detection

To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Anomaly Detection**. OpenSearch Dashboards contains sample datasets. You can use these datasets with their preconfigured detectors to try out the feature.

The following tutorial guides you through using anomaly detection with your OpenSearch data.

## Step 1: Define a detector

A _detector_ is an individual anomaly detection task. You can define multiple detectors, and all detectors can run simultaneously, with each analyzing data from different sources.

1. Choose **Create detector**.
1. Add the detector details.
- Enter a name and brief description. Make sure the name is unique and descriptive enough to help you identify the detector's purpose.
1. Specify the data source.
- For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes.
- (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL).

---

#### Example: Filter using query DSL

The following example query retrieves documents in which the `urlPath.keyword` field matches any of the specified values:

- /domain/{id}/short
- /sub_dir/{id}/short
@@ -62,9 +68,12 @@
}
}
```
{% include copy-curl.html %}

---

1. Specify a timestamp.
- Select the **Timestamp field** in the index.
1. Define operation settings.
- For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
- The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
@@ -76,6 +85,8 @@
- (Optional) To add extra processing time for data collection, specify a **Window delay** value.
- This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time.
- To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures that the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection because the detector looks further back in time. Find a balance that maintains both data accuracy and timely detection.
1. Specify a custom results index.
- The Anomaly Detection plugin allows you to store anomaly detection results in a custom index of your choice. To enable this, select **Enable custom results index** and provide a name for your index, for example, `abc`. The plugin then creates an alias prefixed with `opensearch-ad-plugin-result-` followed by your chosen name, for example, `opensearch-ad-plugin-result-abc`. This alias points to an actual index with a name containing the date and a sequence number, like `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, where your results are stored.
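
   The settings in this step also map to fields in the Anomaly Detection create detector API. The following is a minimal sketch of such a request; the detector name, index pattern, timestamp field, and interval values are illustrative assumptions, and feature configuration (covered in Step 2) is omitted:

   ```json
   POST _plugins/_anomaly_detection/detectors
   {
     "name": "request-logs-detector",
     "description": "Detects anomalies in request log metrics",
     "time_field": "timestamp",
     "indices": ["network-requests*"],
     "detection_interval": {
       "period": { "interval": 10, "unit": "Minutes" }
     },
     "window_delay": {
       "period": { "interval": 1, "unit": "Minutes" }
     },
     "result_index": "opensearch-ad-plugin-result-abc"
   }
   ```
   {% include copy-curl.html %}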

@@ -109,31 +120,32 @@

## Step 2: Configure the model

1. Add features to your detector.

A _feature_ is any field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.

A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting.
{: .note}

To configure an anomaly detection model based on an aggregation method, follow these steps:

1. On the **Configure model** page, enter the **Feature name** and select the **Enable feature** checkbox.
1. For **Find anomalies based on**, select **Field Value**.
1. For **aggregation method**, select **average()**, **count()**, **sum()**, **min()**, or **max()**.
1. For **Field**, select from the available options.

To configure an anomaly detection model based on a JSON aggregation query, follow these steps:
1. On the **Configure model** page, enter the **Feature name** and select the **Enable feature** checkbox.
1. For **Find anomalies based on**, select **Custom expression**. The JSON editor window will open.
1. Enter your JSON aggregation query in the editor.
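
   For example, the following sketch defines a feature as the maximum of a hypothetical `processing_bytes` field; substitute your own aggregation name and field:

   ```json
   {
     "max_processing_bytes": {
       "max": {
         "field": "processing_bytes"
       }
     }
   }
   ```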

For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/).
{: .note}

### (Optional) Set category fields for high cardinality

You can categorize anomalies based on a keyword or IP field type.
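
If you define the detector through the API, the category fields are specified in the `category_field` array. The following fragment is a sketch assuming a hypothetical `ip_address` field of type `ip`:

```json
"category_field": ["ip_address"]
```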

@@ -160,13 +172,52 @@
This formula serves as a starting point. Make sure to test it with a representative workload. You can find more information in the [Improving Anomaly Detection: One million entities in one minute](https://opensearch.org/blog/one-million-enitities-in-one-minute/) blog post.
{: .note }

### (Advanced settings) Set a shingle size

Set the number of aggregation intervals from your data stream to consider in a detection window. It’s best to choose this value based on your actual data to see which one leads to the best results for your use case.

The anomaly detector expects the shingle size to be between 1 and 128. The default shingle size is `8`. Choose `1` only if you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also increase false positives. Larger values might be useful for ignoring noise in a signal.
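
When creating a detector through the API, this value corresponds to the top-level `shingle_size` field in the detector body, as in the following fragment:

```json
"shingle_size": 8
```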

### (Advanced settings) Set an imputation option

The imputation option allows you to address missing data in your streams. You can choose from the following methods to handle gaps:

- **Ignore Missing Data (Default):** The system continues without considering missing data points, keeping the existing data flow.
- **Fill with Custom Values:** Specify a custom value for each feature to replace missing data points, allowing for targeted imputation tailored to your data.
- **Fill with Zeros:** Replace missing values with zeros. This is ideal when the absence of data indicates a significant event, such as a drop to zero in event counts.
- **Use Previous Values:** Fill gaps with the last observed value to maintain continuity in your time-series data. This method treats missing data as non-anomalous, carrying forward the previous trend.

Using these options can improve recall in anomaly detection. For instance, if you are monitoring for drops in event counts, including both partial and complete drops, filling missing values with zeros helps detect significant data absences, improving detection recall.

Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping/) for more information.

{: .note}
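
When defining a detector through the API, the imputation behavior is configured in an `imputation_option` object. The following fragment is a sketch of the custom-value method; the exact field names shown here (`method`, `defaultFill`, `featureName`) are assumptions that should be verified against the Anomaly Detection API reference:

```json
"imputation_option": {
  "method": "FIXED_VALUES",
  "defaultFill": [
    {
      "featureName": "logVolume",
      "data": 0
    }
  ]
}
```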

### (Advanced settings) Suppress anomalies with threshold-based rules

You can suppress anomalies by setting rules that define acceptable differences between the expected and actual values, either as an absolute value or a relative percentage. This helps reduce false anomalies caused by minor fluctuations, allowing you to focus on significant deviations.

Suppose you want to detect substantial changes in log volume while ignoring small variations that are not meaningful. Without customized settings, the system might generate false alerts for minor changes, making it difficult to identify true anomalies. By setting suppression rules, you can ignore minor deviations and focus on real anomalous patterns.

To suppress anomalies for deviations of less than 30% from the expected value, you can set the following rules:

```
Ignore anomalies for feature logVolume when the actual value is no more than 30% above the expected value.
Ignore anomalies for feature logVolume when the actual value is no more than 30% below the expected value.
```

Ensure that a feature, for example, `logVolume`, is properly defined in your model. Suppression rules are tied to specific features.
{: .note}

If you expect that the log volume should differ by at least 10,000 from the expected value before being considered an anomaly, you can set absolute thresholds:

```
Ignore anomalies for feature logVolume when the actual value is no more than 10000 above the expected value.
Ignore anomalies for feature logVolume when the actual value is no more than 10000 below the expected value.
```

If no custom suppression rules are set, then the system defaults to a filter that ignores anomalies with deviations of less than 20% from the expected value for each enabled feature.
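
When defining a detector through the API, suppression rules are expressed as a `rules` array in the detector body. The following fragment is a sketch of the 30% relative-deviation rules described above; the `action` and `threshold_type` values shown are assumptions that should be verified against the Anomaly Detection API reference:

```json
"rules": [
  {
    "action": "ignore_anomaly",
    "conditions": [
      {
        "feature_name": "logVolume",
        "threshold_type": "actual_over_expected_ratio",
        "operator": "lte",
        "value": 0.3
      },
      {
        "feature_name": "logVolume",
        "threshold_type": "expected_over_actual_ratio",
        "operator": "lte",
        "value": 0.3
      }
    ]
  }
]
```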

### Preview sample anomalies

Preview sample anomalies and adjust the feature settings if needed.
For sample previews, the Anomaly Detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results.
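
You can also generate a preview programmatically with the preview API. The following is a sketch that assumes an existing detector ID; `period_start` and `period_end` are epoch milliseconds:

```json
POST _plugins/_anomaly_detection/detectors/<detectorId>/_preview
{
  "period_start": 1633048868000,
  "period_end": 1633394468000
}
```
{% include copy-curl.html %}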
79 changes: 77 additions & 2 deletions _observing-your-data/ad/result-mapping.md
@@ -9,9 +9,9 @@ redirect_from:

# Anomaly result mapping

If you enabled a custom result index, the Anomaly Detection plugin stores the results in your own index.

If the anomaly detector does not detect an anomaly, the result has the following format:

```json
{
@@ -80,6 +80,81 @@ Field | Description
`model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity.
`threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold.

When the imputation option is enabled, the anomaly result output includes a `feature_imputed` array, showing which features have been imputed. This information helps you identify which features were modified during the anomaly detection process due to missing data. If no features were imputed, then the `feature_imputed` array is excluded from the results.


In this example, the feature `processing_bytes_max` was imputed, as indicated by the `imputed: true` status:

```json
{
"detector_id": "kzcZ43wBgEQAbjDnhzGF",
"schema_version": 5,
"data_start_time": 1635898161367,
"data_end_time": 1635898221367,
"feature_data": [
{
"feature_id": "processing_bytes_max",
"feature_name": "processing bytes max",
"data": 2322
},
{
"feature_id": "processing_bytes_avg",
"feature_name": "processing bytes avg",
"data": 1718.6666666666667
},
{
"feature_id": "processing_bytes_min",
"feature_name": "processing bytes min",
"data": 1375
},
{
"feature_id": "processing_bytes_sum",
"feature_name": "processing bytes sum",
"data": 5156
},
{
"feature_id": "processing_time_max",
"feature_name": "processing time max",
"data": 31198
}
],
"execution_start_time": 1635898231577,
"execution_end_time": 1635898231622,
"anomaly_score": 1.8124904404395776,
"anomaly_grade": 0,
"confidence": 0.9802940756605277,
"entity": [
{
"name": "process_name",
"value": "process_3"
}
],
"model_id": "kzcZ43wBgEQAbjDnhzGF_entity_process_3",
"threshold": 1.2368549346675202,
"feature_imputed": [
{
"feature_id": "processing_bytes_max",
"imputed": true
},
{
"feature_id": "processing_bytes_avg",
"imputed": false
},
{
"feature_id": "processing_bytes_min",
"imputed": false
},
{
"feature_id": "processing_bytes_sum",
"imputed": false
},
{
"feature_id": "processing_time_max",
"imputed": false
}
]
}
```
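
To retrieve only the results in which a particular feature was imputed, you can filter on the `feature_imputed` fields. The following query is a sketch against the example result index alias used earlier; adjust the index name and feature ID to your own, and note that if the array is mapped as a `nested` field, a `nested` query is required instead:

```json
GET opensearch-ad-plugin-result-abc/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "feature_imputed.feature_id": "processing_bytes_max" } },
        { "term": { "feature_imputed.imputed": true } }
      ]
    }
  }
}
```
{% include copy-curl.html %}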

If an anomaly detector detects an anomaly, the result has the following format:

```json