feat: add support for OpenSearch backend #348

Merged · 7 commits · Oct 4, 2023
Changes from 4 commits
2 changes: 1 addition & 1 deletion Dockerfile
@@ -22,6 +22,6 @@ RUN apt-get update && \
ADD . /app
WORKDIR /app
RUN pip install -U setuptools
RUN pip install ."[api, datadog, dynatrace, prometheus, elasticsearch, splunk, pubsub, cloud_monitoring, cloud_service_monitoring, cloud_storage, bigquery, cloudevent, dev]"
RUN pip install ."[api, datadog, dynatrace, prometheus, elasticsearch, opensearch, splunk, pubsub, cloud_monitoring, cloud_service_monitoring, cloud_storage, bigquery, cloudevent, dev]"
ENTRYPOINT [ "slo-generator" ]
CMD ["-v"]
7 changes: 5 additions & 2 deletions Makefile
@@ -58,7 +58,7 @@ develop: install
pre-commit install

install: clean
$(PIP) install -e ."[api, datadog, prometheus, elasticsearch, splunk, pubsub, cloud_monitoring, bigquery, dev]"
$(PIP) install -e ."[api, datadog, prometheus, elasticsearch, opensearch, splunk, pubsub, cloud_monitoring, bigquery, dev]"

uninstall: clean
$(PIP) freeze --exclude-editable | xargs $(PIP) uninstall -y
@@ -102,7 +102,7 @@ bandit:
safety:
safety check

integration: int_cm int_csm int_custom int_dd int_dt int_es int_prom int_sp
integration: int_cm int_csm int_custom int_dd int_dt int_es int_prom int_sp int_os

int_cm:
slo-generator compute -f samples/cloud_monitoring -c samples/config.yaml
@@ -128,6 +128,9 @@ int_prom:
int_sp:
slo-generator compute -f samples/splunk -c samples/config.yaml

int_os:
slo-generator compute -f samples/opensearch -c samples/config.yaml

# Run API locally
run_api:
slo-generator api --target=run_compute --signature-type=http -c samples/config.yaml
97 changes: 97 additions & 0 deletions docs/providers/opensearch.md
@@ -0,0 +1,97 @@
# OpenSearch

## Backend

Using the `opensearch` backend class, you can query any metrics available in OpenSearch to create an SLO.

```yaml
backends:
opensearch:
url: ${OPENSEARCH_URL}
```

Note that `url` can be either a single string (when connecting to a single node) or a list of strings (when connecting to multiple nodes):

```yaml
backends:
opensearch:
url: https://localhost:9200
```

```yaml
backends:
opensearch:
url:
- https://localhost:9200
- https://localhost:9201
```
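Either form works because the backend hands `url` straight to the client's `hosts` argument, and `opensearch-py` accepts both a single host and a list. The equivalent normalization can be sketched in plain Python (`normalize_hosts` is a hypothetical helper for illustration, not part of slo-generator):

```python
def normalize_hosts(url):
    """Return a list of hosts from either a single URL string or a list of URLs."""
    if isinstance(url, str):
        return [url]
    return list(url)

print(normalize_hosts("https://localhost:9200"))
print(normalize_hosts(["https://localhost:9200", "https://localhost:9201"]))
```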

The following method is available to compute SLOs with the `opensearch` backend:

* `good_bad_ratio` method is used to compute the ratio between two metrics:

* **Good events**, i.e. events we consider as 'good' from the user perspective.
* **Bad or valid events**, i.e. events we consider either as 'bad' from the user perspective, or as all events we consider 'valid' for the computation of the SLO.

This method is often used for availability SLOs, but can be used for other purposes as well (see examples).

**SLO example:**

```yaml
backend: opensearch
method: good_bad_ratio
service_level_indicator:
index: my-index
date_field: '@timestamp'
query_good:
must:
range:
api-response-time:
lt: 350
query_bad:
must:
range:
api-response-time:
gte: 350
```

Additional info:

* `date_field`: has to be a valid OpenSearch timestamp (`date`) field

**→ [Full SLO config](../../samples/opensearch/slo_opensearch_latency_sli.yaml)**
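From the two counts returned by these queries, the SLI is the ratio of good events to all counted events. Conceptually (a pure-Python sketch; `sli_from_counts` is a hypothetical name, not the library's API):

```python
def sli_from_counts(good: int, bad: int) -> float:
    """Service level indicator: share of good events among all counted events."""
    total = good + bad
    if total == 0:
        raise ValueError("no events in the window")
    return good / total

# e.g. 9950 requests under 350ms and 50 at or above it:
print(sli_from_counts(9950, 50))  # → 0.995
```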

Instead of the `query_bad` field, which identifies bad events, you can also use the `query_valid` field, which identifies all valid events.

The Lucene query entered in the `query_good`, `query_bad`, or `query_valid` field will be combined (using a `bool` query) into a larger query that filters results on the `window` specified in your Error Budget Policy steps.

The full OpenSearch query body for the `query_bad` above will therefore look like:

```json
{
"query": {
"bool": {
"must": {
"range": {
"api-response-time": {
"gte": 350
}
}
},
"filter": {
"range": {
"@timestamp": {
"gte": "now-3600s/s",
"lt": "now/s"
}
}
}
}
},
"track_total_hits": true
}
```
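The wrapping itself is simple dictionary surgery. A simplified re-implementation of the backend's `build_query` for illustration (not the module itself — this version overwrites any existing `filter` clause, whereas the real backend merges into it):

```python
def build_query(query, window, date_field):
    """Wrap a partial `bool` query in a body with a window range filter added."""
    if query is None:
        return None
    body = {"query": {"bool": dict(query)}, "track_total_hits": True}
    # Restrict results to the error-budget window, using OpenSearch date math.
    body["query"]["bool"]["filter"] = {
        "range": {date_field: {"gte": f"now-{window}s/s", "lt": "now/s"}}
    }
    return body

body = build_query(
    {"must": {"range": {"api-response-time": {"gte": 350}}}},
    3600,
    "@timestamp",
)
print(body["query"]["bool"]["filter"]["range"]["@timestamp"]["gte"])  # → now-3600s/s
```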

### Examples

Complete SLO samples using the `opensearch` backend are available in [samples/opensearch](../../samples/opensearch). Check them out!
1 change: 1 addition & 0 deletions samples/.env.sample
@@ -21,3 +21,4 @@ export DYNATRACE_API_TOKEN=
export BIGQUERY_PROJECT_ID=
export BIGQUERY_DATASET_ID=
export BIGQUERY_TABLE_ID=
export OPENSEARCH_URL=
2 changes: 2 additions & 0 deletions samples/config.yaml
@@ -24,6 +24,8 @@ backends:
port: ${SPLUNK_PORT}
user: ${SPLUNK_USER}
password: ${SPLUNK_PWD}
opensearch:
url: ${OPENSEARCH_URL}

exporters:
cloudevent:
25 changes: 25 additions & 0 deletions samples/opensearch/slo_opensearch_availability_sli.yaml
@@ -0,0 +1,25 @@
apiVersion: sre.google.com/v2
kind: ServiceLevelObjective
metadata:
name: open-search-availability
labels:
service_name: opensearch
feature_name: opensearch-availability
slo_name: availability
spec:
  description: 99% of elements are valid
backend: opensearch
method: good_bad_ratio
exporters: []
service_level_indicator:
index: my-index
date_field: '@timestamp'
query_good:
must:
term:
status: 200
query_bad:
must_not:
term:
status: 200
goal: 0.99
27 changes: 27 additions & 0 deletions samples/opensearch/slo_opensearch_latency_sli.yaml
@@ -0,0 +1,27 @@
apiVersion: sre.google.com/v2
kind: ServiceLevelObjective
metadata:
name: open-search-latency
labels:
service_name: opensearch
feature_name: opensearch-latency
slo_name: latency
spec:
  description: 99% of requests are faster than 350ms
backend: opensearch
method: good_bad_ratio
exporters: []
service_level_indicator:
index: my-index
date_field: '@timestamp'
query_good:
must:
range:
api-response-time:
lt: 350
query_bad:
must:
range:
api-response-time:
gte: 350
goal: 0.99
2 changes: 2 additions & 0 deletions setup.cfg
@@ -91,6 +91,8 @@ cloud_storage =
google-cloud-storage
elasticsearch =
elasticsearch
opensearch =
opensearch-py
splunk =
splunk-sdk
pubsub =
144 changes: 144 additions & 0 deletions slo_generator/backends/opensearch.py
@@ -0,0 +1,144 @@
"""
`opensearch.py`
OpenSearch backend implementation.
"""

import copy
import logging

from opensearchpy import OpenSearch

from slo_generator.constants import NO_DATA

LOGGER = logging.getLogger(__name__)


# pylint: disable=duplicate-code
class OpensearchBackend:
"""Backend for querying metrics from OpenSearch.

Args:
        client(opensearchpy.OpenSearch): Existing OpenSearch client.
os_config(dict): OS client configuration.
"""

def __init__(self, client=None, **os_config):
self.client = client
if self.client is None:
conf = copy.deepcopy(os_config)
url = conf.pop("url", None)
basic_auth = conf.pop("basic_auth", None)
api_key = conf.pop("api_key", None)
if url:
conf["hosts"] = url
            if basic_auth:
                # opensearch-py takes basic credentials via `http_auth`
                conf["http_auth"] = (basic_auth["username"], basic_auth["password"])
if api_key:
conf["api_key"] = (api_key["id"], api_key["value"])

self.client = OpenSearch(**conf)

# pylint: disable=unused-argument
def good_bad_ratio(self, timestamp, window, slo_config):
"""Query two timeseries, one containing 'good' events, one containing
'bad' events.

Args:
timestamp(int): UNIX timestamp.
window(int): Window size (in seconds).
slo_config(dict): SLO configuration.
spec:
method: "good_bad_ratio"
service_level_indicator:
                    query_good(dict): the search query to look for good events
                    query_bad(dict): the search query to look for bad events
                    query_valid(dict): the search query to look for valid events

Returns:
tuple: good_event_count, bad_event_count
"""
measurement = slo_config["spec"]["service_level_indicator"]
index = measurement["index"]
query_good = measurement["query_good"]
query_bad = measurement.get("query_bad")
query_valid = measurement.get("query_valid")
date_field = measurement.get("date_field")

good = OS.build_query(query_good, window, date_field)
bad = OS.build_query(query_bad, window, date_field)
valid = OS.build_query(query_valid, window, date_field)

good_events_count = OS.count(self.query(index, good))

if query_bad is not None:
bad_events_count = OS.count(self.query(index, bad))
elif query_valid is not None:
bad_events_count = OS.count(self.query(index, valid)) - good_events_count
else:
            raise ValueError("`query_bad` or `query_valid` is required.")

return good_events_count, bad_events_count

def query(self, index, body):
        """Query the OpenSearch server.

Args:
index(str): Index to query.
body(dict): Query body.

Returns:
dict: Response.
"""
return self.client.search(index=index, body=body)

@staticmethod
def count(response):
        """Count events in an OpenSearch response.

Args:
            response(dict): OpenSearch query response.

Returns:
int: Event count.
"""
try:
return response["hits"]["total"]["value"]
except KeyError as exception:
LOGGER.warning("Couldn't find any values in timeseries response")
LOGGER.debug(exception, exc_info=True)
return NO_DATA

@staticmethod
def build_query(query, window, date_field):
        """Build OpenSearch query.

Add window to existing query.
Replace window for different error budget steps on-the-fly.

Args:
query(dict): Existing query body.
window(int): Window in seconds.
            date_field(str): Field to filter time on.

Returns:
dict: Query body with range clause added.
"""
if query is None:
return None
        body = {"query": {"bool": query}, "track_total_hits": True}
        range_query = {
            date_field: {
                "gte": f"now-{window}s/s",
                "lt": "now/s",
            }
        }

if "filter" in body["query"]["bool"]:
body["query"]["bool"]["filter"]["range"] = range_query
else:
body["query"]["bool"]["filter"] = {"range": range_query}

return body


OS = OpensearchBackend