This repository has been archived by the owner on Dec 29, 2020. It is now read-only.

fix(docs): combine patterns into queries doc
ssube committed Feb 14, 2020
1 parent 16001d8 commit efb91ef
Showing 4 changed files with 159 additions and 91 deletions.
Empty file removed docs/alert.md
Empty file.
19 changes: 19 additions & 0 deletions docs/dashboards.md
@@ -0,0 +1,19 @@
# Dashboards

## How To Use

### Release Annotations

### Variables

**Note:** variables cannot be used with alerts. Supporting both would require mapping each variable value to its own
threshold and, with repeated panels, could cause an explosion of alert queries on the backend.

## Best Practices

- export and commit dashboards often
- limit the number of panels per dashboard to 8-12
- each panel is a JSON request and browsers will only make 6-8 concurrent requests per hostname. Large dashboards
  will spin as panels wait to load, and slow panels may block faster ones.
- with 12 panels and a 30-second query timeout, panels load in at most two rounds of requests, so dashboards
  should take no longer than 60 seconds to load.
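
The 60-second figure comes from request batching: panels load in rounds limited by the browser's per-hostname
connection limit, and each round can take up to the query timeout. A minimal sketch of that worst-case estimate,
assuming one request per panel and that the slowest query in every round hits the timeout (the function name is
hypothetical, not part of Grafana):

```python
import math


def worst_case_load_seconds(panels: int, timeout_s: float, connections: int = 6) -> float:
    """Upper bound on dashboard load time.

    Assumes one request per panel, at most `connections` concurrent requests
    per hostname, and that each round of requests waits for the full timeout.
    """
    if panels <= 0:
        return 0.0
    if connections <= 0:
        raise ValueError("connections must be positive")
    rounds = math.ceil(panels / connections)  # batches of concurrent requests
    return float(rounds * timeout_s)


# 12 panels fit in two rounds whether the browser allows 6 or 8 connections.
print(worst_case_load_seconds(12, 30))     # 60.0
print(worst_case_load_seconds(12, 30, 8))  # 60.0
print(worst_case_load_seconds(13, 30))     # 90.0, a 13th panel adds a third round
```

This is also why the suggested limit tops out around 12 panels: one more panel than the connection limit allows
per round adds a full timeout to the worst case.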
81 changes: 0 additions & 81 deletions docs/patterns.md

This file was deleted.

150 changes: 140 additions & 10 deletions docs/query.md → docs/queries.md
@@ -1,10 +1,36 @@
# Queries

## Contents

- [Queries](#queries)
  - [Contents](#contents)
  - [Getting Started](#getting-started)
    - [Sampling a Single Series](#sampling-a-single-series)
    - [Filtering By Labels](#filtering-by-labels)
    - [Calculating a Delta](#calculating-a-delta)
    - [Joining Multiple Timeseries](#joining-multiple-timeseries)
  - [Details](#details)
    - [Labels Table](#labels-table)
    - [Samples Table](#samples-table)
    - [Metrics View](#metrics-view)
  - [Best Practices](#best-practices)
    - [Continuous Aggregate Patterns](#continuous-aggregate-patterns)
      - [Aggregate Intervals](#aggregate-intervals)
    - [Query Patterns](#query-patterns)
      - [CASE Statements](#case-statements)
      - [Duplicate Samples](#duplicate-samples)
      - [JSON Columns](#json-columns)
      - [Quote Sensitivity](#quote-sensitivity)
      - [Time Filter](#time-filter)
      - [Window Functions](#window-functions)
    - [Grafana Practices](#grafana-practices)
      - [Grafana Errors](#grafana-errors)
        - [JSON Body Marshal](#json-body-marshal)
        - [Data Points Outside Time Range](#data-points-outside-time-range)
      - [Grafana Macros](#grafana-macros)
        - [Result Column Names](#result-column-names)

## Getting Started

### Sampling a Single Series

@@ -159,9 +185,89 @@ View definition:
WHERE s."time" > (now() - '06:00:00'::interval);
```

## Best Practices

### Continuous Aggregate Patterns

#### Aggregate Intervals

Continuous aggregates are calculated on an interval and lag behind the raw data, leaving a gap on the right (most recent)
edge of the graph. The delay will be about `1.5 * (interval + lag)`, giving time for each bucket to fill completely; expect
a 15-minute delay for a `5m` interval and lag, and up to 45 minutes for `15m`.
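
The rule of thumb above, written as a small helper that reproduces both quoted figures. Interval and lag are taken in
minutes; the helper name is hypothetical and only restates the formula from this section:

```python
def continuous_aggregate_delay(interval_min: float, lag_min: float) -> float:
    """Approximate delay, in minutes, before a continuous aggregate bucket is complete.

    Rule of thumb from this doc: about 1.5 * (interval + lag).
    """
    return 1.5 * (interval_min + lag_min)


print(continuous_aggregate_delay(5, 5))    # 15.0
print(continuous_aggregate_delay(15, 15))  # 45.0
```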

### Query Patterns

#### CASE Statements

Using `CASE` statements to join multiple timeseries tends to be extremely slow, since Postgres has to fetch
each series individually or miss the `lid` index, then sort and correlate points between the series.

Avoid this pattern at all costs:

```sql
SELECT
  AVG(bars) / MAX(bins) -- with any aggregate functions, not just AVG/MAX
FROM (
  SELECT
    ...
    MAX(CASE WHEN name = 'foo_bar' THEN value ELSE null END) AS bars,
    MAX(CASE WHEN name = 'foo_bin' THEN value ELSE null END) AS bins
  FROM metrics
  WHERE
    $__timeFilter("time")
    AND name IN (
      'foo_bar',
      'foo_bin'
    )
  GROUP BY timeGroup, metric
) AS s
```

Matching points between the disparate series causes a repeated quicksort, which can quickly exceed the sort
memory limit (`work_mem`) and spill to disk.

#### Duplicate Samples

Grouping duplicate metrics is necessary when running Prometheus in high-availability sets. Since each Prometheus
instance labels metrics with its own ID, they appear as individual timeseries with different `lid`s, but can be
grouped by time/bucket and metric.

For queries that return a single value, either a momentary value or a rate:

```sql
SELECT
  labels->>'some_label' AS "metric",
  $__timeGroup(time, ${__interval}) AS "time",
  MAX(value) AS "value"
FROM metrics
WHERE
  $__timeFilter(time) AND
  name = 'some_metric'
GROUP BY $__timeGroup(time, ${__interval}), metric
ORDER BY 1, 2
```

For queries with multiple values or a complex aggregate:

```sql
SELECT
  metric,
  time,
  MAX(value) AS "value"
FROM (
  SELECT
    labels->>'some_label' AS "metric",
    $__timeGroup("time", ${__interval}) AS "time",
    AVG(value) AS "value" -- or any other aggregate for each group
  FROM metrics
  WHERE
    $__timeFilter(time) AND
    name = 'some_metric'
  GROUP BY $__timeGroup("time", ${__interval}), labels->>'some_label', labels->>'other_label'
) AS s
GROUP BY metric, time
ORDER BY 1, 2
```
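
To make the deduplication concrete, here is a minimal in-memory sketch of what the single-value query does: bucket
each sample by time, group by one label while ignoring the per-replica label, and keep `MAX(value)`. It is an
illustration, not part of this repository; the `replica` label name and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (unix_time_seconds, labels, value); `labels` stands in for the JSON labels column.
Sample = Tuple[int, Dict[str, str], float]


def dedupe_max(
    samples: Iterable[Sample], label: str, interval_s: int
) -> List[Tuple[str, int, float]]:
    """Collapse duplicate series written by HA Prometheus replicas.

    Buckets each sample by time (like $__timeGroup), groups by one label while
    ignoring the per-replica label, and keeps MAX(value) for each group.
    """
    groups: Dict[Tuple[str, int], float] = {}
    for ts, labels, value in samples:
        bucket = (ts // interval_s) * interval_s
        key = (labels.get(label, ""), bucket)
        groups[key] = max(groups.get(key, value), value)
    # ORDER BY metric, time
    return sorted((metric, bucket, value) for (metric, bucket), value in groups.items())


# Two replicas scraping the same target; "replica" is a hypothetical label name.
samples = [
    (0, {"some_label": "a", "replica": "prom-0"}, 1.0),
    (5, {"some_label": "a", "replica": "prom-1"}, 1.5),
    (65, {"some_label": "a", "replica": "prom-0"}, 2.0),
    (70, {"some_label": "a", "replica": "prom-1"}, 2.0),
]
print(dedupe_max(samples, "some_label", 60))  # [('a', 0, 1.5), ('a', 60, 2.0)]
```

`MAX` is used because both replicas should report nearly the same value; any aggregate that picks one of the
duplicates would work, but `MAX` keeps counters monotonic when one replica lags slightly.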

#### JSON Columns

@@ -202,12 +308,36 @@ Within the `WHERE` clause, keep your filters in order by volume of data:

> https://www.postgresql.org/docs/10/tutorial-window.html
Window functions allow aggregates to view more than one row at a time, often using the previous and/or next row
to calculate change over time. Larger windows may calculate ordered aggregates like percentiles, but also need to
look at a larger set of rows and may not scale well.

Some utility functions are provided in [the `rate_` family](../schema/utils/rate.sql) to calculate change between
samples, handle counter resets, and adjust for time.

Any filters and most renaming should be done within an inner select, with the outer select used to handle the
window and counter resets:
```sql
SELECT
  metric,
  bucket AS time,
  rate_time(value, lag(value) OVER w, '${__interval}') AS value
FROM (
  SELECT
    CONCAT(labels->>'instance', labels->>'job') AS metric,
    $__timeGroup("time", ${__interval}) AS bucket,
    MAX(value) AS value
  FROM metrics
  WHERE
    $__timeFilter("time") AND
    name = 'node_disk_write_time_seconds_total'
  GROUP BY metric, bucket
) AS m
WINDOW w AS (PARTITION BY metric ORDER BY bucket)
ORDER BY metric, time;
```
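
The query above can be sketched in memory as two steps: the inner select takes `MAX(value)` per metric and bucket,
and the outer select walks each metric's buckets in order, comparing each row to the previous one as `lag(value)`
does. The actual behavior of `rate_time` lives in `schema/utils/rate.sql` and is not reproduced here; the
`counter_rate` helper below is an assumed stand-in that divides by the interval and treats a drop in the counter
as a Prometheus-style reset.

```python
from typing import Dict, List, Optional, Tuple


def bucket_max(
    samples: List[Tuple[str, int, float]], interval_s: int
) -> Dict[str, List[Tuple[int, float]]]:
    """Inner select: MAX(value) per (metric, bucket), returned per metric in bucket order."""
    buckets: Dict[Tuple[str, int], float] = {}
    for metric, ts, value in samples:
        key = (metric, (ts // interval_s) * interval_s)
        buckets[key] = max(buckets.get(key, value), value)
    series: Dict[str, List[Tuple[int, float]]] = {}
    for (metric, bucket), value in sorted(buckets.items()):
        series.setdefault(metric, []).append((bucket, value))
    return series


def counter_rate(value: float, previous: Optional[float], interval_s: int) -> Optional[float]:
    """Per-second rate between adjacent rows (hypothetical stand-in for rate_time).

    If the counter went down, treat it as a reset and count the current value
    as the increase since the reset.
    """
    if previous is None:  # first row of the partition: lag() returns NULL
        return None
    delta = value - previous
    if delta < 0:  # counter reset
        delta = value
    return delta / interval_s


def windowed_rates(
    samples: List[Tuple[str, int, float]], interval_s: int
) -> List[Tuple[str, int, Optional[float]]]:
    """Outer select: PARTITION BY metric ORDER BY bucket, comparing each row to lag(value)."""
    rows: List[Tuple[str, int, Optional[float]]] = []
    for metric, points in bucket_max(samples, interval_s).items():
        previous: Optional[float] = None
        for bucket, value in points:
            rows.append((metric, bucket, counter_rate(value, previous, interval_s)))
            previous = value
    return rows


samples = [
    ("sda", 0, 100.0),
    ("sda", 5, 110.0),
    ("sda", 10, 160.0),
    ("sda", 20, 20.0),  # counter reset
]
print(windowed_rates(samples, 10))  # [('sda', 0, None), ('sda', 10, 5.0), ('sda', 20, 2.0)]
```

Like the query, the sketch compares each bucket with the previous row rather than the previous time slot, so a
missing bucket makes one delta span more than one interval.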

### Grafana Practices

#### Grafana Errors
