Performance: Add links to downsampling tutorials #45

Merged on Feb 27, 2024 (2 commits).
docs/performance/selects.rst (60 additions, 0 deletions)

based sampling.


.. _downsampling-timestamp-binning:

Downsampling with ``DATE_BIN``
==============================

For improved downsampling using time bucketing and resampling, the article
`resampling time-series data with DATE_BIN`_ shares patterns for grouping
records into time buckets and resampling their values.

This technique improves query performance by decreasing the granularity of the
data along the time dimension, which reduces the amount of data that needs to
be transferred. It is most often applied when querying live system metrics
with visualization or dashboarding tools such as Grafana.

.. code-block:: sql

    -- Keep only the most recent reading per 5-minute bucket.
    SELECT ts_bin,
           battery_level,
           battery_status,
           battery_temperature
    FROM (
      SELECT DATE_BIN('5 minutes'::INTERVAL, "time", 0) AS ts_bin,
             battery_level,
             battery_status,
             battery_temperature,
             -- Number the rows within each bucket, newest first.
             ROW_NUMBER() OVER (
               PARTITION BY DATE_BIN('5 minutes'::INTERVAL, "time", 0)
               ORDER BY "time" DESC
             ) AS "row_number"
      FROM doc.sensor_readings
    ) x
    WHERE "row_number" = 1
    ORDER BY 1 ASC
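
The query above can be mirrored in plain Python to make the bucketing logic
explicit. This is a minimal sketch, not CrateDB code: it assumes timestamps are
epoch milliseconds, an origin of ``0`` as in the query, and rows represented as
dictionaries; ``date_bin`` and ``latest_per_bucket`` are hypothetical helper
names. Each timestamp is floored to the start of its 5-minute bucket, and only
the newest row per bucket is kept, which is what filtering
``ROW_NUMBER() ... ORDER BY "time" DESC`` to ``1`` achieves.

.. code-block:: python

    # Hypothetical helpers mirroring the SQL query above; not part of CrateDB.
    FIVE_MINUTES_MS = 5 * 60 * 1000


    def date_bin(stride_ms: int, ts_ms: int, origin_ms: int = 0) -> int:
        """Floor ts_ms to the start of its bucket, like DATE_BIN(stride, ts, origin)."""
        # Python's // floors toward negative infinity, so timestamps before the
        # origin land in the bucket that starts at or before them.
        return origin_ms + (ts_ms - origin_ms) // stride_ms * stride_ms


    def latest_per_bucket(rows, stride_ms: int = FIVE_MINUTES_MS):
        """Keep the newest row per bucket, like ROW_NUMBER() ... = 1 in the query."""
        latest = {}
        for row in rows:
            bucket = date_bin(stride_ms, row["time"])
            if bucket not in latest or row["time"] > latest[bucket]["time"]:
                latest[bucket] = row
        # Return (ts_bin, row) pairs sorted by bucket, like ORDER BY 1 ASC.
        return [(bucket, latest[bucket]) for bucket in sorted(latest)]


    # Example: one reading per minute for twelve minutes.
    readings = [{"time": i * 60_000, "battery_level": 100 - i} for i in range(12)]
    for ts_bin, row in latest_per_bucket(readings):
        print(ts_bin, row["battery_level"])
    # 0 96
    # 300000 91
    # 600000 89

Twelve input rows collapse into three output rows, one per bucket, which is the
reduction in transferred data described above.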


.. _downsampling-lttb:

Downsampling with LTTB
======================

`Largest Triangle Three Buckets`_ is a downsampling method that tries to retain
visual similarity between the downsampled data and the original dataset using
considerably fewer data points.

The article about `advanced downsampling with the LTTB algorithm`_ explains how
to use LTTB with CrateDB. LTTB serves the same purposes as other downsampling
procedures, and is particularly useful when retaining essential details matters
for the visual analysis of graphs.

.. code-block:: sql

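    -- lttb_with_parallel_arrays() is a user-defined function; see the LTTB
    -- article linked above for its definition.
    WITH downsampleddata AS (
      SELECT lttb_with_parallel_arrays(
               array(SELECT n FROM demo ORDER BY n),
               array(SELECT reading FROM demo ORDER BY n),
               100  -- number of points to keep
             ) AS lttb
    )
    SELECT unnest(lttb['0']) AS n,
           unnest(lttb['1']) AS reading
    FROM downsampleddata;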
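
To show how LTTB picks the points it keeps, here is a minimal pure-Python
sketch following the published algorithm: the first and last points are always
kept, the remaining points are split into equally sized buckets, and from each
bucket the point that forms the largest triangle with the previously selected
point and the average of the next bucket is chosen. It is an independent
illustration, not the ``lttb_with_parallel_arrays`` function used in the query
above; the ``lttb`` name and the list of ``(x, y)`` tuples as input are
assumptions.

.. code-block:: python

    def lttb(points, threshold):
        """Downsample a list of (x, y) tuples to `threshold` points with LTTB."""
        n = len(points)
        if threshold >= n or threshold < 3:
            return list(points)  # nothing to downsample

        sampled = [points[0]]  # always keep the first point
        bucket_size = (n - 2) / (threshold - 2)
        a = 0  # index of the previously selected point

        for i in range(threshold - 2):
            # Index range of the current bucket.
            start = int(i * bucket_size) + 1
            end = int((i + 1) * bucket_size) + 1

            # Average point of the next bucket (the last point at the end).
            next_bucket = points[end:min(int((i + 2) * bucket_size) + 1, n)]
            avg_x = sum(p[0] for p in next_bucket) / len(next_bucket)
            avg_y = sum(p[1] for p in next_bucket) / len(next_bucket)

            # Pick the point forming the largest triangle with the previously
            # selected point and the next bucket's average.
            ax, ay = points[a]
            best_area, best_index = -1.0, start
            for j in range(start, end):
                x, y = points[j]
                area = abs((ax - avg_x) * (y - ay) - (ax - x) * (avg_y - ay)) / 2
                if area > best_area:
                    best_area, best_index = area, j

            sampled.append(points[best_index])
            a = best_index

        sampled.append(points[-1])  # always keep the last point
        return sampled


    # Example: a flat series with a single spike, reduced to four points.
    data = [(x, 0.0) for x in range(10)]
    data[4] = (4, 10.0)
    print(lttb(data, 4))
    # [(0, 0.0), (4, 10.0), (5, 0.0), (9, 0.0)]

The spike at ``x = 4`` is retained because it forms the largest triangle in its
bucket, which is how LTTB preserves visually significant features.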


.. _rewrite-join-as-cte:

Rewrite JOINs as CTEs
=====================

individual records in different tables, with the same PK definition,
and the same PK values, will also have identical ``_id`` values.


.. _advanced downsampling with the LTTB algorithm: https://community.cratedb.com/t/advanced-downsampling-with-the-lttb-algorithm/1287
.. _down-sampling: https://grisha.org/blog/2015/03/28/on-time-series/#downsampling
.. _Largest Triangle Three Buckets: https://github.com/sveinn-steinarsson/flot-downsample
.. _Lucene segment: https://stackoverflow.com/a/2705123
.. _normal distribution: https://en.wikipedia.org/wiki/Normal_distribution
.. _resampling time-series data with DATE_BIN: https://community.cratedb.com/t/resampling-time-series-data-with-date-bin/1009
.. _retrieving records in bulk with a list of primary key values: https://community.cratedb.com/t/retrieving-records-in-bulk-with-a-list-of-primary-key-values/1721
.. _using common table expressions to speed up queries: https://community.cratedb.com/t/using-common-table-expressions-to-speed-up-queries/1719