doc/config-monitoring-stats.rst: update stats config with yaml
alesmrazek committed Jan 8, 2024
1 parent 5e22c1e commit ab7bca3
Showing 1 changed file with 89 additions and 194 deletions.
283 changes: 89 additions & 194 deletions doc/config-monitoring-stats.rst
@@ -5,209 +5,104 @@
Statistics collector
====================

Module ``stats`` gathers various counters from the query resolution
and server internals, and offers them as a key-value storage.
These metrics can be exported to :ref:`mod-graphite`,
exposed as :ref:`mod-http-prometheus`, or processed by a user-provided script,
as described in the :ref:`async-events` chapter.

.. note:: Please remember that each Knot Resolver instance keeps its own
          statistics, and instances can be started and stopped dynamically. This might
          affect your data postprocessing procedures if you are using
          :ref:`systemd-multiple-instances`.

.. _mod-stats-list:

Built-in statistics
-------------------

Built-in counters keep track of the number of queries and answers matching specific criteria.

+-----------------------------------------------------------------+
| **Global request counters**                                     |
+------------------+----------------------------------------------+
| request.total    | total number of DNS requests                 |
|                  | (including internal client requests)         |
+------------------+----------------------------------------------+
| request.internal | internal requests generated by Knot Resolver |
|                  | (e.g. DNSSEC trust anchor updates)           |
+------------------+----------------------------------------------+
| request.udp      | external requests received over plain UDP    |
|                  | (:rfc:`1035`)                                |
+------------------+----------------------------------------------+
| request.tcp      | external requests received over plain TCP    |
|                  | (:rfc:`1035`)                                |
+------------------+----------------------------------------------+
| request.dot      | external requests received over              |
|                  | DNS-over-TLS (:rfc:`7858`)                   |
+------------------+----------------------------------------------+
| request.doh      | external requests received over              |
|                  | DNS-over-HTTPS (:rfc:`8484`)                 |
+------------------+----------------------------------------------+
| request.xdp      | external requests received over plain UDP    |
|                  | via an AF_XDP socket                         |
+------------------+----------------------------------------------+

+----------------------------------------------------+
| **Global answer counters**                         |
+-----------------+----------------------------------+
| answer.total    | total number of answered queries |
+-----------------+----------------------------------+
| answer.cached   | queries answered from cache      |
+-----------------+----------------------------------+

+-----------------+----------------------------------+
| **Answers categorized by RCODE**                   |
+-----------------+----------------------------------+
| answer.noerror  | NOERROR answers                  |
+-----------------+----------------------------------+
| answer.nodata   | NOERROR, but empty answers       |
+-----------------+----------------------------------+
| answer.nxdomain | NXDOMAIN answers                 |
+-----------------+----------------------------------+
| answer.servfail | SERVFAIL answers                 |
+-----------------+----------------------------------+

+-----------------+----------------------------------+
| **Answer latency**                                 |
+-----------------+----------------------------------+
| answer.1ms      | completed in 1ms                 |
+-----------------+----------------------------------+
| answer.10ms     | completed in 10ms                |
+-----------------+----------------------------------+
| answer.50ms     | completed in 50ms                |
+-----------------+----------------------------------+
| answer.100ms    | completed in 100ms               |
+-----------------+----------------------------------+
| answer.250ms    | completed in 250ms               |
+-----------------+----------------------------------+
| answer.500ms    | completed in 500ms               |
+-----------------+----------------------------------+
| answer.1000ms   | completed in 1000ms              |
+-----------------+----------------------------------+
| answer.1500ms   | completed in 1500ms              |
+-----------------+----------------------------------+
| answer.slow     | completed in more than 1500ms    |
+-----------------+----------------------------------+
| answer.sum_ms   | sum of all latencies in ms       |
+-----------------+----------------------------------+

+-----------------+----------------------------------+
| **Answer flags**                                   |
+-----------------+----------------------------------+
| answer.aa       | authoritative answer             |
+-----------------+----------------------------------+
| answer.tc       | truncated answer                 |
+-----------------+----------------------------------+
| answer.ra       | recursion available              |
+-----------------+----------------------------------+
| answer.rd       | recursion desired (in answer!)   |
+-----------------+----------------------------------+
| answer.ad       | authentic data (DNSSEC)          |
+-----------------+----------------------------------+
| answer.cd       | checking disabled (DNSSEC)       |
+-----------------+----------------------------------+
| answer.do       | DNSSEC answer OK                 |
+-----------------+----------------------------------+
| answer.edns0    | EDNS0 present                    |
+-----------------+----------------------------------+

+-----------------+----------------------------------+
| **Query flags**                                    |
+-----------------+----------------------------------+
| query.edns      | queries with EDNS present        |
+-----------------+----------------------------------+
| query.dnssec    | queries with DNSSEC DO=1         |
+-----------------+----------------------------------+

Example:

.. code-block:: none

    > modules.load('stats')
    -- Enumerate metrics
    > stats.list()
    [answer.cached] => 486178
    [iterator.tcp] => 490
    [answer.noerror] => 507367
    [answer.total] => 618631
    [iterator.udp] => 102408
    [query.concurrent] => 149
    -- Query metrics by prefix
    > stats.list('iter')
    [iterator.udp] => 105104
    [iterator.tcp] => 490
    -- Fetch most common queries
    > stats.frequent()
    [1] => {
        [type] => 2
        [count] => 4
        [name] => cz.
    }
    -- Fetch most common queries (sorted by frequency);
    -- table.sort() sorts in place and returns nothing
    > freq = stats.frequent()
    > table.sort(freq, function (a, b) return a.count > b.count end)
    > freq
    -- Show recently contacted authoritative servers
    > stats.upstreams()
    [2a01:618:404::1] => {
        [1] => 26 -- RTT
    }
    [128.241.220.33] => {
        [1] => 31 -- RTT
    }
    -- Set custom metrics from modules
    > stats['filter.match'] = 5
    > stats['filter.match']
    5

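The counters are raw numbers, so derived values such as a cache hit ratio or a mean latency have to be computed by whoever consumes them. The Python sketch below assumes the metrics are already available as a flat ``dict`` keyed by the counter names listed above (for instance, parsed from the JSON output of ``stats.list()``). The helper name ``derive_ratios`` is hypothetical. The mean latency assumes that every answer counted in ``answer.total`` also contributes to ``answer.sum_ms``.

.. code-block:: python

    def derive_ratios(metrics):
        """Compute ratios from raw counters (hypothetical helper, not part of kresd).

        ``metrics`` maps counter names such as "answer.total" to numbers.
        """
        total = metrics.get("answer.total", 0)
        if total == 0:
            # No answers yet; report zeros instead of dividing by zero.
            return {"cache_hit_ratio": 0.0, "mean_latency_ms": 0.0}
        return {
            # Share of answers served from the cache.
            "cache_hit_ratio": metrics.get("answer.cached", 0) / total,
            # Assumes answer.sum_ms covers every answer counted in answer.total.
            "mean_latency_ms": metrics.get("answer.sum_ms", 0) / total,
        }

With the counters from the session above (``answer.cached`` 486178 and ``answer.total`` 618631), the cache hit ratio comes out at roughly 0.79.
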
Module reference
----------------

.. function:: stats.get(key)

   :param string key: e.g. ``"answer.total"``
   :return: ``number``

   Return the nominal value of the given metric.

.. function:: stats.set('key val')

   Set the nominal value of the given metric.

   Example:

   .. code-block:: lua

       stats.set('answer.total 5')
       -- or syntactic sugar
       stats['answer.total'] = 5

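The calling conventions of ``stats.get()``, ``stats.set()`` and ``stats.list()`` can be made concrete with a toy, in-memory Python model. It is purely illustrative. The class ``StatsModel`` is hypothetical and is not how kresd stores metrics. Returning ``None`` for unknown keys and storing integer values are assumptions of this sketch.

.. code-block:: python

    class StatsModel:
        """Toy stand-in for the Lua ``stats`` table (hypothetical, illustration only)."""

        def __init__(self):
            self._values = {}

        def set(self, key_val):
            # Like stats.set('answer.total 5'): one string holding "key value".
            key, val = key_val.split(maxsplit=1)
            self._values[key] = int(val)

        def get(self, key):
            # Unknown metrics yield None in this sketch.
            return self._values.get(key)

        def list(self, prefix=""):
            # Like stats.list('answer'): keep only names starting with the prefix.
            return {k: v for k, v in self._values.items() if k.startswith(prefix)}

For instance, after ``s.set("answer.total 5")`` the call ``s.list("answer")`` returns ``{"answer.total": 5}``.
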
.. function:: stats.list([prefix])

   :param string prefix: optional metric prefix, e.g. ``"answer"`` shows only metrics beginning with "answer"

   Outputs collected metrics as a JSON dictionary.

.. function:: stats.upstreams()

   Outputs a list of recent upstreams and their RTT. It is sorted by time and stored in a ring buffer of
   a fixed size. This means it is not aggregated and can be read by multiple consumers, but also that
   you may lose entries if you don't read it quickly enough. The default ring size is 512 entries, and may be overridden at compile time by ``-DUPSTREAMS_COUNT=X``.

.. function:: stats.frequent()

   Outputs a list of the most frequent iterative queries as a JSON array. The queries are sampled probabilistically,
   and include subrequests. The list holds at most 5000 entries; make diffs if you want to track it over time.

.. function:: stats.clear_frequent()

   Clear the list of most frequent iterative queries.

These metrics can be either exported to :ref:`config-monitoring-graphite` or
exposed as :ref:`config-monitoring-prometheus`.

.. option:: monitoring:

   .. option:: enabled: manager-only|lazy|always

      :default: lazy

      Configures whether the statistics module is loaded into the resolver.

      :manager-only: Disables statistics collection in all ``kresd`` workers.
      :lazy: Statistics collection is enabled at the time of request.
      :always: Statistics collection is always on.

For example, to keep statistics collection always on:

.. code-block:: yaml

    monitoring:
      enabled: always

You can see all the built-in statistics in the :ref:`mod-stats-list` section.

.. include:: ../modules/graphite/README.rst
.. include:: ../modules/http/prometheus.rst

.. _config-monitoring-prometheus:

Prometheus metrics endpoint
---------------------------

The :ref:`manager-api` exposes a ``/metrics`` endpoint that serves aggregated metrics from the statistics collector in the Prometheus text format.
You can use it as soon as the HTTP API is configured.

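Any HTTP client can poll the endpoint. The following Python sketch uses only the standard library. The URL in ``METRICS_URL`` is a hypothetical placeholder for wherever your management HTTP API listens. The parser only handles the simple ``series value [timestamp]`` sample lines of the text format: it skips comments and does not support spaces or escapes inside label values.

.. code-block:: python

    import urllib.request

    # Hypothetical address: adjust to your management HTTP API configuration.
    METRICS_URL = "http://127.0.0.1:5000/metrics"


    def parse_prometheus_text(text):
        """Map each sample line "series value [timestamp]" to {series: value}."""
        samples = {}
        for line in text.splitlines():
            line = line.strip()
            # Skip blank lines and "# HELP" / "# TYPE" comments.
            if not line or line.startswith("#"):
                continue
            series, value = line.split()[:2]
            samples[series] = float(value)  # float() also accepts NaN and +Inf
        return samples


    def fetch_metrics(url=METRICS_URL):
        """Download the endpoint and parse it (needs a running manager)."""
        with urllib.request.urlopen(url, timeout=5) as resp:
            return parse_prometheus_text(resp.read().decode("utf-8"))

In a real deployment, a Prometheus server would normally scrape the endpoint itself. The sketch only shows what the exposed text looks like to a consumer.
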
It is also possible to use the :ref:`manager-client` to obtain and save metrics.

.. code-block:: bash

    $ kresctl metrics ./metrics/data.txt

.. _config-monitoring-graphite:

Graphite/InfluxDB/Metronome
---------------------------

The Graphite module sends statistics over the Graphite_ protocol to either Graphite_, Metronome_, InfluxDB_ or any compatible storage.
This allows powerful visualization over metrics collected by Knot Resolver.

.. tip:: The Graphite server is challenging to get up and running; InfluxDB_ combined with Grafana_ is much easier and provides a richer set of options and available front-ends. Metronome_ by PowerDNS alternatively provides a mini-graphite server for much simpler setups.

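In the Graphite plaintext protocol, each sample is one ``<metric path> <value> <timestamp>`` line. The Python sketch below is a rough illustration, not the module's actual implementation. It renders a metrics ``dict`` in that format and sends it over UDP (default) or TCP, mirroring the ``tcp`` option. The helper names and the ``kresd1`` prefix in the usage note are hypothetical.

.. code-block:: python

    import socket
    import time


    def graphite_lines(metrics, prefix="", timestamp=None):
        """Render {name: value} as Graphite plaintext lines."""
        ts = int(time.time() if timestamp is None else timestamp)
        out = []
        for name, value in sorted(metrics.items()):
            path = f"{prefix}.{name}" if prefix else name
            out.append(f"{path} {value} {ts}\n")
        return "".join(out)


    def send_graphite(payload, host="127.0.0.1", port=2003, tcp=False):
        """Send rendered lines over UDP (default) or TCP."""
        data = payload.encode("ascii")
        if tcp:
            with socket.create_connection((host, port), timeout=5) as sock:
                sock.sendall(data)
        else:
            # Resolve the address so both IPv4 and IPv6 hosts work.
            family, _, _, _, addr = socket.getaddrinfo(
                host, port, type=socket.SOCK_DGRAM)[0]
            with socket.socket(family, socket.SOCK_DGRAM) as sock:
                sock.sendto(data, addr)

For example, ``graphite_lines({"answer.total": 5}, prefix="kresd1", timestamp=1700000000)`` yields ``"kresd1.answer.total 5 1700000000\n"``.
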
Example configuration:

.. code-block:: yaml

    monitoring:
      graphite:
        host: 127.0.0.1  # graphite server address
        port: 2003       # optional graphite server port (2003 is default)
        interval: 5s     # optional publish interval (5s is default)

.. option:: monitoring/graphite: <graphite-config>|false

   :default: false

   The Graphite module is disabled by default.
   It is automatically enabled when configured.

   .. option:: host: <address or hostname>

      Graphite server IP address or hostname.

   .. option:: port: <port>

      :default: 2003

      Optional, Graphite server port.

   .. option:: prefix: <string>

      :default: ""

      Optional prefix for all ``kresd`` workers.
      The worker ID is automatically added for each process.

   .. option:: interval: <time ms|s|m|h|d>

      :default: 5s

      Optional publishing interval.

   .. option:: tcp: true|false

      :default: false

      Optional, set to ``true`` if you want TCP mode.

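The defaults listed above can be modelled in a few lines of Python. The sketch below fills in the documented defaults for ``monitoring/graphite`` and converts an integer ``interval`` string with one of the listed units to milliseconds. It only models the option semantics and is not the manager's real validation. Treating ``host`` as mandatory is an assumption, since no default is listed for it, and the helper names are hypothetical.

.. code-block:: python

    import re

    # Defaults taken from the option reference above.
    GRAPHITE_DEFAULTS = {"port": 2003, "prefix": "", "interval": "5s", "tcp": False}

    # Milliseconds per unit accepted by the interval option.
    UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


    def parse_interval_ms(text):
        """Convert e.g. "5s" or "250ms" to milliseconds (integers only)."""
        match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text.strip())
        if match is None:
            raise ValueError(f"invalid interval: {text!r}")
        return int(match.group(1)) * UNIT_MS[match.group(2)]


    def resolve_graphite(config):
        """Return None if Graphite is disabled, else the config with defaults."""
        if config is False or config is None:
            return None  # monitoring/graphite: false (the default)
        if "host" not in config:
            # Assumption: no default host is documented, so require one.
            raise ValueError("monitoring/graphite needs a host")
        merged = {**GRAPHITE_DEFAULTS, **config}
        merged["interval_ms"] = parse_interval_ms(merged["interval"])
        return merged
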
.. _Graphite: https://graphite.readthedocs.io/en/latest/feeding-carbon.html
.. _InfluxDB: https://influxdb.com/
.. _Metronome: https://github.com/ahuPowerDNS/metronome
.. _Grafana: http://grafana.org/
