From b3a0e1504c7e2483e813e554fec2fcd6b6efae19 Mon Sep 17 00:00:00 2001
From: Daniel Mitterdorfer <danielmitterdorfer@users.noreply.github.com>
Date: Thu, 17 Dec 2020 14:15:33 +0100
Subject: [PATCH] Add a configuration file reference (#1137)

With this commit we add reference docs for Rally's configuration file
`rally.ini`. We also move one configuration property from the `system`
to the `reporting` section as it is more appropriate there.

We intentionally placed this information on the existing configuration
page instead of creating a new one. We did this to provide continuity in
the future because we intend to remove the dedicated `configure`
subcommand and instead rely on users editing the configuration file
directly. When we remove this functionality, we can also remove obsolete
sections from this page and move it to the reference documentation.

Closes #991
---
 docs/configuration.rst   | 125 +++++++++++++++++++++++++++++++++++++++
 esrally/driver/driver.py |   2 +-
 2 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/docs/configuration.rst b/docs/configuration.rst
index cfb20ec1d..c8c3fc865 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -79,6 +79,131 @@ Rally will ask you a few more things in the advanced setup:
 * **Name for this benchmark environment** (only for metrics store type ``elasticsearch``): You can use the same metrics store for multiple environments (e.g. local, continuous integration etc.) so you can separate metrics from different environments by choosing a different name.
 * whether or not Rally should keep the Elasticsearch benchmark candidate installation including all data by default. This will use lots of disk space so you should wipe ``~/.rally/benchmarks/races`` regularly.
 
+Configuration File Reference
+----------------------------
+
+Rally stores its configuration in the file ``~/.rally/rally.ini``. It comprises the following sections.
+
+meta
+~~~~
+
+This section contains meta information about the configuration file.
+
+* ``config.version``: The version of the configuration file format. This property is managed by Rally and should not be changed.
+
+system
+~~~~~~
+
+This section contains global information for the current benchmark environment. This information should be identical on all machines where Rally is installed.
+
+* ``env.name`` (default: "local"): The name of this benchmark environment. It is used as meta-data in metrics documents if an Elasticsearch metrics store is configured. Only alphanumeric characters are allowed.
+* ``probing.url`` (default: "https://github.com"): This URL is used by Rally to check for a working Internet connection. It's useful to change this to an internal server if all data are hosted inside the corporate network and connections to the outside world are prohibited.
+* ``available.cores`` (default: number of logical CPU cores): Determines the number of available CPU cores. Rally aims to create one asyncio event loop per core and will distribute clients evenly across event loops.
+* ``async.debug`` (default: false): Enables debug mode on Rally's internal `asyncio event loop <https://docs.python.org/3/library/asyncio-eventloop.html#enabling-debug-mode>`_. This setting is mainly intended for troubleshooting.
+* ``passenv`` (default: "PATH"): A comma-separated list of environment variable names that should be passed to the Elasticsearch process.
+
+node
+~~~~
+
+This section contains machine-specific information.
+
+* ``root.dir`` (default: "~/.rally/benchmarks"): Rally uses this directory to store all benchmark-related data. It assumes that it has complete control over this directory and any of its subdirectories.
+* ``src.root.dir`` (default: "~/.rally/benchmarks/src"): The directory where the source code of Elasticsearch or any plugins is checked out. Only relevant for benchmarks from sources.
+
+source
+~~~~~~
+
+This section contains more details about the source tree.
+
+* ``remote.repo.url`` (default: "https://github.com/elastic/elasticsearch.git"): The URL from which to check out Elasticsearch.
+* ``elasticsearch.src.subdir`` (default: "elasticsearch"): The local path, relative to ``src.root.dir``, of the Elasticsearch source tree.
+* ``cache`` (default: true): Enables Rally's internal :ref:`source artifact <pipelines_from-sources>` cache (``elasticsearch*.tar.gz`` and optionally ``*.zip`` files for plugins). Artifacts are cached based on their git revision.
+* ``cache.days`` (default: 7): The number of days for which an artifact should be kept in the source artifact cache.
+
+benchmarks
+~~~~~~~~~~
+
+This section contains details about the benchmark data directory.
+
+* ``local.dataset.cache`` (default: "~/.rally/benchmarks/data"): The directory in which benchmark data sets are stored. Depending on the benchmarks that are executed, this directory may contain hundreds of GB of data.
+
+reporting
+~~~~~~~~~
+
+This section defines how metrics are stored.
+
+* ``datastore.type`` (default: "in-memory"): If set to "in-memory" all metrics will be kept in memory while running the benchmark. If set to "elasticsearch" all metrics will instead be written to a persistent metrics store and the data are available for further analysis.
+* ``sample.queue.size`` (default: 2^20): The number of metrics samples that can be stored in Rally's in-memory queue.
+* ``metrics.request.downsample.factor`` (default: 1): Determines how many service time and latency samples should be kept in the metrics store. By default all values will be kept. To keep only e.g. every 100th sample, specify 100. This is useful to avoid overwhelming the metrics store in benchmarks with many clients (tens of thousands).
+* ``output.processingtime`` (default: false): If set to "true", Rally will show a metric called "processing time" in the command line report. Contrary to "service time", which is measured as close as possible to the wire, "processing time" also includes Rally's client-side processing overhead. Large differences between the service time and the processing time indicate a high overhead in the client and can thus point to a potential client-side bottleneck which requires investigation.
+
+The following settings are applicable only if ``datastore.type`` is set to "elasticsearch":
+
+* ``datastore.host``: The host name of the metrics store, e.g. "10.17.20.33".
+* ``datastore.port``: The port of the metrics store, e.g. "9200".
+* ``datastore.secure``: If set to ``false``, Rally assumes an HTTP connection. If set to ``true``, it assumes an HTTPS connection.
+* ``datastore.ssl.verification_mode`` (default: "full"): By default the metrics store's SSL certificate is checked ("full"). To disable certificate verification set this value to "none".
+* ``datastore.ssl.certificate_authorities`` (default: empty): Determines the path on the local file system to the certificate authority's signing certificate.
+* ``datastore.user``: Sets the name of the Elasticsearch user for the metrics store.
+* ``datastore.password``: Sets the password of the Elasticsearch user for the metrics store.
+* ``datastore.probe.cluster_version`` (default: true): Enables automatic detection of the metrics store's version.
+
+**Examples**
+
+Define an unprotected metrics store in the local network::
+
+    [reporting]
+    datastore.type = elasticsearch
+    datastore.host = 192.168.10.17
+    datastore.port = 9200
+    datastore.secure = false
+    datastore.user =
+    datastore.password =
+
+Define a secure connection to a metrics store in the local network with a self-signed certificate::
+
+    [reporting]
+    datastore.type = elasticsearch
+    datastore.host = 192.168.10.22
+    datastore.port = 9200
+    datastore.secure = true
+    datastore.ssl.verification_mode = none
+    datastore.user = rally
+    datastore.password = the-password-to-your-cluster
+
+Define a secure connection to an Elastic Cloud cluster::
+
+    [reporting]
+    datastore.type = elasticsearch
+    datastore.host = 123456789abcdef123456789abcdef1.europe-west4.gcp.elastic-cloud.com
+    datastore.port = 9243
+    datastore.secure = true
+    datastore.user = rally
+    datastore.password = the-password-to-your-cluster
+
+
+tracks
+~~~~~~
+
+This section defines how :doc:`tracks </track>` are retrieved. All keys are read by Rally using the convention ``<<track-repository-name>>.url``, e.g. ``custom-track-repo.url``, which can be selected on the command line via ``--track-repository="custom-track-repo"``. By default, Rally chooses the track repository specified via ``default.url`` which points to https://github.com/elastic/rally-tracks.
+
+teams
+~~~~~
+
+This section defines how :doc:`teams </car>` are retrieved. All keys are read by Rally using the convention ``<<team-repository-name>>.url``, e.g. ``custom-team-repo.url``, which can be selected on the command line via ``--team-repository="custom-team-repo"``. By default, Rally chooses the team repository specified via ``default.url`` which points to https://github.com/elastic/rally-teams.
+
+defaults
+~~~~~~~~
+
+This section defines default values for certain command line parameters of Rally.
+
+* ``preserve_benchmark_candidate`` (default: false): Determines whether Elasticsearch installations will be preserved or wiped by default after a benchmark. For preserving an installation for a single benchmark, use the command line flag ``--preserve-install``.
+
+distributions
+~~~~~~~~~~~~~
+
+* ``release.cache`` (default: true): Determines whether released Elasticsearch versions should be cached locally.
+
 Proxy Configuration
 -------------------
 
diff --git a/esrally/driver/driver.py b/esrally/driver/driver.py
index 69e1f0bb9..c87e3a6e2 100644
--- a/esrally/driver/driver.py
+++ b/esrally/driver/driver.py
@@ -819,7 +819,7 @@ def receiveMsg_StartWorker(self, msg, sender):
         self.worker_id = msg.worker_id
         self.config = load_local_config(msg.config)
         self.on_error = self.config.opts("driver", "on.error")
-        self.sample_queue_size = int(self.config.opts("system", "sample.queue.size", mandatory=False, default_value=1 << 20))
+        self.sample_queue_size = int(self.config.opts("reporting", "sample.queue.size", mandatory=False, default_value=1 << 20))
         self.track = msg.track
         track.set_absolute_data_path(self.config, self.track)
         self.client_allocations = msg.client_allocations