
Commit

Introduce challenges and cars as replacement for track setup
Closes #101
danielmitterdorfer committed May 18, 2016
1 parent 232ddcd commit 8cf974e
Showing 28 changed files with 412 additions and 445 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
### 0.3.0

#### Breaking changes

We have [separated the previously known "track setup" into two parts](https://github.com/elastic/rally/issues/101):

* Challenges: which describe what happens during a benchmark (whether to index or search and with which parameters)
* Cars: which describe the benchmark candidate settings (e.g. heap size, logging configuration, etc.)

This influences the command line interface in a couple of ways:

* To list all known cars, we have added a new command `esrally list cars`. To select a challenge, use `--challenge` instead of `--track-setup`, and specify a car with `--car`.
* Tournaments created by older versions of Rally are incompatible
* Rally must now be invoked with only one challenge and only one car (previously it was possible to specify multiple track setups)

### 0.2.1

* Add a [tournament mode](https://github.com/elastic/rally/issues/57). More information in the [user docs](https://esrally.readthedocs.io/en/latest/tournament.html)
49 changes: 20 additions & 29 deletions docs/adding_benchmarks.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,7 @@ Adding new benchmarks to Rally
Overview
--------

- Although it is possible to add new benchmarks to Rally, it is needed to :doc:`set up Rally in development mode first </developing>`. We will
- eventually `split benchmark specifications from Rally <https://github.com/elastic/rally/issues/26>`_ but the API is currently not stable
- enough to support this reliably.
+ Although it is possible to add new benchmarks to Rally, you first need to :doc:`set up Rally in development mode </developing>`. We will eventually `split benchmark specifications from Rally <https://github.com/elastic/rally/issues/26>`_, but the API is currently not stable enough to support this reliably.

First of all, we need to clarify what a benchmark is. Rally has a few assumptions built in:

Expand All @@ -19,25 +17,19 @@ A benchmark is called a "track" in Rally. The most important attributes of a tra
* One or more indices, each with one or more types
* The queries to issue
* Source URL of the benchmark data
- * A list of track setups
+ * A list of steps to run, which we'll call a "challenge", for example indexing data with a specific number of documents per bulk request or running searches for a defined number of iterations.

- A "track setup" defines custom settings of the benchmark candidate (Elasticsearch) for this track, like how much heap memory to use, the
- number of nodes to start and so on. Rally comes with a set of default track setups which you can use for your own benchmarks (but you don't
- have to).
+ Separately from a track, we also have "cars", which define the settings of the benchmark candidate (Elasticsearch), like how much heap memory to use, the number of nodes to start and so on. Rally comes with a set of default tracks and cars which you can use for your own benchmarks (but you don't have to).

- Example benchmark
- -----------------
+ Example track
+ -------------

- Let's create an example benchmark step by step. First of all, we need some data. There are a lot of public data sets available which are
- interesting for new benchmarks and we also have a
+ Let's create an example track step by step. First of all, we need some data. There are a lot of public data sets available which are interesting for new benchmarks and we also have a
`backlog of benchmarks to add <https://github.com/elastic/rally/issues?q=is%3Aissue+is%3Aopen+label%3A%3ABenchmark>`_.

- `Geonames <http://www.geonames.org/>`_ provides geo data under a `creative commons license <http://creativecommons.org/licenses/by/3.0/>`_. We
- will download `allCountries.zip <http://download.geonames.org/export/dump/allCountries.zip>`_ (around 300MB), extract it and
- inspect ``allCountries.txt``.
+ `Geonames <http://www.geonames.org/>`_ provides geo data under a `creative commons license <http://creativecommons.org/licenses/by/3.0/>`_. We will download `allCountries.zip <http://download.geonames.org/export/dump/allCountries.zip>`_ (around 300MB), extract it and inspect ``allCountries.txt``.

- You will note that the file is tab-delimited but we need JSON to bulk-index data with Elasticsearch. So we can use a small script to do the
- conversion for us::
+ You will note that the file is tab-delimited but we need JSON to bulk-index data with Elasticsearch. So we can use a small script to do the conversion for us::

import json
import csv
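The script is cut off in this view. A complete version might look roughly like the following sketch; the field names are assumptions taken from the geonames readme, and ``convert`` is a hypothetical helper, not the original script:

```python
import csv
import json

# Hypothetical names for the leading columns of allCountries.txt,
# based on the geonames readme, not on the original script.
FIELDS = ["geonameid", "name", "asciiname", "alternatenames",
          "latitude", "longitude", "feature_class", "feature_code",
          "country_code"]

def convert(input_path, output_path):
    with open(input_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        # The corpus may contain stray quote characters, so disable quoting.
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Keep only as many columns as we have names for.
            dst.write(json.dumps(dict(zip(FIELDS, row))) + "\n")
```

Writing one JSON object per line produces exactly the shape needed for bulk indexing later on.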
@@ -88,8 +80,7 @@ Ensure to create a file called "README.txt" which can contain more information a

Upload all three files to a place where they are publicly available. We choose ``http://benchmarks.elastic.co/corpora/geonames`` for this example. For initial local testing you can also place all files in the data directory, which is located below the root directory you specified when initially configuring Rally. Let's say you specified ``/Users/daniel/benchmarks`` as root directory. Then you have to place the data for a track with the name "geonames" in ``/Users/daniel/benchmarks/data/geonames`` so Rally can pick it up. Additionally, you have to specify the ``--offline`` option when running Rally so it does not try to download any benchmark data.

- Finally, add a new Python source file in Rally's project directory. By convention, the file should be called "$BENCHMARK_NAME_track.py", so
- for our example the file is called "geonames_track.py". It is placed in "esrally/track/". ::
+ Finally, add a new Python source file in Rally's project directory. By convention, the file should be called "$BENCHMARK_NAME_track.py", so for our example the file is called "geonames_track.py". It is placed in "esrally/track/". ::

from esrally.track import track

@@ -117,7 +108,7 @@ for our example the file is called "geonames_track.py". It is placed in "esrally
mapping_file_name="mappings.json",
# Queries to use in the search benchmark
queries=[SampleQuery()],
- track_setups=track.track_setups
+ challenges=track.challenges


In case you want to add multiple indices, this is possible too. The same track then needs to be specified as follows: ::
@@ -153,11 +144,11 @@ In case you want to add multiple indices this is possible too. The same track ne
],
# Queries to use in the search benchmark
queries=[SampleQuery()],
- track_setups=track.track_setups)
+ challenges=track.challenges)

A few things to note:

- * You can either use the standard track setups provided with Rally or add your own. Note that Rally assumes that the track setup that should be run by default is called "defaults". It is possible to not use this name but it is more convenient for users.
+ * You can either use the standard challenges provided with Rally or add your own. Note that Rally assumes that the challenge that should be run by default is called "append-no-conflicts". It is possible not to use this name, but it is more convenient for users; otherwise, they have to provide the command line option ``--challenge``.
* You can add as many queries as you want. We use the `official Python Elasticsearch client <http://elasticsearch-py.readthedocs.org/>`_ to issue queries.
* The numbers are needed to verify integrity and provide progress reports.
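As a rough illustration of the query bullet point above: a query object essentially wraps one call to the client. The ``run(es)`` shape and the stub client below are assumptions made for this sketch, not Rally's documented interface:

```python
# Sketch only: the run(es) signature is an assumption, not Rally's actual API.
class SampleQuery:
    def __str__(self):
        return "sample"

    def run(self, es):
        # "es" is expected to behave like the official Python Elasticsearch client.
        return es.search(index="geonames", body={"query": {"match_all": {}}})

class StubClient:
    """Hypothetical stand-in for elasticsearch.Elasticsearch."""
    def search(self, index, body):
        return {"index": index, "hits": {"total": 0, "hits": []}}

result = SampleQuery().run(StubClient())
```

In a real track the client instance is provided by Rally, so the query object only has to describe what to search.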

Expand All @@ -173,21 +164,21 @@ When you invoke ``esrally list tracks``, the new track should now appear::
/____/
Available tracks:
- Name       Description     Track setups
- ---------- --------------- -------------------------------------------------------------------------------
- geonames   Demo benchmark  defaults,4gheap,fastsettings,fastupdates,two_nodes_defaults,defaults_verbose_iw
+ Name       Description                                               Challenges
+ ---------- --------------------------------------------------------- -----------------------------------------------------------------------
+ geonames   Standard benchmark in Rally (8.6M POIs from Geonames)     append-no-conflicts,append-fast-no-conflicts,append-fast-with-conflicts

- Congratulations, you have created your first track! You can test it with ``esrally --track=geonames`` (or whatever the name of your track is) and run specific track setups with ``esrally --track=geonames --track-setup=fastupdates``.
+ Congratulations, you have created your first track! You can test it with ``esrally --track=geonames`` (or whatever the name of your track is) and run specific challenges with ``esrally --track=geonames --challenge=append-fast-with-conflicts``.

If you want to share it with the community, please read on.

- How to contribute a benchmark
- -----------------------------
+ How to contribute a track
+ -------------------------

First of all, please read the `contributors guide <https://github.com/elastic/rally/blob/master/CONTRIBUTING.md>`_.

- If you want to contribute your benchmark, follow these steps:
+ If you want to contribute your track, follow these steps:

1. Create a track file as described above
- 2. Upload the associated data so they can be publicly downloaded via HTTP. The data have to include three files: the actual benchmark data (either as .bz2 (recommended) or as .zip), the mapping file, and a readme, called "README.txt" which has to contain also the licensing terms of the benchmark (respecting the licensing terms of the source data). Note that pull requests for benchmarks without a license cannot be accepted.
+ 2. Upload the associated data so they can be publicly downloaded via HTTP. The data have to include three files: the actual benchmark data (either as .bz2 (recommended) or as .zip), the mapping file, and a readme called "README.txt", which also has to contain the licensing terms of the track (respecting the licensing terms of the source data). Note that pull requests for tracks without a license cannot be accepted.
3. Create a pull request for the `Rally Github repo <https://github.com/elastic/rally>`_.
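Step 2 recommends ``.bz2`` for the benchmark data. With Python's standard library, the compression could be done like this (file names are placeholders):

```python
import bz2
import shutil

def compress_corpus(src_path, dst_path):
    # Stream-copy so that multi-gigabyte corpora never have to fit in memory.
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```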
7 changes: 3 additions & 4 deletions docs/developing.rst
Expand Up @@ -50,13 +50,12 @@ To get a rough understanding of Rally, it makes sense to get to know its key com
* `Race Control`: is responsible for proper execution of the race. It sets up all components and acts as a high-level controller.
* `Mechanic`: can build and prepare a benchmark candidate for the race. It checks out the source, builds Elasticsearch, provisions and starts the cluster.
* `Track`: is a concrete benchmarking scenario, e.g. the logging benchmark. It defines the data set to use.
- * `TrackSetup`: is a concrete system configuration for a benchmark, e.g. Elasticsearch default settings. Note: There are some lose ends in the code due to the porting efforts. The implementation is very likely to change significantly.
+ * `Challenge`: is the specification of which benchmarks should be run and their configuration (e.g. index, then run a search benchmark with 1000 iterations)
+ * `Car`: is a concrete system configuration for a benchmark, e.g. an Elasticsearch single-node cluster with default settings.
* `Driver`: drives the race, i.e. it is executing the benchmark according to the track specification.
* `Reporter`: A reporter tells us how the race went (currently only after the fact).

- When implementing a new benchmark, create a new file in `track` and create a new `Track` and one or more `TrackSetup` instances.
- See `track/geonames_track.py` for an example. The new track will be picked up automatically. You can run Rally with your track
- by issuing `esrally --track=your-track-name`. All available tracks can be listed with `esrally list tracks`.
+ When implementing a new benchmark, create a new file in ``track`` and create a new ``Track`` and one or more ``Challenge`` instances. See ``track/geonames_track.py`` for an example and the :doc:`tutorial on adding benchmarks </adding_benchmarks>`. The new track will be picked up automatically. You can run Rally with your track by issuing ``esrally --track=your-track-name``. All available tracks can be listed with ``esrally list tracks``.

How to contribute code
----------------------
9 changes: 5 additions & 4 deletions docs/metrics.rst
Expand Up @@ -9,7 +9,8 @@ Here is a typical metrics record::
{
"environment": "nightly",
"track": "geonames",
- "track-setup": "defaults",
+ "challenge": "append-no-conflicts",
+ "car": "defaults",
"sample-type": "normal",
"trial-timestamp": "20160421T042749Z",
"@timestamp": 1461213093093,
@@ -39,10 +40,10 @@ environment

The environment describes the origin of a metric record. You define this value in the initial configuration of Rally. The intention is to clearly separate different benchmarking environments but still allow storing them in the same index.

- track, track-setup
- ~~~~~~~~~~~~~~~~~~
+ track, challenge, car
+ ~~~~~~~~~~~~~~~~~~~~~

- This is the track and track setup for which the metrics record has been produced.
+ This is the track, challenge and car for which the metrics record has been produced.
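Since each record now carries all three identifiers, consumers of a metrics store can key on the full triple. A small sketch with made-up records shaped like the example document above (the ``value`` field is illustrative):

```python
from collections import defaultdict

# Made-up records; only the identifying fields mirror the real schema.
records = [
    {"track": "geonames", "challenge": "append-no-conflicts", "car": "defaults", "value": 1},
    {"track": "geonames", "challenge": "append-no-conflicts", "car": "4gheap", "value": 2},
    {"track": "geonames", "challenge": "append-no-conflicts", "car": "defaults", "value": 3},
]

by_config = defaultdict(list)
for record in records:
    # The (track, challenge, car) triple identifies one benchmark configuration.
    by_config[(record["track"], record["challenge"], record["car"])].append(record["value"])
```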

sample-type
~~~~~~~~~~~
30 changes: 13 additions & 17 deletions docs/tournament.rst
Expand Up @@ -13,22 +13,16 @@ Suppose, we want to analyze the impact of a performance improvement. First, we n
/____/
Recent races:

- Race Timestamp    Track    Track setups   User Tag
- ----------------- -------- -------------- ------------------------------
- 20160502T191011Z  geonames defaults       intention:reduce_alloc_1234
- 20160502T190127Z  geonames defaults       intention:baseline_github_1234
- 20160502T185632Z  tiny     defaults
- 20160502T185619Z  tiny     defaults
- 20160502T185604Z  tiny     defaults
- 20160502T185551Z  tiny     defaults
- 20160502T185538Z  tiny     defaults
- 20160502T185525Z  tiny     defaults
- 20160502T185511Z  tiny     defaults
- 20160502T185459Z  tiny     defaults
+ Race Timestamp   Track   Challenge           Car      User Tag
+ ---------------- ------- ------------------- -------- ------------------------------
+ 20160518T122341Z pmc     append-no-conflicts defaults intention:reduce_alloc_1234
+ 20160518T112057Z pmc     append-no-conflicts defaults intention:baseline_github_1234
+ 20160518T101957Z pmc     append-no-conflicts defaults


We can see that the user tag helps us to recognize races. We want to compare the two most recent races and have to provide the two race timestamps in the next step::

- dm@io:~ $ esrally compare --baseline=20160502T190127Z --contender=20160502T191011Z
+ dm@io:~ $ esrally compare --baseline=20160518T112057Z --contender=20160518T122341Z

____ ____
/ __ \____ _/ / /_ __
Expand All @@ -38,12 +32,14 @@ We can see that the user tag helps us to recognize races. We want to compare the
/____/

Comparing baseline
- Race timestamp: 2016-05-02 19:01:27
- Track setup: defaults
+ Race timestamp: 2016-05-18 11:20:57
+ Challenge: append-no-conflicts
+ Car: defaults

with contender
- Race timestamp: 2016-05-02 19:10:11
- Track setup: defaults
+ Race timestamp: 2016-05-18 12:23:41
+ Challenge: append-no-conflicts
+ Car: defaults

------------------------------------------------------
_______ __ _____
Expand Down
2 changes: 1 addition & 1 deletion esrally/config.py
Expand Up @@ -22,7 +22,7 @@ class Scope(Enum):
# A sole benchmark
benchmark = 3
# Single benchmark track setup (e.g. default, multinode, ...)
- trackSetup = 4
+ challenge = 4
# property for every invocation, i.e. for backtesting
invocation = 5
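The scope values are ordered from broad to narrow, which suggests a fallback lookup from the most specific applicable scope to broader ones. A sketch of that idea; only the three members shown above are real, and ``resolve`` is an illustrative helper, not Rally's implementation:

```python
from enum import Enum

class Scope(Enum):
    # Only the members visible in the excerpt above.
    benchmark = 3
    challenge = 4
    invocation = 5

def resolve(settings, key, scope):
    # Try the most specific applicable scope first, then fall back.
    for member in sorted(Scope, key=lambda m: m.value, reverse=True):
        if member.value <= scope.value and (member, key) in settings:
            return settings[(member, key)]
    return None
```

For example, with ``{(Scope.benchmark, "heap"): "1g", (Scope.challenge, "heap"): "4g"}``, a lookup at ``Scope.invocation`` falls back to the challenge-level value.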

Expand Down

