
Commit

Introduce challenges and cars as replacement for track setup
Closes #101
danielmitterdorfer committed May 18, 2016
1 parent 232ddcd commit 8cf974e
Showing 28 changed files with 412 additions and 445 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
### 0.3.0

#### Breaking changes

We have [separated the previously known "track setup" into two parts](https://github.com/elastic/rally/issues/101):

* Challenges: which describe what happens during a benchmark (whether to index or search and with which parameters)
* Cars: which describe the benchmark candidate settings (e.g. heap size, logging configuration, etc.)

This influences the command line interface in a couple of ways:

* To list all known cars, we have added a new command `esrally list cars`. To select a challenge, use `--challenge` instead of `--track-setup`, and specify a car with `--car`.
* Tournaments created by older versions of Rally are incompatible
* Rally must now be invoked with only one challenge and only one car (previously it was possible to specify multiple track setups)

### 0.2.1

* Add a [tournament mode](https://github.com/elastic/rally/issues/57). More information in the [user docs](https://esrally.readthedocs.io/en/latest/tournament.html)
49 changes: 20 additions & 29 deletions docs/adding_benchmarks.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,7 @@ Adding new benchmarks to Rally
Overview
--------

- Although it is possible to add new benchmarks to Rally, it is needed to :doc:`set up Rally in development mode first </developing>`. We will
- eventually `split benchmark specifications from Rally <https://github.com/elastic/rally/issues/26>`_ but the API is currently not stable
- enough to support this reliably.
+ Although it is possible to add new benchmarks to Rally, you first need to :doc:`set up Rally in development mode </developing>`. We will eventually `split benchmark specifications from Rally <https://github.com/elastic/rally/issues/26>`_, but the API is currently not stable enough to support this reliably.

First of all, we need to clarify what a benchmark is. Rally has a few assumptions built in:

Expand All @@ -19,25 +17,19 @@ A benchmark is called a "track" in Rally. The most important attributes of a tra
* One or more indices, each with one or more types
* The queries to issue
* Source URL of the benchmark data
- * A list of track setups
+ * A list of steps to run, which we'll call a "challenge", for example indexing data with a specific number of documents per bulk request or running searches for a defined number of iterations.

- A "track setup" defines custom settings of the benchmark candidate (Elasticsearch) for this track, like how much heap memory to use, the
- number of nodes to start and so on. Rally comes with a set of default track setups which you can use for your own benchmarks (but you don't
- have to).
+ Separately from a track, we also have "cars", which define the settings of the benchmark candidate (Elasticsearch), like how much heap memory to use, the number of nodes to start and so on. Rally comes with a set of default tracks and cars which you can use for your own benchmarks (but you don't have to).

- Example benchmark
- -----------------
+ Example track
+ -------------

- Let's create an example benchmark step by step. First of all, we need some data. There are a lot of public data sets available which are
- interesting for new benchmarks and we also have a
+ Let's create an example track step by step. First of all, we need some data. There are a lot of public data sets available which are interesting for new benchmarks and we also have a
`backlog of benchmarks to add <https://github.com/elastic/rally/issues?q=is%3Aissue+is%3Aopen+label%3A%3ABenchmark>`_.

- `Geonames <http://www.geonames.org/>`_ provides geo data under a `creative commons license <http://creativecommons.org/licenses/by/3.0/>`_. We
- will download `allCountries.zip <http://download.geonames.org/export/dump/allCountries.zip>`_ (around 300MB), extract it and
- inspect ``allCountries.txt``.
+ `Geonames <http://www.geonames.org/>`_ provides geo data under a `creative commons license <http://creativecommons.org/licenses/by/3.0/>`_. We will download `allCountries.zip <http://download.geonames.org/export/dump/allCountries.zip>`_ (around 300MB), extract it and inspect ``allCountries.txt``.

- You will note that the file is tab-delimited but we need JSON to bulk-index data with Elasticsearch. So we can use a small script to do the
- conversion for us::
+ You will note that the file is tab-delimited but we need JSON to bulk-index data with Elasticsearch. So we can use a small script to do the conversion for us::

import json
import csv
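The script is cut off in this view. A complete version might look roughly like the following sketch; the field names are assumptions taken from the geonames readme, and ``convert`` is a hypothetical helper, not the original script:

```python
import csv
import json

# Hypothetical names for the leading columns of allCountries.txt,
# based on the geonames readme, not on the original script.
FIELDS = ["geonameid", "name", "asciiname", "alternatenames",
          "latitude", "longitude", "feature_class", "feature_code",
          "country_code"]

def convert(input_path, output_path):
    with open(input_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        # The corpus may contain stray quote characters, so disable quoting.
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Keep only as many columns as we have names for.
            dst.write(json.dumps(dict(zip(FIELDS, row))) + "\n")
```

Writing one JSON object per line produces exactly the shape needed for bulk indexing later on.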
@@ -88,8 +80,7 @@ Ensure to create a file called "README.txt" which can contain more information a

Upload all three files to a place where they are publicly available. We choose ``http://benchmarks.elastic.co/corpora/geonames`` for this example. For initial local testing you can also place all files in the data directory, which is located below the root directory you specified when initially configuring Rally. Let's say you specified ``/Users/daniel/benchmarks`` as root directory. Then you have to place the data for a track with the name "geonames" in ``/Users/daniel/benchmarks/data/geonames`` so Rally can pick it up. Additionally, you have to specify the ``--offline`` option when running Rally so it does not try to download any benchmark data.

- Finally, add a new Python source file in Rally's project directory. By convention, the file should be called "$BENCHMARK_NAME_track.py", so
- for our example the file is called "geonames_track.py". It is placed in "esrally/track/". ::
+ Finally, add a new Python source file in Rally's project directory. By convention, the file should be called "$BENCHMARK_NAME_track.py", so for our example the file is called "geonames_track.py". It is placed in "esrally/track/". ::

from esrally.track import track

@@ -117,7 +108,7 @@ for our example the file is called "geonames_track.py". It is placed in "esrally
mapping_file_name="mappings.json",
# Queries to use in the search benchmark
queries=[SampleQuery()],
- track_setups=track.track_setups
+ challenges=track.challenges


In case you want to add multiple indices, this is possible too. The same track then needs to be specified as follows: ::
@@ -153,11 +144,11 @@ In case you want to add multiple indices this is possible too. The same track ne
],
# Queries to use in the search benchmark
queries=[SampleQuery()],
- track_setups=track.track_setups)
+ challenges=track.challenges)

A few things to note:

- * You can either use the standard track setups provided with Rally or add your own. Note that Rally assumes that the track setup that should be run by default is called "defaults". It is possible to not use this name but it is more convenient for users.
+ * You can either use the standard challenges provided with Rally or add your own. Note that Rally assumes that the challenge that should be run by default is called "append-no-conflicts". It is possible not to use this name, but it is more convenient for users; otherwise, they have to provide the command line option ``--challenge``.
* You can add as many queries as you want. We use the `official Python Elasticsearch client <http://elasticsearch-py.readthedocs.org/>`_ to issue queries.
* The numbers are needed to verify integrity and provide progress reports.
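As a rough illustration of the query bullet point above: a query object essentially wraps one call to the client. The ``run(es)`` shape and the stub client below are assumptions made for this sketch, not Rally's documented interface:

```python
# Sketch only: the run(es) signature is an assumption, not Rally's actual API.
class SampleQuery:
    def __str__(self):
        return "sample"

    def run(self, es):
        # "es" is expected to behave like the official Python Elasticsearch client.
        return es.search(index="geonames", body={"query": {"match_all": {}}})

class StubClient:
    """Hypothetical stand-in for elasticsearch.Elasticsearch."""
    def search(self, index, body):
        return {"index": index, "hits": {"total": 0, "hits": []}}

result = SampleQuery().run(StubClient())
```

In a real track the client instance is provided by Rally, so the query object only has to describe what to search.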

Expand All @@ -173,21 +164,21 @@ When you invoke ``esrally list tracks``, the new track should now appear::
/____/
Available tracks:
- Name       Description     Track setups
- ---------- --------------- -------------------------------------------------------------------------------
- geonames   Demo benchmark  defaults,4gheap,fastsettings,fastupdates,two_nodes_defaults,defaults_verbose_iw
+ Name       Description                                               Challenges
+ ---------- --------------------------------------------------------- -----------------------------------------------------------------------
+ geonames   Standard benchmark in Rally (8.6M POIs from Geonames)     append-no-conflicts,append-fast-no-conflicts,append-fast-with-conflicts

- Congratulations, you have created your first track! You can test it with ``esrally --track=geonames`` (or whatever the name of your track is) and run specific track setups with ``esrally --track=geonames --track-setup=fastupdates``.
+ Congratulations, you have created your first track! You can test it with ``esrally --track=geonames`` (or whatever the name of your track is) and run specific challenges with ``esrally --track=geonames --challenge=append-fast-with-conflicts``.

If you want to share it with the community, please read on.

- How to contribute a benchmark
- -----------------------------
+ How to contribute a track
+ -------------------------

First of all, please read the `contributors guide <https://github.com/elastic/rally/blob/master/CONTRIBUTING.md>`_.

- If you want to contribute your benchmark, follow these steps:
+ If you want to contribute your track, follow these steps:

1. Create a track file as described above
- 2. Upload the associated data so they can be publicly downloaded via HTTP. The data have to include three files: the actual benchmark data (either as .bz2 (recommended) or as .zip), the mapping file, and a readme, called "README.txt" which has to contain also the licensing terms of the benchmark (respecting the licensing terms of the source data). Note that pull requests for benchmarks without a license cannot be accepted.
+ 2. Upload the associated data so they can be publicly downloaded via HTTP. The data have to include three files: the actual benchmark data (either as .bz2 (recommended) or as .zip), the mapping file, and a readme called "README.txt", which also has to contain the licensing terms of the track (respecting the licensing terms of the source data). Note that pull requests for tracks without a license cannot be accepted.
3. Create a pull request for the `Rally Github repo <https://github.com/elastic/rally>`_.
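Step 2 recommends ``.bz2`` for the benchmark data. With Python's standard library, the compression could be done like this (file names are placeholders):

```python
import bz2
import shutil

def compress_corpus(src_path, dst_path):
    # Stream-copy so that multi-gigabyte corpora never have to fit in memory.
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```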
7 changes: 3 additions & 4 deletions docs/developing.rst
Expand Up @@ -50,13 +50,12 @@ To get a rough understanding of Rally, it makes sense to get to know its key com
* `Race Control`: is responsible for proper execution of the race. It sets up all components and acts as a high-level controller.
* `Mechanic`: can build and prepare a benchmark candidate for the race. It checks out the source, builds Elasticsearch, provisions and starts the cluster.
* `Track`: is a concrete benchmarking scenario, e.g. the logging benchmark. It defines the data set to use.
- * `TrackSetup`: is a concrete system configuration for a benchmark, e.g. Elasticsearch default settings. Note: There are some lose ends in the code due to the porting efforts. The implementation is very likely to change significantly.
+ * `Challenge`: is the specification of which benchmarks should be run and their configuration (e.g. index, then run a search benchmark with 1000 iterations)
+ * `Car`: is a concrete system configuration for a benchmark, e.g. an Elasticsearch single-node cluster with default settings.
* `Driver`: drives the race, i.e. it is executing the benchmark according to the track specification.
* `Reporter`: A reporter tells us how the race went (currently only after the fact).

- When implementing a new benchmark, create a new file in `track` and create a new `Track` and one or more `TrackSetup` instances.
- See `track/geonames_track.py` for an example. The new track will be picked up automatically. You can run Rally with your track
- by issuing `esrally --track=your-track-name`. All available tracks can be listed with `esrally list tracks`.
+ When implementing a new benchmark, create a new file in ``track`` and create a new ``Track`` and one or more ``Challenge`` instances. See ``track/geonames_track.py`` for an example and the :doc:`tutorial on adding benchmarks </adding_benchmarks>`. The new track will be picked up automatically. You can run Rally with your track by issuing ``esrally --track=your-track-name``. All available tracks can be listed with ``esrally list tracks``.

How to contribute code
----------------------
9 changes: 5 additions & 4 deletions docs/metrics.rst
Expand Up @@ -9,7 +9,8 @@ Here is a typical metrics record::
{
"environment": "nightly",
"track": "geonames",
- "track-setup": "defaults",
+ "challenge": "append-no-conflicts",
+ "car": "defaults",
"sample-type": "normal",
"trial-timestamp": "20160421T042749Z",
"@timestamp": 1461213093093,
@@ -39,10 +40,10 @@ environment

The environment describes the origin of a metric record. You define this value in the initial configuration of Rally. The intention is to clearly separate different benchmarking environments but still allow storing them in the same index.

- track, track-setup
- ~~~~~~~~~~~~~~~~~~
+ track, challenge, car
+ ~~~~~~~~~~~~~~~~~~~~~

- This is the track and track setup for which the metrics record has been produced.
+ This is the track, challenge and car for which the metrics record has been produced.
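Since each record now carries all three identifiers, consumers of a metrics store can key on the full triple. A small sketch with made-up records shaped like the example document above (the ``value`` field is illustrative):

```python
from collections import defaultdict

# Made-up records; only the identifying fields mirror the real schema.
records = [
    {"track": "geonames", "challenge": "append-no-conflicts", "car": "defaults", "value": 1},
    {"track": "geonames", "challenge": "append-no-conflicts", "car": "4gheap", "value": 2},
    {"track": "geonames", "challenge": "append-no-conflicts", "car": "defaults", "value": 3},
]

by_config = defaultdict(list)
for record in records:
    # The (track, challenge, car) triple identifies one benchmark configuration.
    by_config[(record["track"], record["challenge"], record["car"])].append(record["value"])
```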

sample-type
~~~~~~~~~~~
30 changes: 13 additions & 17 deletions docs/tournament.rst
Expand Up @@ -13,22 +13,16 @@ Suppose, we want to analyze the impact of a performance improvement. First, we n
/____/
Recent races:

- Race Timestamp    Track    Track setups   User Tag
- ----------------- -------- -------------- ------------------------------
- 20160502T191011Z  geonames defaults       intention:reduce_alloc_1234
- 20160502T190127Z  geonames defaults       intention:baseline_github_1234
- 20160502T185632Z  tiny     defaults
- 20160502T185619Z  tiny     defaults
- 20160502T185604Z  tiny     defaults
- 20160502T185551Z  tiny     defaults
- 20160502T185538Z  tiny     defaults
- 20160502T185525Z  tiny     defaults
- 20160502T185511Z  tiny     defaults
- 20160502T185459Z  tiny     defaults
+ Race Timestamp   Track   Challenge           Car      User Tag
+ ---------------- ------- ------------------- -------- ------------------------------
+ 20160518T122341Z pmc     append-no-conflicts defaults intention:reduce_alloc_1234
+ 20160518T112057Z pmc     append-no-conflicts defaults intention:baseline_github_1234
+ 20160518T101957Z pmc     append-no-conflicts defaults


We can see that the user tag helps us to recognize races. We want to compare the two most recent races and have to provide the two race timestamps in the next step::

- dm@io:~ $ esrally compare --baseline=20160502T190127Z --contender=20160502T191011Z
+ dm@io:~ $ esrally compare --baseline=20160518T112057Z --contender=20160518T122341Z

____ ____
/ __ \____ _/ / /_ __
Expand All @@ -38,12 +32,14 @@ We can see that the user tag helps us to recognize races. We want to compare the
/____/

Comparing baseline
- Race timestamp: 2016-05-02 19:01:27
- Track setup: defaults
+ Race timestamp: 2016-05-18 11:20:57
+ Challenge: append-no-conflicts
+ Car: defaults

with contender
- Race timestamp: 2016-05-02 19:10:11
- Track setup: defaults
+ Race timestamp: 2016-05-18 12:23:41
+ Challenge: append-no-conflicts
+ Car: defaults

------------------------------------------------------
_______ __ _____
Expand Down
2 changes: 1 addition & 1 deletion esrally/config.py
Expand Up @@ -22,7 +22,7 @@ class Scope(Enum):
# A sole benchmark
benchmark = 3
# Single benchmark track setup (e.g. default, multinode, ...)
- trackSetup = 4
+ challenge = 4
# property for every invocation, i.e. for backtesting
invocation = 5
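The scope values are ordered from broad to narrow, which suggests a fallback lookup from the most specific applicable scope to broader ones. A sketch of that idea; only the three members shown above are real, and ``resolve`` is an illustrative helper, not Rally's implementation:

```python
from enum import Enum

class Scope(Enum):
    # Only the members visible in the excerpt above.
    benchmark = 3
    challenge = 4
    invocation = 5

def resolve(settings, key, scope):
    # Try the most specific applicable scope first, then fall back.
    for member in sorted(Scope, key=lambda m: m.value, reverse=True):
        if member.value <= scope.value and (member, key) in settings:
            return settings[(member, key)]
    return None
```

For example, with ``{(Scope.benchmark, "heap"): "1g", (Scope.challenge, "heap"): "4g"}``, a lookup at ``Scope.invocation`` falls back to the challenge-level value.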

Expand Down

