Note: This tool is deprecated. Please use the Swarm64 DA Benchmark Toolkit instead
This toolkit provides methods to execute the TPC-H, TPC-DS, and SSB benchmark on Swarm64 DA and native PostgreSQL.
- Python min. 3.6 and pip3
- For TPC-DS only: Linux package
recode
- Install additional packages, for Python 3.6 eg. with:
/usr/bin/python3.6 -m pip install -r requirements.txt
- For loading the data, the database must be accessible with the user
postgres
without password
-
To load a database with a dataset, go to the correct benchmark directory:
For TPC-H:cd schemas/tpch
For TPC-DS:cd schemas/tpcds
For SSB:cd schemas/ssb
-
Run the
loader.sh
script with the following parameters:./loader.sh
--schema=
--scale-factor=
--dbname=For example in
schemas/tpch
:
./loader.sh --schema=sdb_all_tables --scale-factor=1000 --dbname=example-database
Parameter | Description |
---|---|
schema |
The schema to deploy. Schemas are directories the benchmarks/<benchmark>/schemas directory and start with either sdb_ , s64da_ , or psql_ . The schema name equals the directory name. |
scale-factor |
The scale factor to use, such as 10 , 100 or 1000 . |
dbname |
The name of the target database. If the database does not exist, it will be created. If it does exist, it will be deleted and recreated. |
Parameter | Description |
---|---|
num-partitions |
The number of partitions to use, if applicable. Default: 32 |
chunks |
Chunk large tables into smaller pieces during ingestion. Default: 10 |
db-host |
Alternative host for the database. Default: localhost |
db-port |
Alternative port for the database. Default: 5432 |
Depending on the scale factor you choose, the time it takes for the script to finish might take up to several hours. After the script creates the database, it loads the data, creates primary keys, foreign keys, and indices. Afterwards, it runs VACUUM and ANALYZE.
Start a benchmark:
./swarm64_run_tpc_benchmark \
--dsn postgresql://postgres@localhost/<target-db> \
--benchmark <tpch|tpcds|ssb>
This runs the benchmark with a 15 minute per query runtime restriction. You can
use the --timeout
parameter to adjust this limit.
Parameter | Description |
---|---|
dsn |
The full DSN of the DB to connect to. DSN layout: postgresql://<user>@<host>:<target-port>/<target-db> The port is optional and the default is 5432. Example with port 5433: --dsn postgresql://postgres@localhost:5433/example-database |
benchmark |
The benchmark to use: tpch , tpcds or ssb |
Parameter | Description |
---|---|
config |
Path to additional YAML configuration file. |
timeout |
The maximum time a query may run, such as 30min . |
You can modify the existing configuration files located under the configs
directory. Per default, the tests loads the respective default.yaml
configuration file.
Alternatively, you can create an additional configuration file to control
test execution more granularly. An example YAML file for the TPC-H benchmark
is as follows:
timeout: 30min
ignore:
- 18
- 20
- 21
dbconfig:
max_parallel_workers: 96
max_parallel_workers_per_gather: 32
To use this file, pass the --config=<path-to-file>
argument to the test
executor. In this example, the query timeout is set to 30min
. Queries 18, 20,
and 21 will not execute. Additionally, the database parameters
max_parallel_workers
will change to 96 and max_parallel_workers_per_gather
will change to 32
. Any change to the database configuration is applied before
the benchmark starts and are reverted after the benchmark completes.