# SEER

SEER is an online tool for evaluating the performance of seven Time Series Database Systems (TSDBs) using a mixed set of workloads. The tool builds upon our TSM-Bench benchmark [TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications, PVLDB'23](https://www.vldb.org/pvldb/vol16/p3363-khelifati.pdf). SEER implements an end-to-end pipeline for database benchmarking, from data generation and workload evaluation to feature contamination.
Technical details can be found in the paper *SEER: An End-to-End Toolkit for Benchmarking Time Series Database Systems in Monitoring Applications*, PVLDB'24.
- List of benchmarked systems: [ClickHouse](https://clickhouse.com/), [Druid](https://druid.apache.org/), [eXtremeDB](https://www.mcobject.com/)*, [InfluxDB](https://docs.influxdata.com/influxdb/v1.7/), [MonetDB](https://www.monetdb.org/easy-setup/), [QuestDB](https://questdb.io/), [TimescaleDB](https://www.timescale.com/).
- SEER evaluates time series generation, offline/online query performance, and the impact of time series features on storage.
- SEER uses various hydrological datasets provided by the Swiss Federal Office for the Environment (FOEN). The evaluated datasets can be found [here](https://github.com/eXascaleInfolab/TSM-Bench/tree/main/datasets).
- <sup>*</sup>**Note**: Due to license restrictions, we can only share the evaluation version of eXtremeDB.


SEER was created at the eXascale Infolab, a research group at the University of Fribourg, Switzerland, under the direction of Dr. Mourad Khayati.

___

```bash
docker-compose up -d --build
sh setup/init_seer.sh
sh setup/migrate_query_data.sh
```


___

## Contributors

- Luca Althaus
- [Mourad Khayati](https://exascale.info/members/mourad-khayati/) (mkhayati@exascale.info)
- Abdel Khelifati (abdel@exascale.info)

[//]: # (### Load query data into django models)

The installation and loading of the systems for the live execution setup can be done by following the TSM-Bench setup instructions.
### Adding New Results
- **Offline**
  1. Go to the `query_data/offline_queries` folder.
  2. Select the dataset folder and add the results of the system in a file named `system_name.csv` (a sketch of the file layout is shown after this list). The file contains the following columns:
     - runtime: the computed runtime of the query
     - variance: the variance of the query runtime
     - query: the query number (e.g., q4)
     - n_s: the number of sensors
     - n_st: the number of stations
     - timerange: the time range of the query
- **Online**
  1. Go to the `query_data/online_queries` folder.
  2. Select the dataset folder and add the results of the system in a file named `system_name.csv`. The file contains the following columns:
     - runtime: the computed runtime of the query
     - variance: the variance of the query runtime
     - query: the query number (e.g., q4)
     - n_s: the number of sensors
     - n_st: the number of stations
     - timerange: the time range of the query
     - insertion_rate: the ingestion rate
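
Both results files share the same layout, with the online variant adding the `insertion_rate` column. Below is a minimal, hedged sketch of how such a file could be produced with Python's `csv` module; the system name, dataset folder, and all values are made-up placeholders, not actual benchmark results:

```python
import csv

# Columns as described above; online result files additionally carry "insertion_rate".
columns = ["runtime", "variance", "query", "n_s", "n_st", "timerange"]

# Placeholder rows for illustration only (not real measurements).
rows = [
    {"runtime": 0.42, "variance": 0.003, "query": "q4", "n_s": 10, "n_st": 5, "timerange": "1d"},
    {"runtime": 1.87, "variance": 0.011, "query": "q5", "n_s": 50, "n_st": 10, "timerange": "1w"},
]

# Write the file; place it inside the chosen dataset folder under query_data/.
with open("newsystem.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
```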
### Adding New System Configuration
- **Offline**
  1. Install the system following the TSM-Bench instructions.
  2. Go to `views/offline_queries_view.py`, update the context of the query class, and add the system to `systems` (line 32).
  3. Add the name of the system to `utils/CONSTANTS.py` and to `views/offline_queries_view.py` (line 10); the sketch after this list illustrates this kind of edit.
4. Go to "djangoProject/models/load_query_data.py" and add the system to the systems list (line 6).
  5. Load the query data into the Django models:
     ```bash
     sh setup/migrate_query_data.sh
     ```
- **Online**
  1. Install the system following the TSM-Bench instructions.
  2. Go to `views/online_queries_view.py` and update the context of the query class by adding the system to `systems` (line 38).
  3. Add the name of the system to `utils/CONSTANTS.py` (if not already done for offline) and to `views/offline_queries_view.py` (line 6).
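
For illustration, the edits described in the steps above amount to appending the new system's name to the existing system lists and exposing it in the view context. The file contents, variable names, and import paths below are assumptions made for this sketch, not the project's verified code:

```python
# utils/CONSTANTS.py (sketch; the actual list name in the project may differ)
SYSTEMS = [
    "clickhouse", "druid", "extremedb", "influxdb",
    "monetdb", "questdb", "timescaledb",
    "newsystem",  # the system being added
]

# views/offline_queries_view.py or views/online_queries_view.py (sketch)
# The query-class context exposes the systems that the front end can select.
from utils.CONSTANTS import SYSTEMS

context = {
    "systems": SYSTEMS,
    # ... the remaining context entries stay unchanged ...
}
```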