Reproducability Package - Yannakakis in Column Stores

This repository contains supplemental material for reproducing the results reported our paper Instance-Optimal Acyclic Join Processing Without Regret: Engineering the Yannakakis Algorithm in Column Stores. Full version of the paper can be found here.

The repository is organized as follows:

2phase_nsa: Code for converting binary plans to 2NSA plans.
benchmarks: Benchmark queries and data.
duckdb: Fork of DuckDB.
duckdb_scripts: Scripts for loading data and running queries in DuckDB.
experiments: Experiment scripts and results.
intermediate_to_df_plan: Rust code for running query plans (binary & 2NSA plans) in Datafusion.
query_plans: Binary and 2NSA query plans specified in JSON format for all benchmark queries.
yannakakis-join-implementation: Yannakakis implementation in Datafusion. (dependency of intermediate_to_df_plan)

Do not forget to clone submodules well when cloning this repository. You can do this by adding the --recursive flag to the git clone command:

git clone --recursive <URL>

Analyze Results

If you only want to analyze our results reported in the paper, you can inspect the results in the experiments folder. For the CE benchmark, the output folder was too large for GitHub, so we only uploaded a compressed version. You should extract it (in the same folder) before running the analysis notebook.

Run Experiments

Be aware that results might differ from machine to machine. We conducted all experiments on a Ubuntu 22.04.4 LTS machine with an Intel Core i7-11800 CPU and 32GB of RAM. Prerequisites for running the experiments are (between brackets are the versions we used):

Rust (1.74.1) with nightly toolchain
Python (3.10.12) & pip (22.0.2)
Make (4.3) & Ninja (1.10.1)

We provided some bash script to make it easier to run the experiments: setup.sh, clean.sh, and run_queries.sh. They should be executed from the root directory and also in the following order:

# cwd = root directory of this repository

# Install required pip packages
# (Create a new virtual environment if desired)
pip install -r requirements.txt

# Download datasets, build DuckDB, build Yannakakis implementation,...
chmod +x setup.sh
./setup.sh

# Remove all experiment results that are currently in the repo (query plans, csv files, ...)
chmod +x clean.sh
./clean.sh

# Run all benchmark queries (DuckDB-bin, Datafusion-bin, Datafusion-2NSA)
chmod +x run_queries.sh
./run_queries.sh

You can now run, for each benchmark, the notebook experiments/<benchmark>/analysis.ipynb to analyze the results.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reproducability Package - Yannakakis in Column Stores

Analyze Results

Run Experiments

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
2phase_nsa		2phase_nsa
benchmarks		benchmarks
duckdb @ d407241		duckdb @ d407241
duckdb_scripts		duckdb_scripts
experiments		experiments
intermediate_to_df_plan		intermediate_to_df_plan
query_plans		query_plans
to_intermediate_plan		to_intermediate_plan
yannakakis-join-implementation		yannakakis-join-implementation
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
clean.sh		clean.sh
paper.pdf		paper.pdf
requirements.txt		requirements.txt
run_queries.sh		run_queries.sh
setup.sh		setup.sh

UHasselt-DSI-Data-Systems-Lab/code-reproducability-yannakakis

Folders and files

Latest commit

History

Repository files navigation

Reproducability Package - Yannakakis in Column Stores

Analyze Results

Run Experiments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages