- nexmark: Nexmark benchmark code for Kafka Streams
- boki: modified Boki
- impeller: Impeller source code
- impeller-experiments: Impeller experiment scripts
Our evaluation workloads run on AWS EC2 instances in the us-east-2 region.
The EC2 VMs that run experiments use a public AMI (ami-0c6de836734de3280) from Boki,
which is based on Ubuntu 20.04 with the necessary dependencies installed.
A controller machine in the AWS us-east-2 region is required for running the scripts that execute experiments.
The controller machine can use a very small EC2 instance type, as it only provisions and controls experiment VMs
and does not affect experimental results.
In our own setup, we use a t3.micro EC2 instance running Ubuntu 20.04 as the controller machine.
If you also want to compile the code on the controller machine, we use a c5.xlarge EC2 instance instead.
The controller needs an IAM role with the following permission policies: AmazonEC2FullAccess and IAMReadOnlyAccess. In addition to these AWS managed permission policies, we also set up a custom policy, pass-iam-role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<account_id>:role/*"
        }
    ]
}
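The policy document has to carry your own account ID in the Resource ARN. A minimal sketch of one way to generate it is shown here; the function name render_pass_role_policy and the example account ID are illustrative and not part of this repository.

```shell
# Hypothetical helper: print the pass-iam-role policy for a given AWS account ID.
# The JSON body matches the custom policy described above.
render_pass_role_policy() (
    account_id="$1"
    cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::${account_id}:role/*"
        }
    ]
}
EOF
)

# Example with a placeholder account ID; replace it with your own.
render_pass_role_policy 123456789012 > pass-iam-role.json
```

The resulting file can then be attached through the IAM console or, assuming the AWS CLI is already configured with sufficient permissions, passed to `aws iam create-policy --policy-name pass-iam-role --policy-document file://pass-iam-role.json`.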
On the controller machine, clone this repository with all git submodules:
git clone --recursive
To compile the source code, the controller machine needs to install build dependencies by executing the following scripts.
This script installs the latest version of AWS CLI version 1,
and this documentation
details the recommended way to install AWS CLI version 1 if you decide not to use the latest one.
Once installed, the AWS CLI has to be configured with region us-east-2 and an access key
(see this documentation).
Execute ./impeller-experiments/scripts/
to set up SSH keys that will be used to access experiment VMs.
Please read the notice in scripts/
before executing it to see if this script works for your setup.
Our VM provisioning script creates EC2 instances with security group impeller
and placement group impeller-experiments.
The security group includes firewall rules for experiment VMs (including allowing the controller machine to SSH into them),
while the placement group instructs AWS to place experiment VMs close together.
Executing scripts/
on the controller machine creates these groups with the correct configurations.
- build dependencies
cd ./boki/tmp/ && ./ && cd -
- build source code
cd ./boki/ && make -j2 && cd -
- build dependencies
cd ./nexmark/nexmark-kafka-streams/ && task build && cd -
cd ./impeller/ && make -j4 && cd -
- latency
sudo apt-get install -y fontconfig libfontconfig-dev
cd ./impeller-experiments/latency/ && cargo build --release && cd -
- run query 1 for 60 seconds with 1 iteration
cd ./impeller-experiments/nexmark_impeller/ && ./ && cd -
- experiments on Impeller
cd ./impeller-experiments/nexmark_impeller/
# run ./ to ./
- experiments on Kafka Streams
cd ./impeller-experiments/nexmark_kafka-streams
# run ./ to ./
Executing these scripts serially is estimated to take 6300 minutes (105 hours).
cd ./impeller-experiments/nexmark_impeller/
# run ./ to ./
Executing these scripts serially is estimated to take 1600 minutes (about 27 hours).
For Kafka Streams results, query 1:
latency scan --prefix q1_sink_ets --output $output_dir $q1_exp_dir # the exp dir is the dir that contains logs
For q2 to q8, change the prefix from q1_sink_ets to the matching one, q2_sink_ets through q8_sink_ets.
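The prefix pattern for the eight Kafka Streams queries can be generated in a loop. The sketch below only prints the commands so they can be reviewed first. It assumes one log directory and one output directory per query, named q1 to q8 under a common root; that layout and the function name are assumptions, so adjust the paths to wherever your logs actually live.

```shell
# Dry-run sketch: print the `latency scan` command for each Kafka Streams query.
# $1 = root of the experiment log dirs (assumed layout: <root>/q1 ... <root>/q8)
# $2 = root of the output dirs        (assumed layout: <root>/q1 ... <root>/q8)
print_kafka_streams_scans() (
    exp_root="$1"
    output_root="$2"
    for q in 1 2 3 4 5 6 7 8; do
        echo "latency scan --prefix q${q}_sink_ets --output ${output_root}/q${q} ${exp_root}/q${q}"
    done
)

print_kafka_streams_scans ./kafka_streams_logs ./kafka_streams_latency
```

Once the printed commands match your directory layout, they can be run one by one or piped to `sh`.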
For Impeller experiments:
- query 1
latency scan --prefix query1 --suffix .json.gz --output $output_dir $q1_exp_dir
- query 2
latency scan --prefix query2 --suffix .json.gz --output $output_dir $q2_exp_dir
- query 3
latency scan --prefix q3JoinTable --suffix .json.gz --output $output_dir $q3_exp_dir
- query 4
latency scan --prefix q4Avg --suffix .json.gz --output $output_dir $q4_exp_dir
- query 5
latency scan --prefix q5maxbid --suffix .json.gz --output $output_dir $q5_exp_dir
- query 6
latency scan --prefix q6Avg --suffix .json.gz --output $output_dir $q6_exp_dir
- query 7
latency scan --prefix q7JoinMaxBid --suffix .json.gz --output $output_dir $q7_exp_dir
- query 8
latency scan --prefix q8JoinStream --suffix .json.gz --output $output_dir $q8_exp_dir
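Because the Impeller prefixes do not follow a single naming pattern, a small lookup keeps them in one place. The sketch below maps a query number to the prefix listed above and prints the matching command; the function names are illustrative and not part of the repository.

```shell
# Map a Nexmark query number to the log prefix used in the commands above.
impeller_prefix() {
    case "$1" in
        1) echo "query1" ;;
        2) echo "query2" ;;
        3) echo "q3JoinTable" ;;
        4) echo "q4Avg" ;;
        5) echo "q5maxbid" ;;
        6) echo "q6Avg" ;;
        7) echo "q7JoinMaxBid" ;;
        8) echo "q8JoinStream" ;;
        *) echo "unknown query: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the scan command.
# $1 = query number, $2 = output dir, $3 = dir that contains the logs
print_impeller_scan() {
    prefix="$(impeller_prefix "$1")" || return 1
    echo "latency scan --prefix ${prefix} --suffix .json.gz --output $2 $3"
}
```

For example, `print_impeller_scan 3 "$output_dir" "$q3_exp_dir"` prints the query 3 command from the list above with both variables expanded.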
All of our repositories included in the artifact are under the Apache 2.0 License. We use a script which is distributed under the MIT License.
Impeller: Stream Processing on Shared Logs
- For long-running experiments, the Boki cluster might sometimes fail to set up. If you see the experiment emit a timeout error while waiting for Boki to finish setup, you need to record the current progress and restart the experiment.
- Impeller is a research-quality product that might have multiple sharp edges. Do not use this repo in a production environment.
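For the first note, recording progress can be as simple as keeping a file of the scripts that have already finished. The wrapper below is a hypothetical sketch, not something shipped in this repository: it runs the given commands in order, appends each one that succeeds to a progress file, and stops at the first failure so that a rerun resumes from the failed step.

```shell
# Hypothetical wrapper: run each command once, recording successes in a
# progress file so that a rerun skips what has already finished.
# $1 = progress file, remaining arguments = commands or scripts to run in order
run_with_progress() (
    progress_file="$1"
    shift
    touch "$progress_file"
    for script in "$@"; do
        # Skip anything recorded as finished in an earlier run.
        if grep -qxF -- "$script" "$progress_file"; then
            echo "skip: $script"
            continue
        fi
        if "$script"; then
            echo "$script" >> "$progress_file"
        else
            echo "failed: $script (rerun to resume here)" >&2
            exit 1
        fi
    done
)
```

Invoked with the experiment scripts as arguments, a rerun after a Boki setup timeout repeats only the step that failed and the ones after it. Whether a partially completed script leaves logs that need cleaning up before the rerun is not covered by this sketch.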