Apache Pulsar benchmarks

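This folder houses all of the assets necessary to run benchmarks for Apache Pulsar. In order to run these benchmarks, you'll need to create the necessary local artifacts, stand up a Pulsar cluster on AWS, SSH into the client host, and run the benchmarks from that host. Each step is described in a section below.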

Creating local artifacts

In order to create the local artifacts necessary to run the Pulsar benchmarks in AWS, you'll need to have Maven installed. Once Maven's installed, you can create the necessary artifacts with a single Maven command:

$ mvn install

Creating a Pulsar cluster on Amazon Web Services (AWS) using Terraform and Ansible

In order to create an Apache Pulsar cluster on AWS, you'll need to have Terraform, the terraform-inventory plugin, and Ansible installed.

In addition, you will need an AWS account and AWS credentials that Terraform can use.
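
Before continuing, it can help to confirm that the command-line tools invoked later in this guide are on your PATH. The following is a minimal sketch: require_tools is a hypothetical helper name, and the tool names are the ones that appear in the commands in this document.

```shell
# Hypothetical helper: report any tool that cannot be found on PATH.
# Returns 0 when every tool is present, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools used by the commands in this guide.
require_tools terraform terraform-inventory ansible-playbook \
  || echo "install the missing tools before continuing" >&2
```

The check only looks for the executables; it does not verify versions or AWS credentials.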

Once those conditions are in place, you'll need to create an SSH key pair consisting of a private key at ~/.ssh/pulsar_aws and a public key at ~/.ssh/pulsar_aws.pub:

$ ssh-keygen -f ~/.ssh/pulsar_aws

When prompted to enter a passphrase, simply hit Enter twice. Then, make sure that the keys have been created:

$ ls ~/.ssh/pulsar_aws*
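
If you want to script the check rather than eyeball the ls output, something like the sketch below could work. check_keypair is a hypothetical helper name; it only tests that both files exist and does not validate their contents.

```shell
# Hypothetical helper: confirm that both halves of the key pair exist.
# Defaults to the path used in this guide when no argument is given.
check_keypair() {
  key="${1:-$HOME/.ssh/pulsar_aws}"
  for f in "$key" "$key.pub"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  echo "key pair found: $key"
}

# Usage (commented out so that sourcing this snippet has no side effects):
#   check_keypair
#   check_keypair ~/.ssh/my_key
#
# To create the key without the interactive prompt, ssh-keygen accepts an
# empty passphrase through its -N flag:
#   ssh-keygen -N "" -f ~/.ssh/pulsar_aws
```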

With SSH keys in place, you can create the necessary AWS resources using just a few Terraform commands:

$ cd driver-pulsar/deploy
$ terraform init
$ terraform apply

That will create the following EC2 instances (plus some other resources, such as a Virtual Private Cloud (VPC)):

| Resource | Description | Count |
| --- | --- | --- |
| Pulsar/BookKeeper instances | The VMs on which a Pulsar broker and BookKeeper bookie will run | 3 |
| ZooKeeper instances | The VMs on which a ZooKeeper node will run | 3 |
| Client instance | The VM from which the benchmarking suite itself will be run | 1 |

When you run terraform apply, you will be prompted to type yes. Type yes to continue with the installation or anything else to quit.

Once the installation is complete, you will see a confirmation message listing the resources that have been installed.

Variables

There are a handful of configurable parameters related to the Terraform deployment that you can alter by modifying the defaults in the terraform.tfvars file.

| Variable | Description | Default |
| --- | --- | --- |
| region | The AWS region in which the Pulsar cluster will be deployed | us-west-2 |
| public_key_path | The path to the SSH public key that you've generated | ~/.ssh/pulsar_aws.pub |
| ami | The Amazon Machine Image (AMI) to be used by the cluster's machines | ami-9fa343e7 |
| instance_types | The EC2 instance types used by the various components | i3.4xlarge (Pulsar brokers and BookKeeper bookies), t2.small (ZooKeeper), c4.8xlarge (benchmarking client) |

If you modify the public_key_path, make sure that you point to the appropriate SSH key path when running the Ansible playbook.
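
As an illustration of the file format, the sketch below writes the scalar defaults from the table into a scratch file. The file name terraform.tfvars.example is hypothetical, chosen so that the real terraform.tfvars is not overwritten. The instance_types variable is left out because its exact layout is not shown in this document.

```shell
# Write an example variables file using the defaults listed above.
# "terraform.tfvars.example" is a hypothetical scratch name; copy the
# values you want into the real terraform.tfvars by hand.
cat > terraform.tfvars.example <<'EOF'
region          = "us-west-2"
public_key_path = "~/.ssh/pulsar_aws.pub"
ami             = "ami-9fa343e7"
EOF

cat terraform.tfvars.example
```

Keep in mind that AMI IDs are specific to a region, so changing region will normally require changing ami as well.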

Running the Ansible playbook

With the appropriate infrastructure in place, you can install and start the Pulsar cluster using Ansible with just one command:

$ ansible-playbook \
  --user ec2-user \
  --inventory `which terraform-inventory` \
  deploy.yaml

If you're using an SSH private key path different from ~/.ssh/pulsar_aws, you can specify that path using the --private-key flag, for example --private-key=~/.ssh/my_key.
If Ansible keeps asking for the SSH key passphrase, you can add the key to the SSH agent by running ssh-agent bash and then ssh-add ~/.ssh/pulsar_aws.

SSHing into the client host

In the output produced by Terraform, there's a client_ssh_host value that provides the IP address of the client EC2 host from which benchmarks can be run. You can SSH into that host using this command:

$ ssh -i ~/.ssh/pulsar_aws ec2-user@$(terraform output client_ssh_host)
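
Depending on your Terraform version, terraform output client_ssh_host may print the address wrapped in double quotes, which would break the ssh command above. The sketch below strips one pair of surrounding quotes if present; strip_quotes is a hypothetical helper name. Recent Terraform releases also offer terraform output -raw, which avoids the quoting altogether.

```shell
# Hypothetical helper: remove one leading and one trailing double quote,
# if present, and print the result.
strip_quotes() {
  v=$1
  v=${v#\"}
  v=${v%\"}
  printf '%s\n' "$v"
}

# Usage (commented out because it needs a deployed cluster):
#   host=$(strip_quotes "$(terraform output client_ssh_host)")
#   ssh -i ~/.ssh/pulsar_aws "ec2-user@$host"
```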

Running the benchmarks from the client host

Once you've successfully SSHed into the client host, you can run all available benchmark workloads like this:

$ cd /opt/benchmark
$ sudo bin/benchmark --drivers driver-pulsar/pulsar.yaml workloads/*.yaml

You can also run specific workloads in the workloads folder. Here's an example:

$ sudo bin/benchmark --drivers driver-pulsar/pulsar.yaml workloads/1-topic-16-partitions-1kb.yaml

There are multiple Pulsar "modes" for which you can run benchmarks. Each mode has its own YAML configuration file in the driver-pulsar folder.

Mode Description Config file
Standard Pulsar with message de-duplication disabled (at-least-once semantics) pulsar.yaml
Effectively once Pulsar with message de-duplication enabled ("effectively-once" semantics) pulsar-effectively-once.yaml

The examples above use the "standard" mode as configured in driver-pulsar/pulsar.yaml. To run all available benchmark workloads in "effectively once" mode:

$ sudo bin/benchmark --drivers driver-pulsar/pulsar-effectively-once.yaml workloads/*.yaml

Here's an example of running a specific benchmarking workload in effectively once mode:

$ sudo bin/benchmark --drivers driver-pulsar/pulsar-effectively-once.yaml workloads/1-topic-16-partitions-1kb.yaml
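
To run the same workloads in both modes back to back, a small loop is one option. This is a sketch rather than part of the benchmark suite: run_all_modes and the BENCH variable are hypothetical names, and the loop assumes that invoking the benchmark once per workload file is acceptable for your purposes.

```shell
# Command used to launch a benchmark; override BENCH (for example with
# "echo") to preview the invocations without running anything.
BENCH=${BENCH:-"sudo bin/benchmark"}

# Hypothetical helper: run every given workload file once per driver config.
# Stops at the first failing invocation.
run_all_modes() {
  for driver in driver-pulsar/pulsar.yaml driver-pulsar/pulsar-effectively-once.yaml; do
    for workload in "$@"; do
      $BENCH --drivers "$driver" "$workload" || return 1
    done
  done
}

# Usage from /opt/benchmark on the client host (commented out here):
#   run_all_modes workloads/*.yaml
#   run_all_modes workloads/1-topic-16-partitions-1kb.yaml
```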