housejester edited this page Jan 31, 2013 · 4 revisions

Instructions

  1. Provision some servers. This can all be a single machine if you want; you just need to know up front the hostnames for your MySQL server, ZooKeeper server, and the Druid broker (they can all be the same hostname, or localhost). I'd recommend giving it a shot on a single server first.

  2. Install MySQL on your MySQL host. Create a druid database on the server, and grant all privileges on it to a druid user with the password 'diurd'.

  3. Update env-cluster.sh with your cluster details. You'll need to specify the hostnames for MySQL, ZooKeeper, and the Druid broker (they can be the same host, or all localhost for trying it on a single server); the other hosts can be left as they are. Also, be sure to specify your AWS credentials and S3 bucket name if you want the shard upload to work (this isn't necessary if you're just trying out the realtime server).

  4. Tar up this whole directory (after you've updated env-cluster.sh) and scp it to your servers. If you're running everything on one server, skip the copy and just run everything locally.

  5. Run the scripts in numerical order on the appropriate hosts. For example, go to your ZooKeeper server and run the 01-start-zookeeper.sh script.
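
Steps 2 and 4 above can be sketched as shell commands. This is a hedged sketch: the tarball name is arbitrary, and the `GRANT ... IDENTIFIED BY` syntax shown matches the MySQL 5.x of this era (newer MySQL versions want a separate `CREATE USER` first).

```shell
# Step 2: SQL for the druid database and user from the instructions above.
# Run it on the MySQL host with a privileged account:
#   mysql -u root -p < create-druid-db.sql
cat > create-druid-db.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS druid;
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%' IDENTIFIED BY 'diurd';
FLUSH PRIVILEGES;
SQL

# Step 4: package this directory (env-cluster.sh included) into a tarball,
# then copy it to each server in the cluster.
tar czf /tmp/druid-cluster.tar.gz .
# For each remote host:  scp /tmp/druid-cluster.tar.gz you@some-host:
```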

Don't leave the firehose running for too long; keep an eye on your disk usage.
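A quick way to keep that eye on the disk (the /tmp paths here are the ones the nuke instructions below clean up; any that don't exist yet are simply skipped):

```shell
# Show how much the firehose/realtime pipeline has written so far.
du -sh /tmp/druid* /tmp/kafka-* /tmp/realtime 2>/dev/null || true
```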

To stop everything, just run the stop-* scripts from the appropriate hosts to stop the services. Stop ZooKeeper last (the Druid realtime node refuses to shut down if ZooKeeper is already down). For example, running:

./stop-firehose.sh

on the firehose server will stop the firehose.

The Firehose Sample Data

The example data generated by the firehose is produced using d8a-conjure. You can see what the template looks like by running cat firehose/appevents.txt. See http://conjure.d8a.io for info on how the template works. You can see the data generated by cd'ing into firehose and running:

java -jar d8a-conjure-1.0-SNAPSHOT.jar -template appevents.txt

The sample data will be printed to the console.

The Druid Realtime Spec for the Sample Data

The spec used by the Druid realtime node for ingesting that data is in druid/appevents_realime.spec.

The Query

The query for the data is in queryies/event_counts_query.body.

Making changes

You can change the sample data itself just by editing appevents.txt (again, see the d8a-conjure site for what you can do with the template). Then you can tweak the spec to account for whatever aggregations you want to do. Finally, you can change event_counts_query.body to run whatever query you want against the data (just be sure you always lowercase ALL names and fieldNames). Whenever you make any of these changes, you'll need to bounce the firehose and the realtime node.
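As a concrete (hypothetical) illustration of the lowercase rule, here is what a tweaked query body might look like. The structure follows Druid's standard timeseries query format, but the dataSource, aggregator names, and interval below are made up, not taken from this repo:

```shell
# Write a hypothetical query body; note every "name" and "fieldName"
# value is lowercase, as the instructions above require.
cat > my_event_counts.body <<'JSON'
{
  "queryType": "timeseries",
  "dataSource": "appevents",
  "granularity": "minute",
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "longSum", "name": "events", "fieldName": "events" }
  ],
  "intervals": [ "2013-01-01/2013-02-01" ]
}
JSON
```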

I haven't tested what would happen if you change the dimensions and then have old data with different dimensions.

Nuking and Starting Over

If you need to just nuke everything and start over:

  • Stop everything using the stop-* scripts (stop ZooKeeper last)
  • Remove all /tmp/druid* directories
  • Remove all /tmp/kafka-* directories
  • Remove the /tmp/realtime directory
  • Remove the /tmp/zookeeper directory
  • Bring everything back up.
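
The directory removals above can be collapsed into a single command (run it only after the stop-* scripts have finished):

```shell
# Remove all local Druid, Kafka, realtime, and ZooKeeper state under /tmp
# so the next startup begins from scratch.
rm -rf /tmp/druid* /tmp/kafka-* /tmp/realtime /tmp/zookeeper
```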