# Pigeon - An Event-Driven Job Control Framework for Hadoop to Help Chain Multiple MapReduce Jobs

## Overview

Hadoop MapReduce is a parallel computation framework for processing large, distributed data sets. Users often want to chain multiple MapReduce jobs to accomplish complex tasks. Such tasks are usually data-driven, with data funneled through a sequence of jobs. In this project we have implemented a distributed notification system for Hadoop that helps chain multiple MapReduce jobs based on events occurring in the Hadoop cluster.
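The core idea can be illustrated with a minimal sketch of topic-based publish/subscribe (this is not Pigeon's actual code; Pigeon uses ActiveMQ for distributed messaging, while the `TopicBus` class, topic name, and job script here are purely hypothetical):

```python
# Minimal, self-contained sketch of the event-driven chaining pattern:
# subscribers register a handler (a "job script") per topic, and publishing
# an event on that topic triggers the handler, launching the next job.
class TopicBus:
    def __init__(self):
        self.handlers = {}  # topic -> list of callbacks

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        # Deliver the event message to every subscriber of this topic
        for handler in self.handlers.get(topic, []):
            handler(message)

bus = TopicBus()
ran = []
# Hypothetical "job script": record that the next job would be launched
bus.subscribe("job1.done", lambda msg: ran.append(("job2", msg)))
# The first MapReduce job signals completion by publishing an event
bus.publish("job1.done", "output at /data/stage1")
print(ran)  # [('job2', 'output at /data/stage1')]
```

In Pigeon the bus is an ActiveMQ broker, so the publisher and subscribers can run on different machines in the cluster.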

## Prerequisites

To run the Pigeon publisher and subscriber you will need to download [ActiveMQ](http://activemq.apache.org/download.html). We used the ActiveMQ 5.8.0 release to build and test our application.

## Steps to Run Pigeon

1. Start the ActiveMQ service on every machine:

   ```sh
   ~path-to-ActiveMQ-bin-dir~$ ./activemq start
   ```

2. Run the subscriber on every machine that should listen for event notifications:

   ```sh
   ~path-to-pigeon-bin-dir~$ ./pigeon.sh subscribe <topicname> <jobscript> <tcp://host:port>
   ```

3. Run the publisher on the machine that publishes the event notification message:

   ```sh
   ~path-to-pigeon-bin-dir~$ ./pigeon.sh publish <topicname> <eventmessage> <tcp://host:port>
   ```
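As a concrete wiring of the steps above, with hypothetical topic, job-script, and broker values (the commands are echoed rather than executed, since running them requires a live ActiveMQ broker and a deployed Pigeon installation):

```shell
# Hypothetical values for illustration only
BROKER="tcp://master-node:61616"   # default ActiveMQ OpenWire port
TOPIC="stage1.done"                # event topic marking stage-1 completion

# On each worker: subscribe, so that run_stage2.sh fires on the event
echo "./pigeon.sh subscribe $TOPIC ./run_stage2.sh $BROKER"

# On the machine where stage 1 finishes: publish the completion event
echo "./pigeon.sh publish $TOPIC 'stage1 finished' $BROKER"
```

The subscriber stays running and invokes the given job script each time a message arrives on the topic, which is how one job's completion triggers the next job in the chain.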