Skip to content
Marc Limotte edited this page May 8, 2013 · 9 revisions

Overview

Lemur is a tool to launch hadoop jobs locally or on EMR, based on a configuration file, referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions (aka hooks) and zero or more "steps". A step is Amazon's name for a task or job submitted to the cluster. Lemur reads your jobdef, at the end of your jobdef, you execute (fire! ...) to make things happen. Also keep in mind that the jobdef is an interpreted clj file, so you can insert arbitrary Clojure code to be executed anywhere in the file (but see Jobdef Hooks for a better way).

Articles

Start Here Marc's blog announcement

A sample jobdef using Hadoop Streaming

Documentation