mlimotte edited this page May 10, 2012 · 3 revisions

Lemur accepts a number of options and args on the command line. These are distinct from the args (or "job args") that are passed to your hadoop job main-class for each step. The job args are assembled from the key/value pairs in your defstep and the command-line args given to lemur.

From the command-line arguments it is given, lemur first extracts the options that it recognizes. You can have lemur recognize additional options by putting a (catch-args ...) block in your jobdef. Anything not recognized in this way goes into the :remaining list. Any function can reference these remaining args through eopts, for example (fn [eopts] (first (:remaining eopts))). The "remaining" args can also be "passed through" to your Hadoop main.
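As a minimal sketch of reading the remaining args, the snippet below applies the function from the paragraph above to a hand-built map. The name first-remaining and the sample values are hypothetical. In a real jobdef, eopts is supplied by lemur, and only its :remaining key is modeled here.

```clojure
;; Hypothetical helper: returns the first command-line arg that lemur
;; did not recognize, or nil when there are none.
(defn first-remaining
  [eopts]
  (first (:remaining eopts)))

;; Hand-built stand-in for eopts (an assumption for illustration only).
(def sample-eopts {:remaining ["input.txt" "extra"]})

(first-remaining sample-eopts)
;; => "input.txt"
```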

The complete spec for args includes these options (which can appear in your defstep):

:args.foo "100"
:args.passthrough true
:args.positional ["foo" "bar baz"]
:args.data-uri true

The name portion of :args.foo, i.e. "foo", can be any string. It is passed to your hadoop main as "--foo" "100". There can be zero or more key/value pairs of this type. Any arg specified in this way automatically has its name portion added to (catch-args) if it is not already there. The value you specify here will be the default. In this example, the implicit entry is equivalent to (catch-args [:foo "foo" "100"]). This way, any arg can be overridden on the command line (e.g. --foo 200). See catch-args in examples/sample-jobdef for more.

Named args with a boolean value are treated specially. For example:

:args.bar true
:args.qux false

In this case, the command line passed to your hadoop job would contain "--bar" with no value, and --qux would not appear at all. That is, these are boolean options that are either present or not present; they are not followed by a value. Like :args.foo above, they are automatically added to (catch-args), but in this case a '?' is appended to the key name (e.g. :bar? and :qux?) to indicate that they are boolean options.

To turn a boolean value OFF from the lemur command line, you would say --no-bar. The no- prefix on the option means set the option to false. Similarly, using no- on a regular arg, like --no-foo, would set the value of :foo to nil.
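As a rough model of the rules above (a sketch only, not lemur's actual implementation), the translation from defstep values to hadoop arguments might look like the following. The function names render-arg and render-args are hypothetical, and treating nil like false (omitting the arg) is an assumption.

```clojure
;; Model of how one named arg appears on the hadoop command line.
;; Takes a [name value] pair and returns a vector of strings.
(defn render-arg
  [[arg-name value]]
  (cond
    ;; boolean true: the flag is present, with no value after it
    (true? value) [(str "--" arg-name)]
    ;; boolean false: the flag is absent; nil is ASSUMED to behave the same
    (or (false? value) (nil? value)) []
    ;; any other value: the flag followed by its value
    :else [(str "--" arg-name) (str value)]))

;; Renders the pairs in the order given, since order is preserved.
(defn render-args
  [pairs]
  (vec (mapcat render-arg pairs)))

(render-args [["foo" "100"] ["bar" true] ["qux" false]])
;; => ["--foo" "100" "--bar"]
```

A vector of pairs is used instead of a map so that the order of the args is kept.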

Here are a couple of examples, showing the interaction of catch-args, defstep and the actual lemur command line.

(catch-args [:joe "how many joes?" "two"])

(defstep s
  ...
  :bob? false
  :args.joe "one"
  :args.tom nil
  :args.bob true)

$ lemur local the-jobdef.clj --no-bob --tom "one"

This results in:

hadoop jar ... --joe one --tom one

  1. How does --tom get the value "one"? Because :tom does not appear in catch-args, it is added implicitly AND :args.tom is associated with the arg value. Also, in this case, :args.tom nil means there is no default value. If you wanted a default value, you could set :args.tom "one", and you would still get the implicit --tom override capability.
  2. Why is --joe "one" and not "two"? Unlike #1, :joe does appear in catch-args, so nothing is done implicitly. In particular, there is no association made between :joe and :args.joe, so :args.joe gets the value specified in the defstep. If you wanted :joe to behave like :tom, you would set :args.joe "${joe}".
  3. Finally, :args.bob has no catch-args entry, so one is created implicitly. This allows you to turn --bob off (which in the hadoop args means that it doesn't appear at all) on the lemur command line with --no-bob.

Using the same catch-args/defstep above:

$ lemur local the-jobdef.clj 

results in

hadoop jar ... --joe one --tom two

  1. As you can see, --bob is still off. Why is that the case when the defstep says :args.bob true? Because :args.bob is a boolean option, the implicit entry in catch-args is named :bob? (the ? at the end indicates that it is boolean), and :args.bob is implicitly associated with :bob?. So the entry in the defstep that reads :bob? false is seen and, as an explicit entry, takes precedence over the 'default value'. On the other hand, specifying --bob on the lemur command line would take precedence over the defstep entry.
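To illustrate the "${joe}" suggestion from item 2 of the first example, here is a sketch of a jobdef in which :args.joe is tied to the catch-args entry. The behavior noted in the comments is inferred from the description on this page and has not been verified against lemur. The value "nine" is made up for illustration.

```clojure
(catch-args [:joe "how many joes?" "two"])

(defstep s
  ;; other step keys omitted, as in the examples above
  :args.joe "${joe}")   ; take the value from the :joe option

;; Presumably (an assumption, not confirmed output):
;;   $ lemur local the-jobdef.clj             -> hadoop jar ... --joe two
;;   $ lemur local the-jobdef.clj --joe nine  -> hadoop jar ... --joe nine
```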

NOTE The '?' suffix for boolean options comes from the clojure.contrib/command-line convention for boolean options. Although I don't use that library, I chose to follow that convention. I understand why that library does that, but I don't think it adds any value here, so I may remove the '?' convention in the future.

The other keys listed at the beginning of this page have special meanings:

:args.passthrough true  -- pass through the "remaining" args from the lemur command line, as explained above.  I.e. the value of (:remaining eopts)
:args.positional [...]  -- specify some positional arguments
:args.data-uri true     -- include the value of the :data-uri key (see "Standard Job Paths" in [[The Jobdef]])

Also note that the order in which these options are specified in defstep is significant. The args are passed to your hadoop main in this order.

In addition, you can specify profiles to override these values, for example in local mode.

:local {:args.foo "1"}