mlimotte edited this page May 10, 2012 · 2 revisions

Validators are a set of functions that are run before your job is started to validate the command line options, environment or anything else you like. They are added in your jobdef or base file with (add-validators ...).

Each function takes either zero arguments or one argument. In the one-argument case, eopts is passed in; eopts includes all of your values from defcluster, the command-line options, defaults, values from bases, etc. On failure, the function returns a String, a vector of Strings, or false (the strings are output as the failure message). On success, the function should return nil, true, or '().
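
As a rough illustration of that return contract, here is a minimal sketch of a hand-written 1-arg validator in plain Clojure. The function name check-input-source and the option keys :input-file and :input-dir are hypothetical, chosen only for this example; they are not part of lemur. The function returns a String for one kind of failure, a vector of Strings for another, and nil on success.

```clojure
;; Hypothetical validator: the name and the option keys are illustrative only.
;; Takes eopts (a map) and follows the return contract described above:
;;   String or vector of Strings => failure, nil => success.
(defn check-input-source
  [eopts]
  (let [file (:input-file eopts)
        dir  (:input-dir eopts)]
    (cond
      ;; Both given: fail with a single message.
      (and file dir)
      "Specify only one of --input-file or --input-dir"

      ;; Neither given: fail with several messages.
      (not (or file dir))
      ["No input was given"
       "Specify either --input-file or --input-dir"]

      ;; Exactly one given: success.
      :else nil)))
```

Since validators are ordinary functions, a function like this could presumably be listed in (add-validators ...) alongside val-opts and val-remaining forms. A cross-option check of this kind is the sort of case where a custom function is needed instead of the declarative helpers.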

You can write your own functions for arbitrary checks (consider using the helper functions lemur.core/lfn and/or lemur.common/mk-validator). For many common cases, however, you can use lemur.common/val-opts and lemur.common/val-remaining, which provide a declarative way to specify validations.

RECOMMENDED: Defining validators can save time by avoiding a cluster launch that fails because of missing or bad options. In particular, remember to write a validator to check :remaining. Any options that are not caught are left in :remaining, so a mistyped option can show up there.

Here's an example:

(add-validators
  (lfn [dataset]
    (if-not
      (contains? #{"ahps" "stage_iv"} dataset)
      "--dataset must be specified as 'ahps' or 'stage_iv'"))
  (val-opts :file :days-file)
  (val-opts :required :numeric :num-days)
  (val-remaining :empty true "Unknown arguments"))

val-opts

(val-opts options+ [keyword-name*] err-msg?)

options is one or more of:

  :required - this option is required
  :numeric - the option, if it exists, must be numeric (float or int)
  :word - the option, if it exists, must contain only word characters (alpha-numeric, _, -)
  :file - the option, if it exists, must be an existing file (local or S3)
  :dir - the option, if it exists, must be an existing dir (local or S3)
  :file-or-dir - the option, if it exists, must be an existing file/dir (local or S3)
  :local-dir - the option, if it exists, must be an existing local directory

A single keyword-name, or a collection of keyword-names, follows the options. These are the keywords in eopts that you want to validate.

err-msg is an optional String with a custom error message to report on failure. If err-msg is not given, then a suitable message is constructed based on the check that failed.

Examples

  ; The :keypair option is required. In all cases, :required can be satisfied
  ; either by an entry in the jobdef or with a --option on the command-line
  (val-opts :required [:keypair])

  ; :scripts-src-path is required, and must be an existing directory
  (val-opts :dir :required [:scripts-src-path])

  ; :some-optional-path is NOT required, but if specified, it must be
  ; an existing file/directory
  (val-opts :file-or-dir [:some-optional-path])

  ; Both :app and :foo are required, and must contain only word-characters
  (val-opts :required :word [:app :foo])

val-remaining

Validates the remaining args (i.e. those not caught by lemur or your jobdef via catch-args).

(val-remaining mini-cmd-spec pred* err-msg?)

Specify one or more predicates from the following:

  :min N - should contain at least N entries
  :max M - should contain at most M entries
  :empty true - should contain exactly 0 entries
  :required coll - the collection lists required options in --opt form (requires
                   that :mini-spec, described below, be specified)

mini-cmd-spec is :mini-spec [...]. The mini command spec, if it exists, is applied before the checks above, so that those checks validate only what is left over. Its value is a collection of keywords naming options that are understood by your hadoop main-class. Boolean options should be indicated with a ? at the end (e.g. :bar?).

err-msg is an optional String with a custom error message to report on failure. If err-msg is not given, then a suitable message is constructed based on the check that failed.

Examples

  ; There are at least 2, but no more than 5 remaining args
  (val-remaining :min 2 :max 5)

  ; The remaining args may optionally contain '--foo value' and '--bar' (the latter
  ; with no trailing value, since it is identified as a boolean option). Disregarding
  ; these options, the rest of remaining should be empty.
  (val-remaining :mini-spec [:foo :bar?] :empty true)

  ; Like the previous example, this lists a number of options that may be specified,
  ; and the rest should be empty. It also specifies that option '--foo value' is required.
  (val-remaining :mini-spec [:foo :bar? :baz] :required [:foo] :empty true)