This repository has been archived by the owner on Aug 13, 2024. It is now read-only.
Ed Lewis edited this page Oct 13, 2016 · 8 revisions

Running the samples

Using Vagrant

  • Clone this repository - git clone git@github.com:snowplow/factotum.git
  • cd factotum
  • Set up a Vagrant box and SSH into it - vagrant up && vagrant ssh
    • This will take a few minutes
  • cd /vagrant
  • Compile and run a demo - cargo run -- run samples/echo.factfile

Using stable Rust without Vagrant

  • Install Rust
    • on Linux/Mac - curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • Clone this repository - git clone git@github.com:snowplow/factotum.git
  • cd factotum
  • Compile and run a demo - cargo run -- run samples/echo.factfile

Creating a job

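Factotum factfiles must conform to the self-describing JSON schema that defines Factotum jobs (the example factfile on this page references iglu:com.snowplowanalytics.factotum/factfile/jsonschema/1-0-0).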

Field glossary

Field | Description
--- | ---
name | A user-defined title for the job
tasks | An array of the tasks that make up this job. A task is a single unit of work the job must complete (a single node in the DAG)
tasks/*/name | A title for the task
tasks/*/executor | The method of execution for the task. Currently this must be "shell"
tasks/*/command | The command to invoke. For example, with the "shell" executor this can be a bash script or an executable on your path (e.g. echo)
tasks/*/arguments | Arguments to pass to your command
tasks/*/dependsOn | A list of the tasks (by name) that this task depends on and so must be executed after. For example, if task B depends on task A, task A will always be executed before task B
tasks/*/onResult/terminateJobWithSuccess | A list of return codes that, if returned by this task, cause Factotum to let any running tasks finish and then stop processing the rest of the job. This is sometimes described as a "no-op"
tasks/*/onResult/continueJob | A list of expected return codes for the task. If the task returns a code in this list, the job continues normally
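To see how these fields fit together, here is a minimal Python sketch that assembles a factfile as plain dictionaries and serialises it with the standard json module. The helpers make_task and make_factfile are hypothetical and not part of Factotum; only the field names and the schema URI are taken from this page.

```python
import json

SCHEMA = "iglu:com.snowplowanalytics.factotum/factfile/jsonschema/1-0-0"


def make_task(name, command, arguments=(), depends_on=(),
              continue_codes=(0,), terminate_codes=()):
    """Hypothetical helper: build one task dict using the glossary field names."""
    return {
        "name": name,
        "executor": "shell",  # the only executor described on this page
        "command": command,
        "arguments": list(arguments),
        "dependsOn": list(depends_on),
        "onResult": {
            "terminateJobWithSuccess": list(terminate_codes),
            "continueJob": list(continue_codes),
        },
    }


def make_factfile(job_name, tasks):
    """Hypothetical helper: wrap the tasks in a self-describing JSON envelope."""
    return {"schema": SCHEMA, "data": {"name": job_name, "tasks": list(tasks)}}


factfile = make_factfile("echo demo", [
    make_task("echo alpha", "echo", ["alpha"]),
    make_task("echo beta", "echo", ["beta"], depends_on=["echo alpha"]),
])

print(json.dumps(factfile, indent=2))
```

Building the JSON from code like this is only a convenience; a hand-written factfile works just as well.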

How task return codes are handled

tasks/*/onResult/terminateJobWithSuccess and tasks/*/onResult/continueJob are mutually exclusive: a return code should not appear in both lists. If a task returns a code listed in "continueJob", the task is considered successful and the job continues. If a task returns a code listed in "terminateJobWithSuccess", Factotum lets the tasks that are currently running finish and then ends the job early, without considering it a failure. If a task returns a code that appears in neither list, the task is regarded as having errored, and the job exits with an error once any other running tasks complete.
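The rules above amount to a three-way classification of each return code. The Python sketch below expresses that classification for illustration only. It is not Factotum's implementation, the Outcome and classify names are hypothetical, and rejecting codes that appear in both lists is an assumption about how the mutual-exclusion rule might be enforced.

```python
from enum import Enum


class Outcome(Enum):
    CONTINUE = "continue"                     # code listed in continueJob
    TERMINATE_WITH_SUCCESS = "no-op"          # code listed in terminateJobWithSuccess
    ERROR = "error"                           # code in neither list


def classify(return_code, on_result):
    """Hypothetical: map a task's return code to an outcome using its onResult block."""
    continue_codes = set(on_result.get("continueJob", []))
    terminate_codes = set(on_result.get("terminateJobWithSuccess", []))
    # The two lists are mutually exclusive; this sketch rejects any overlap.
    overlap = continue_codes & terminate_codes
    if overlap:
        raise ValueError(f"return codes in both lists: {sorted(overlap)}")
    if return_code in continue_codes:
        return Outcome.CONTINUE
    if return_code in terminate_codes:
        return Outcome.TERMINATE_WITH_SUCCESS
    return Outcome.ERROR


on_result = {"continueJob": [0], "terminateJobWithSuccess": [3]}
for code in (0, 3, 1):
    print(code, classify(code, on_result).name)
# prints:
# 0 CONTINUE
# 3 TERMINATE_WITH_SUCCESS
# 1 ERROR
```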

Using variables

Factotum supports variables in most field values (for example, in task arguments). Variables take the form {{ variable_name }}. Nested variables are also supported, for example:

"arguments": [ "{{ snowplow.message }}" ]

will replace {{ snowplow.message }} with the value of "message" in the "snowplow" object. If no JSON is supplied, the task is assumed to contain no variables.

Variable values are passed to Factotum as JSON with the --env option. For the nested example, running Factotum with --env '{"snowplow": {"message": "hello world"}}' passes "hello world" as the task's argument.
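When Factotum is invoked from a script, generating the --env value with a JSON serialiser avoids hand-quoting mistakes. The sketch below assumes a hypothetical factfile named my-job.factfile and a factotum binary on the PATH. It only builds and prints the command line and does not run it.

```python
import json
import shlex

env = {"snowplow": {"message": "hello world"}}

# Build the argument vector; json.dumps produces the JSON string for --env.
# "my-job.factfile" is a hypothetical factfile name.
argv = ["factotum", "run", "my-job.factfile", "--env", json.dumps(env)]

# Passing argv as a list (e.g. to subprocess.run) avoids shell quoting entirely;
# shlex.join shows the equivalent single command line for a POSIX shell.
print(shlex.join(argv))
# prints: factotum run my-job.factfile --env '{"snowplow": {"message": "hello world"}}'
```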

Variable substitution uses Mustache. Any valid Mustache template is valid in a Factotum job, provided it appears inside a task's fields (the file as a whole is not templated).
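To show how a nested name such as snowplow.message resolves against the --env JSON, here is a deliberately simplified Python sketch. It handles only {{ name }} and {{ dotted.name }} tags and renders missing keys as empty strings. Unlike standard Mustache, it does not HTML-escape values, and it only touches the command and arguments fields. It is not Factotum's Mustache implementation, and all of the function names are hypothetical.

```python
import re

# Matches {{ name }} or {{ dotted.name }}, with optional whitespace inside the braces.
TAG = re.compile(r"\{\{\s*([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)\s*\}\}")


def lookup(env, dotted_name):
    """Walk nested dicts along a dotted path; return '' if any key is missing."""
    value = env
    for key in dotted_name.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return str(value)


def render(template, env):
    """Replace every simple tag in a string with its looked-up value."""
    return TAG.sub(lambda match: lookup(env, match.group(1)), template)


def render_task(task, env):
    """Substitute variables in a task's fields only, not in the whole file."""
    rendered = dict(task)
    rendered["command"] = render(task["command"], env)
    rendered["arguments"] = [render(arg, env) for arg in task["arguments"]]
    return rendered


env = {"snowplow": {"message": "hello world"}}
task = {"name": "say", "command": "echo", "arguments": ["{{ snowplow.message }}"]}
print(render_task(task, env)["arguments"])
# prints: ['hello world']
```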

Starting from an arbitrary point

Factotum 0.2.0+ includes functionality to (re)start a job from a given point, allowing you to skip tasks that have already been run.

This is done with the "--start" (or "-s") command-line option. Given the factfile below:

{
   "schema":"iglu:com.snowplowanalytics.factotum/factfile/jsonschema/1-0-0",
   "data":{
      "name":"echo order demo",
      "tasks":[
         {
            "name":"echo alpha",
            "executor":"shell",
            "command":"echo",
            "arguments":[
               "alpha"
            ],
            "dependsOn":[

            ],
            "onResult":{
               "terminateJobWithSuccess":[

               ],
               "continueJob":[
                  0
               ]
            }
         },
         {
            "name":"echo beta",
            "executor":"shell",
            "command":"echo",
            "arguments":[
               "beta"
            ],
            "dependsOn":[
               "echo alpha"
            ],
            "onResult":{
               "terminateJobWithSuccess":[

               ],
               "continueJob":[
                  0
               ]
            }
         },
         {
            "name":"echo omega",
            "executor":"shell",
            "command":"echo",
            "arguments":[
               "and omega!"
            ],
            "dependsOn":[
               "echo beta"
            ],
            "onResult":{
               "terminateJobWithSuccess":[

               ],
               "continueJob":[
                  0
               ]
            }
         }
      ]
   }
}

You can start from the "echo beta" task using the following:

$ factotum run samples/echo.factfile --start "echo beta"

This skips the task "echo alpha" and starts from "echo beta".
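In a linear job like this one, starting from a task amounts to running that task and everything downstream of it. The following Python sketch models that simplified view by following the dependsOn edges in reverse. The function name tasks_to_run is hypothetical, and the sketch does not reproduce Factotum's handling of ambiguous start points in more complicated DAGs.

```python
from collections import deque


def tasks_to_run(tasks, start):
    """Hypothetical: return the start task plus every task that (transitively)
    depends on it, in the order the tasks appear in the factfile."""
    names = {t["name"] for t in tasks}
    if start not in names:
        raise ValueError(f"unknown task: {start}")
    # Reverse the dependsOn edges: dependency -> tasks that depend on it.
    # A KeyError here means a dependsOn entry names an unknown task.
    dependents = {name: [] for name in names}
    for t in tasks:
        for dep in t["dependsOn"]:
            dependents[dep].append(t["name"])
    # Breadth-first walk from the start task over the reversed edges.
    selected, queue = {start}, deque([start])
    while queue:
        for child in dependents[queue.popleft()]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return [t["name"] for t in tasks if t["name"] in selected]


tasks = [
    {"name": "echo alpha", "dependsOn": []},
    {"name": "echo beta", "dependsOn": ["echo alpha"]},
    {"name": "echo omega", "dependsOn": ["echo beta"]},
]
print(tasks_to_run(tasks, "echo beta"))
# prints: ['echo beta', 'echo omega']
```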

In more complicated DAGs, there are some tasks which cannot currently be the starting point for jobs. Resuming a job from such tasks would be ambiguous, typically because the DAG has parallel execution branches and a single start point does not tell Factotum enough about the start state of all of the branches.

This edge case is discussed in https://github.com/snowplow/factotum/issues/54

Logging

A more verbose log is written to .factotum/factotum.log in the directory from which Factotum was invoked. If this directory is not writable, Factotum will fail to run.

Webhooks

See here for more information on the webhook format & details used by Factotum.

Current limitations

This list is not exhaustive, but everything listed here will be fixed soon.

  • Tasks cannot "forward reference" each other with dependencies - #31
  • In some cases, Factotum will execute jobs in a sub-optimal order - #30
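Until #31 is resolved, it can help to check a factfile for forward references before running it. The minimal Python sketch below assumes that a "forward reference" means a dependsOn entry naming a task that appears later in the tasks array. The function name is hypothetical and the check is not part of Factotum. It also flags dependencies that name a task which is not defined at all.

```python
def forward_references(tasks):
    """Hypothetical: return (task, dependency) pairs where the dependency is not
    defined earlier in the tasks array (a later task, or an unknown name)."""
    seen = set()
    problems = []
    for task in tasks:
        for dep in task.get("dependsOn", []):
            if dep not in seen:
                problems.append((task["name"], dep))
        seen.add(task["name"])
    return problems


ordered = [
    {"name": "echo alpha", "dependsOn": []},
    {"name": "echo beta", "dependsOn": ["echo alpha"]},
]
reordered = list(reversed(ordered))  # "echo beta" now precedes its dependency

print(forward_references(ordered))
# prints: []
print(forward_references(reordered))
# prints: [('echo beta', 'echo alpha')]
```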