
Describe Workflow

MichailAlexakis edited this page Dec 5, 2017 · 5 revisions

Describe workflow - DRAFT

Introduction

A user designs a workflow by interacting with a graphical UI. The workflow must be expressed as a graph that describes the relations (dependencies) between its steps.

Describe using JSON

An example: A simple interlinking/fusion workflow

In this example, a pair of SHP (ESRI shapefile) files is transformed to RDF using triplegeo, interlinked using limes, and finally fused into a single RDF output using fagi. An oval node denotes a tool invocation and a rectangle node denotes a file resource. Input resources (identifiable and accessible via a resource catalogue) are colored light blue, and output resources orange.

The example is represented as a DOT graph, and can be visualized as:

[Figure: workflow-1]
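The DOT source behind the figure is not included on this page. As a hedged illustration of the conventions above, the Python sketch below emits a DOT graph for the example, using the step and file names from the JSON description that follows. The function `to_dot` is hypothetical, and drawing the intermediate results (rdf-1, rdf-2, rdf-links-1-2, rdf-1-2) as edge labels rather than as separate nodes is an assumption; the original figure may lay them out differently.

```python
def to_dot(inputs, tools, outputs, edges):
    """Render a workflow as DOT following this page's conventions:
    ovals for tool invocations, rectangles for file resources,
    light blue for input resources and orange for output resources.

    `edges` is a list of (source, target, label) tuples; label may be None.
    """
    lines = ["digraph workflow {"]
    for name in inputs:
        lines.append(f'  "{name}" [shape=box, style=filled, fillcolor=lightblue];')
    for name in tools:
        lines.append(f'  "{name}" [shape=ellipse];')
    for name in outputs:
        lines.append(f'  "{name}" [shape=box, style=filled, fillcolor=orange];')
    for src, dst, label in edges:
        attrs = f' [label="{label}"]' if label else ""
        lines.append(f'  "{src}" -> "{dst}"{attrs};')
    lines.append("}")
    return "\n".join(lines)


# The example workflow. Intermediate results are shown as edge labels
# (an assumption, not necessarily how the original figure draws them).
dot = to_dot(
    inputs=["1.shp", "2.shp"],
    tools=["triplegeo-1", "triplegeo-2", "limes-1", "fagi-1"],
    outputs=["fused-1-2.rdf"],
    edges=[
        ("1.shp", "triplegeo-1", None),
        ("2.shp", "triplegeo-2", None),
        ("triplegeo-1", "limes-1", "rdf-1"),
        ("triplegeo-2", "limes-1", "rdf-2"),
        ("triplegeo-1", "fagi-1", "rdf-1"),
        ("triplegeo-2", "fagi-1", "rdf-2"),
        ("limes-1", "fagi-1", "rdf-links-1-2"),
        ("fagi-1", "fused-1-2.rdf", "rdf-1-2"),
    ],
)
print(dot)
```

Saving the printed text as workflow.dot and running Graphviz as `dot -Tpng workflow.dot -o workflow.png` renders it as an image.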

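The above workflow can be described in the following JSON-like notation (it uses comments, unquoted keys and trailing commas, which strict JSON does not allow but JSON5 does):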

{
  // The input(s) for this workflow. Each input resource is referenced by an
  // application-wide (or globally unique, e.g. UUID) identifier.
  input: [
    {
      name: "1.shp", // an optional filename-like name 
      id: "abc-1"    // the identifier for this resource
    },
    {
      name: "2.shp", 
      id: "abc-2"
    }
  ],

  // Define the flow as a DAG of steps. The dependencies of a step are inferred
  // by examining the inputs it operates on.
  steps: [
    
    // Step#1: Transform SHP into RDF  
    {
      name: "triplegeo-1",     // a workflow-wide identifier for this step
      operation: "transform",  // one of: {transform, interlink, fuse, enrich}
      tool: "triplegeo",       // one of: {triplegeo, fagi, deer, limes, ...}
      config: { 
        // Provide tool-specific configuration ...  
      },
      // Specify the input for this step. An input can be referenced as:
      //  - "input:<input-id>": an input resource supplied to the workflow
      //  - "result:<step-name>/<step-result-name>": an intermediate result produced within this workflow.
      input: ["input:abc-1"],
      // Name the (expected) output results from this step. These names can be used to identify
      // intermediate results (that other steps operate on) as <step-result-name>. 
      output: ["rdf-1"]
    },

    // Step#2: Transform SHP into RDF  
    { 
      name: "triplegeo-2",
      operation: "transform",
      tool: "triplegeo",
      config: { /* ... */ },
      input: ["input:abc-2"],
      output: ["rdf-2"]
    },

    // Step#3: Interlink a pair of RDF files
    {
      name: "limes-1",
      operation: "interlink",
      tool: "limes",
      config: { /* ... */ },
      input: ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2"],
      output: ["rdf-links-1-2"],
    },

    // Step#4: Fuse a pair of RDF files
    {
      name: "fagi-1",
      operation: "fuse",
      tool: "fagi",
      config: { /* ... */ },
      input: ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2", "result:limes-1/rdf-links-1-2"],
      output: ["rdf-1-2"],
    },

  ],
  
  // The expected output(s) for this workflow. After a successful workflow execution, each output
  // registers itself with the resource catalogue (to acquire a unique id). The implementation must
  // ensure that metadata about the workflow execution that produced it are stored along with it.
  output: {
    "result:fagi-1/rdf-1-2": {
      name: "fused-1-2.rdf", // a human-friendly resource name
      comment: "The fusion of our pair of SHP files", // A comment on this output
    },
  },
}
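The dependency inference described in the comments above (a step depends on every step whose result it consumes) can be sketched in Python. This is a minimal sketch, not part of any existing implementation: the names `parse_ref` and `execution_order`, the assumption that identifiers contain no `/`, and the rule that workflow outputs must be step results are all hypothetical. The sketch validates each reference, derives the DAG, and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to obtain an execution order. The `operation`, `tool` and `config` fields are omitted because they play no role in dependency inference.

```python
import re
from graphlib import TopologicalSorter

# Reference syntax from the description above (assumption: ids contain no "/"):
#   "input:<input-id>"  and  "result:<step-name>/<step-result-name>"
INPUT_REF = re.compile(r"^input:(?P<id>[^/]+)$")
RESULT_REF = re.compile(r"^result:(?P<step>[^/]+)/(?P<result>[^/]+)$")


def parse_ref(ref):
    """Return ("input", id) or ("result", (step, result)); raise on bad syntax."""
    if m := INPUT_REF.match(ref):
        return "input", m["id"]
    if m := RESULT_REF.match(ref):
        return "result", (m["step"], m["result"])
    raise ValueError(f"malformed reference: {ref!r}")


def execution_order(workflow):
    """Validate references, infer step dependencies from step inputs,
    and return the step names in a valid (topological) execution order."""
    input_ids = {i["id"] for i in workflow["input"]}
    outputs = {s["name"]: set(s["output"]) for s in workflow["steps"]}
    deps = {}
    for step in workflow["steps"]:
        deps[step["name"]] = set()
        for ref in step["input"]:
            kind, target = parse_ref(ref)
            if kind == "input":
                if target not in input_ids:
                    raise ValueError(f"{step['name']}: unknown input {target!r}")
            else:
                producer, result = target
                if result not in outputs.get(producer, set()):
                    raise ValueError(f"{step['name']}: unknown result {ref!r}")
                deps[step["name"]].add(producer)  # edge: producer -> step
    # Assumption: only step results may be declared as workflow outputs.
    for ref in workflow["output"]:
        kind, target = parse_ref(ref)
        if kind != "result" or target[1] not in outputs.get(target[0], set()):
            raise ValueError(f"unknown workflow output {ref!r}")
    # static_order() raises graphlib.CycleError if the steps do not form a DAG.
    return list(TopologicalSorter(deps).static_order())


# The example workflow, reduced to the fields used for dependency inference.
workflow = {
    "input": [{"name": "1.shp", "id": "abc-1"}, {"name": "2.shp", "id": "abc-2"}],
    "steps": [
        {"name": "triplegeo-1", "input": ["input:abc-1"], "output": ["rdf-1"]},
        {"name": "triplegeo-2", "input": ["input:abc-2"], "output": ["rdf-2"]},
        {"name": "limes-1",
         "input": ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2"],
         "output": ["rdf-links-1-2"]},
        {"name": "fagi-1",
         "input": ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2",
                   "result:limes-1/rdf-links-1-2"],
         "output": ["rdf-1-2"]},
    ],
    "output": {"result:fagi-1/rdf-1-2": {"name": "fused-1-2.rdf"}},
}

order = execution_order(workflow)
print(order)
```

For this example, any valid order runs the two triplegeo steps first (in either order), then limes-1, then fagi-1. Steps with no dependency between them, such as triplegeo-1 and triplegeo-2, could also be run in parallel.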