Describe Workflow
MichailAlexakis edited this page Dec 5, 2017
A user designs a workflow by interacting with a graphical UI. The workflow must be expressed as a graph that describes the relations (dependencies) between its steps.
In this example, we have a pair of SHP (ESRI shapefile) files, transformed to RDF using triplegeo, interlinked using limes, and finally fused into a single RDF output. An oval node denotes a tool invocation, while a rectangle node denotes a file resource; input resources (identifiable and accessible via a resource catalogue) are colored light blue, output resources orange.
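The node conventions above map onto standard Graphviz DOT attributes (`shape`, `style`, `fillcolor`). The following is only a minimal sketch of how such a graph could be emitted, not the project's actual generator: the helper names `node_line` and `to_dot` are hypothetical, the node names are borrowed from the example workflow, and leaving intermediate resources unfilled is an assumption, since the text only assigns colors to inputs and outputs.

```python
# Sketch: emit a DOT graph using the conventions described above.
# node_line/to_dot are hypothetical helpers, not part of any existing tool.

NODE_STYLES = {
    "tool": "shape=oval",                                      # tool invocation
    "input": "shape=box, style=filled, fillcolor=lightblue",   # input resource
    "output": "shape=box, style=filled, fillcolor=orange",     # output resource
    "intermediate": "shape=box",  # assumption: no fill color is specified
}


def node_line(name, kind):
    """Return one DOT node statement for a node of the given kind."""
    return f'  "{name}" [{NODE_STYLES[kind]}];'


def to_dot(nodes, edges):
    """Build DOT source from (name, kind) nodes and (source, target) edges."""
    lines = ["digraph workflow {"]
    lines += [node_line(name, kind) for name, kind in nodes]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


nodes = [
    ("1.shp", "input"), ("2.shp", "input"),
    ("triplegeo-1", "tool"), ("triplegeo-2", "tool"),
    ("rdf-1", "intermediate"), ("rdf-2", "intermediate"),
    ("limes-1", "tool"), ("rdf-links-1-2", "intermediate"),
    ("fagi-1", "tool"), ("fused-1-2.rdf", "output"),
]
edges = [
    ("1.shp", "triplegeo-1"), ("triplegeo-1", "rdf-1"),
    ("2.shp", "triplegeo-2"), ("triplegeo-2", "rdf-2"),
    ("rdf-1", "limes-1"), ("rdf-2", "limes-1"),
    ("limes-1", "rdf-links-1-2"),
    ("rdf-1", "fagi-1"), ("rdf-2", "fagi-1"), ("rdf-links-1-2", "fagi-1"),
    ("fagi-1", "fused-1-2.rdf"),
]

dot = to_dot(nodes, edges)
print(dot)
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command.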
The example is represented as a DOT graph, and can be visualized as:
The above workflow can be described as the following JSON-like document (comments and unquoted keys are used for readability):
{
  // The input(s) for this workflow. Each input resource is referenced by an
  // application-wide identifier (or one with wider scope, such as a UUID).
  input: [
    {
      name: "1.shp", // an optional filename-like name
      id: "abc-1"    // the identifier for this resource
    },
    {
      name: "2.shp",
      id: "abc-2"
    }
  ],
  // Define the flow as a DAG of steps. The dependencies of a step are inferred
  // by examining the input the step operates on.
  steps: [
    // Step #1: Transform SHP into RDF
    {
      name: "triplegeo-1",    // a workflow-wide identifier for this step
      operation: "transform", // one of: {transform, interlink, fuse, enrich}
      tool: "triplegeo",      // one of: {triplegeo, fagi, deer, limes, ...}
      config: {
        // Provide tool-specific configuration ...
      },
      // Specify the input for this step. An input can be referenced as:
      //  - "input:<input-id>": an input resource supplied to the workflow
      //  - "result:<step-name>/<step-result-name>": an intermediate result produced within this workflow.
      input: ["input:abc-1"],
      // Name the (expected) output results of this step. These names can be used to identify
      // intermediate results (that other steps operate on) as <step-result-name>.
      output: ["rdf-1"]
    },
    // Step #2: Transform SHP into RDF
    {
      name: "triplegeo-2",
      operation: "transform",
      tool: "triplegeo",
      config: { /* ... */ },
      input: ["input:abc-2"],
      output: ["rdf-2"]
    },
    // Step #3: Interlink a pair of RDF files
    {
      name: "limes-1",
      operation: "interlink",
      tool: "limes",
      config: { /* ... */ },
      input: ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2"],
      output: ["rdf-links-1-2"],
    },
    // Step #4: Fuse a pair of RDF files
    {
      name: "fagi-1",
      operation: "fuse",
      tool: "fagi",
      config: { /* ... */ },
      input: ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2", "result:limes-1/rdf-links-1-2"],
      output: ["rdf-1-2"],
    },
  ],
  // The expected output(s) of this workflow. After a successful workflow execution, each output
  // registers itself with the resource catalogue (to acquire a unique id). The implementation must
  // ensure that metadata about that workflow execution is kept alongside each output.
  output: {
    "result:fagi-1/rdf-1-2": {
      name: "fused-1-2.rdf", // a human-friendly resource name
      comment: "The fusion of our pair of SHP files", // a comment on this output
    },
  },
}
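The description says that step dependencies are inferred from the `input` references of each step. As a rough illustration of that idea, the sketch below parses the two reference forms (`input:<input-id>` and `result:<step-name>/<step-result-name>`), derives the dependency map, and computes one valid execution order. It is not the project's implementation: the function names `parse_ref` and `infer_dependencies` are hypothetical, the workflow is trimmed to the fields needed here, and the standard-library `graphlib` module (Python 3.9+) is used for the ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The example workflow, reduced to the fields relevant for dependency inference.
workflow = {
    "input": [{"name": "1.shp", "id": "abc-1"}, {"name": "2.shp", "id": "abc-2"}],
    "steps": [
        {"name": "triplegeo-1", "input": ["input:abc-1"], "output": ["rdf-1"]},
        {"name": "triplegeo-2", "input": ["input:abc-2"], "output": ["rdf-2"]},
        {"name": "limes-1",
         "input": ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2"],
         "output": ["rdf-links-1-2"]},
        {"name": "fagi-1",
         "input": ["result:triplegeo-1/rdf-1", "result:triplegeo-2/rdf-2",
                   "result:limes-1/rdf-links-1-2"],
         "output": ["rdf-1-2"]},
    ],
}


def parse_ref(ref):
    """Split a reference into (kind, step_name, name); step_name is None for inputs."""
    kind, _, rest = ref.partition(":")
    if kind == "input" and rest:
        return ("input", None, rest)
    if kind == "result":
        step_name, sep, result_name = rest.partition("/")
        if sep and step_name and result_name:
            return ("result", step_name, result_name)
    raise ValueError(f"malformed reference: {ref!r}")


def infer_dependencies(wf):
    """Map each step name to the set of step names whose results it consumes."""
    input_ids = {resource["id"] for resource in wf["input"]}
    outputs = {step["name"]: set(step["output"]) for step in wf["steps"]}
    deps = {}
    for step in wf["steps"]:
        deps[step["name"]] = set()
        for ref in step["input"]:
            kind, producer, name = parse_ref(ref)
            if kind == "input":
                if name not in input_ids:
                    raise ValueError(f"unknown workflow input: {ref!r}")
            elif name not in outputs.get(producer, set()):
                raise ValueError(f"unknown intermediate result: {ref!r}")
            else:
                deps[step["name"]].add(producer)
    return deps


deps = infer_dependencies(workflow)
# static_order() yields every step after all of its dependencies;
# it raises graphlib.CycleError if the steps do not form a DAG.
order = list(TopologicalSorter(deps).static_order())
print(deps)
print(order)
```

The two `triplegeo` steps have no dependencies on each other, so their relative position in `order` is not significant; a scheduler could run them in parallel.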