Skip to content

Reporting provenance in a Domain Specific Language

Ashish Gehani edited this page Feb 11, 2016 · 5 revisions

This reporter creates a named pipe in the filesystem, to which a user (or external application) can send provenance information in a simple domain specific language (DSL) that we describe below.


To send provenance metadata to the SPADE server through the DSL reporter, each vertex or edge must be described on a new line.

Sending a provenance vertex

To send a vertex, write a line of the below form to the pipe:

type:<Agent|Process|Artifact> id:<unique identifier> <key>:<value> ... <key>:<value>

Depending on whether the value associated with the key type is Agent, Process, or Artifact, a corresponding Open Provenance Model vertex will be created. The unique identifier is used to disambiguate a vertex so that it can be referred to as the endpoint of an edge. Each : pair (of which there can be an arbitrary number) is turned into an annotation on the vertex.

(Note that the type:<Agent|Process|Artifact> and id:<unique identifier> elements are only used to tell the DSL reporter what vertices / edges to create, and are not committed to storage.)


For example, the line below can be entered on the command line of a Unix shell to report that a program named firefox ran with a PID of 1234:

echo type:Process id:1 program:firefox pid:1234 >> /tmp/spade_pipe

Similarly, a provenance vertex can be created to describe that a data artifact has filename index.html and is owned by user:

echo type:Artifact id:2 filename:index.html owner:user >> /tmp/spade_pipe

Sending a provenance edge

To send an edge, write a line of the below form to the pipe:

type:<Used|WasGeneratedBy|WasTriggeredBy|WasDerivedFrom|WasControlledBy> from:<unique identifier> to:<unique identifier> <key>:<value> ... <key>:<value>

The value associated with the from key determines which vertex is the source of the edge, while the value associated with the to key determines which vertex is the destination of the edge.

If the key type is Used, the key from must be associated with a value that is the unique identifier of a Process vertex, and the key to must be associated with a value that is the unique identifier of an Artifact vertex. If the key type is WasGeneratedBy, the key from must be associated with a value that is the unique identifier of an Artifact vertex, and the key to must be associated with a value that is the unique identifier of an Process vertex. If the key type is WasTriggeredBy, the key from must be associated with a value that is the unique identifier of a Process vertex, and the key to must be associated with a value that is the unique identifier of an Process vertex. If the key type is WasDerivedFrom, the key from must be associated with a value that is the unique identifier of an Artifact vertex, and the key to must be associated with a value that is the unique identifier of an Artifact vertex. If the key type is WasControlledBy, the key from must be associated with a value that is the unique identifier of a Process vertex, and the key to must be associated with a value that is the unique identifier of an Agent vertex.

All the remaining <key>:<value> pairs are turned into annotations on the edge.


Continuing the example above, the fact that the firefox process read the index.html file at 4:20 am can be reported with a Used edge:

echo type:Used from:1 to:2 time:0420 >> /tmp/spade_pipe

Note that the keys type, id, from, and to are reserved and cannot be used as annotation keys. Spaces (' ') and colons (:) can be used in the remaining keys and in all values by escaping them using a backslash (\). For example, instead of the value 0420, a value 4:20 am can be sent by writing it as 4\:20\ am.

To record provenance from within an application, simply print or write the information to the named pipe.


Configuring DSL reporting

The DSL reporter takes a single argument, which is the location in the filesystem where the pipe is to be created. Note that this must be done in the SPADE controller (after the SPADE server has been started):

-> add reporter DSL /tmp/spade_pipe
Adding reporter DSL... done

As long as no other object existed at that location (such as a file, directory, socket, or pipe), the DSL reporter will create a named pipe at that location (which is /tmp/spade_pipe in the above example).

The provenance metadata being written to the named pipe created by the DSL reporter will no longer be sent to the SPADE kernel after this:

-> remove reporter DSL
Shutting down reporter DSL... done
Clone this wiki locally