# Provenance within DukeDS (work in progress)
Provenance in DukeDS allows users to create activities and specify relationships among activities and files.
For our purposes, an activity is a process that creates or modifies files.
An activity consists of a name, a description, and started_on and ended_on dates.
Activities have three relationships with files that can be recorded:
- used - a file was used in an activity
- generatedBy - a file was generated by an activity
- invalidatedBy - a file was invalidated by an activity
One file-to-file relationship is possible:
- derivedFrom - one file was derived from another
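
The model described above can be sketched in Python as a few small record types. This is only an illustration, not the DukeDS or DukeDSClient implementation: the class names `ActivityRelation`, `Activity`, `ActivityFileLink`, and `DerivedFrom` are hypothetical, and the sample values are borrowed from the examples later in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ActivityRelation(Enum):
    """The three activity-to-file relationships listed above."""
    USED = "used"
    GENERATED_BY = "generatedBy"
    INVALIDATED_BY = "invalidatedBy"


@dataclass(frozen=True)
class Activity:
    """A process that creates or modifies files (hypothetical record type)."""
    name: str
    description: str
    started_on: datetime
    ended_on: datetime


@dataclass(frozen=True)
class ActivityFileLink:
    """Connects one file to one activity through one relationship."""
    relation: ActivityRelation
    activity_name: str
    file_name: str


@dataclass(frozen=True)
class DerivedFrom:
    """The single file-to-file relationship."""
    derived_file: str
    source_file: str


# Sample values taken from the examples in this document.
sequencing = Activity(
    name="dna_sequenced",
    description="DNA sequenced on MinION...",
    started_on=datetime(2016, 8, 17, 12, 20),
    ended_on=datetime(2016, 8, 17, 14, 47),
)
link = ActivityFileLink(
    ActivityRelation.GENERATED_BY, sequencing.name, "result/mouse_192.bam"
)
derivation = DerivedFrom("result/fastqc_report.html", "result/mouse_192.bam")
```

Keeping the relationships as separate records, rather than as fields on a file, is just one possible design; it makes it easy to attach several relationships to the same file.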
The following examples are only meant to illustrate the ideas. The file format may change.
Example Activity: DNA Sequenced creating one result/mouse_192.bam file.
Here is an example of a possible file format that could be included in an upload:

    activities:
      - name: dna_sequenced
        description: DNA sequenced on MinION...
        started_on: 2016-08-17 12:20
        ended_on: 2016-08-17 14:47
    file:
      name: result/mouse_192.bam
      generatedBy: dna_sequenced

Example Activity: Running fastqc on result/mouse_192.bam file creating result/fastqc_report.html

    activities:
      - name: fastqc
        description: Running fastqc on...
        started_on: 2016-08-17 12:20
        ended_on: 2016-08-17 14:47
    files:
      - name: result/mouse_192.bam
        usedBy: fastqc
      - name: result/fastqc_report.html
        generatedBy: fastqc
        derivedFrom: result/mouse_192.bam

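
Since the file format is not final, the following is only a rough sketch of how a tool might sanity-check a provenance document like the fastqc example once it has been loaded into plain Python dictionaries. The function name `find_dangling_references` is hypothetical, and the rules it applies are assumptions: it treats `usedBy`, `generatedBy`, and `invalidatedBy` as references to declared activities, expects `derivedFrom` to name a file listed in the same document, and only looks at the plural `files` key.

```python
# Keys on a file entry that are assumed to name an activity.
ACTIVITY_KEYS = ("usedBy", "generatedBy", "invalidatedBy")


def find_dangling_references(doc):
    """Return a list of messages for references that point at nothing."""
    activity_names = {a["name"] for a in doc.get("activities", [])}
    files = doc.get("files", [])
    file_names = {f["name"] for f in files}

    problems = []
    for entry in files:
        for key in ACTIVITY_KEYS:
            target = entry.get(key)
            if target is not None and target not in activity_names:
                problems.append(
                    f"{entry['name']}: {key} refers to unknown activity {target!r}"
                )
        source = entry.get("derivedFrom")
        if source is not None and source not in file_names:
            problems.append(
                f"{entry['name']}: derivedFrom refers to unknown file {source!r}"
            )
    return problems


# The fastqc example above, written as a Python dictionary.
fastqc_doc = {
    "activities": [
        {
            "name": "fastqc",
            "description": "Running fastqc on...",
            "started_on": "2016-08-17 12:20",
            "ended_on": "2016-08-17 14:47",
        }
    ],
    "files": [
        {"name": "result/mouse_192.bam", "usedBy": "fastqc"},
        {
            "name": "result/fastqc_report.html",
            "generatedBy": "fastqc",
            "derivedFrom": "result/mouse_192.bam",
        },
    ],
}

problems = find_dangling_references(fastqc_doc)
```

A real implementation would probably also need to accept `derivedFrom` targets that were uploaded earlier and are not listed in the same document; this sketch does not attempt that.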
The idea is that these example provenance files could be included with an upload or applied after the fact. This is subject to change before provenance is implemented in DukeDSClient.
DukeDS can infer some relationships among users, tools (like ddsclient), projects, and files. These are all based on who uploaded files for a project:
- project wasAssociatedWith user
- fileVersion wasAttributedTo user
- fileVersion wasAttributedTo ddsclient
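
To make the inferred relationships concrete, here is a minimal sketch that derives them from a single upload. It is not how DukeDS computes them internally; the function `infer_relationships` and the sample project, user, and file version names are hypothetical.

```python
def infer_relationships(project, user, tool, file_versions):
    """Build (subject, relationship, object) triples implied by one upload."""
    relationships = [(project, "wasAssociatedWith", user)]
    for version in file_versions:
        # Each uploaded file version is attributed to both the uploader
        # and the tool that performed the upload.
        relationships.append((version, "wasAttributedTo", user))
        relationships.append((version, "wasAttributedTo", tool))
    return relationships


# Hypothetical upload: one user sends one file version with ddsclient.
inferred = infer_relationships(
    project="mouse_project",
    user="example_user",
    tool="ddsclient",
    file_versions=["result/mouse_192.bam (version 1)"],
)
for subject, relationship, obj in inferred:
    print(subject, relationship, obj)
```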