Tim L edited this page Apr 9, 2015 · 38 revisions

Implementation

git2provConverter.js

This section outlines the key aspects of how MMLab implemented git2prov, which they host at http://git2prov.org and make available on GitHub (I forked their repository). git2provConverter.js uses PROV-JSON as its native representation.

The following git command lists every file that was ever added to the repository (done at git2provConverter.js#75):

git --no-pager log --pretty=format: --name-only --diff-filter=A
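The output of this command is one path per line, with blank lines between commits, and a path can appear more than once if the file was deleted and later added again. A minimal sketch of turning that output into a list of unique paths follows; parseAddedFiles is a hypothetical helper written for illustration, not a function from git2provConverter.js, and the sample input is hand-written.

```javascript
// Sketch: collect the unique paths printed by
//   git --no-pager log --pretty=format: --name-only --diff-filter=A
// parseAddedFiles is a hypothetical name, not part of git2provConverter.js.
function parseAddedFiles(stdout) {
  const seen = new Set();
  for (const line of stdout.split('\n')) {
    // The empty --pretty=format: leaves blank lines between commits; skip them.
    if (line !== '') seen.add(line);
  }
  // A Set keeps first-seen order and drops paths that were added more than once.
  return Array.from(seen);
}

// Hand-written sample output (not taken from a real repository).
const sampleAdded = 'README.md\nsrc/a.js\n\nsrc/b.js\n\nREADME.md\n';
console.log(parseAddedFiles(sampleAdded)); // [ 'README.md', 'src/a.js', 'src/b.js' ]
```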

The following git command uses pretty format strings to get file info (done at git2provConverter.js#107):

  var commitHash = options['shortHashes']?"%h":"%H";
  var parentHash = options['shortHashes']?"%p":"%P";
  ...
  // Next, do a git log for each file to find out about all commits, authors, 
  // the commit parents, and the modification type

  // This will output the following: Commit hash, Parent hash(es), Author name, 
  // Author date, Committer name, Committer date, Subject, name-status

  // This translates to: activity (commit), derivations, agent (author), 
  // starttime, agent (committer), endtime, prov:label (Commit message)
  git --no-pager log --date=iso --name-status --pretty=format: \
    ...currentEntity+','+commitHash+','+parentHash+',%an,%ad,%cn,%cd,%s...

For example, for the XSL file that converts SVN to PROV-O (grddl.xsl), the git log command above returns:

bash-3.2$  git --no-pager log --date=iso --name-status --pretty=format:"data/source/opendap-org/opendap/src/grddl.xsl,%H,%P,%an,%ad,%cn,%cd,%s,&" -- data/source/opendap-org/opendap/src/grddl.xsl
data/source/opendap-org/opendap/src/grddl.xsl,645757911635cbaeda7f6e20b9fcd93a9f98fc7a,1866273933e699a1fe084c9c0626738b661ffdf9,Tim L,2014-01-08 14:45:05 -0500,Tim L,2014-01-08 14:45:05 -0500,reworked SVN XML Log -> PROV-O to include prv, nfo vocabs.,&
M	data/source/opendap-org/opendap/src/grddl.xsl

data/source/opendap-org/opendap/src/grddl.xsl,0e68bfccc6ff3311b20105dc3cfad34632bf18f8,b1bd90bbe1886d106eced6883e924596f23b11c7,Tim L,2014-01-07 14:36:33 -0500,Tim L,2014-01-07 14:36:33 -0500,character escaping in svn log grddl,&
M	data/source/opendap-org/opendap/src/grddl.xsl

data/source/opendap-org/opendap/src/grddl.xsl,850a3ab2783610aec0eb22d51abe4b98bd450f5b,65dcaeee85e2cd4da0df4bfebaa4796261219252,Tim L,2013-12-28 14:19:28 -0500,Tim L,2013-12-28 14:19:28 -0500,added wasDerivedFrom with @copyfrom attributes,&
M	data/source/opendap-org/opendap/src/grddl.xsl

data/source/opendap-org/opendap/src/grddl.xsl,65dcaeee85e2cd4da0df4bfebaa4796261219252,0467e2acfb7c915ef2555ab2aad212478d07ce9b,Tim L,2013-12-28 13:55:02 -0500,Tim L,2013-12-28 13:55:02 -0500,opendap svn xml log produces valid ttl,&
M	data/source/opendap-org/opendap/src/grddl.xsl

data/source/opendap-org/opendap/src/grddl.xsl,b80a27317c08be2f8f9407e6c555849a04b615ab,8f55b7d5d36bf6c3a36a0c3f681cbcf19b8d1a04,Tim L,2013-12-27 15:12:55 -0500,Tim L,2013-12-27 15:12:55 -0500,situating opendap svn conversion into SDV,&
M	data/source/opendap-org/opendap/src/grddl.xsl

data/source/opendap-org/opendap/src/grddl.xsl,203ac7ad04e04c2c045f4ff77f1ba8f8ff6571be,fa857627cb173e16c31f99804e128164cc8b8f02,Tim L,2013-12-23 00:00:42 -0500,Tim L,2013-12-23 00:00:42 -0500,stub for transforming SVN Log XML into PROV,&
A	data/source/opendap-org/opendap/src/grddl.xsl

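The constant ,& at the end of the format string marks where each formatted line ends, so that it can be merged with the name-status line (e.g. M or A plus the path) that git prints on the following line.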
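As a rough sketch of that merge, the function below pairs each line ending in ,& with the name-status line after it and splits the result into fields. The name parseLog and the field names are hypothetical, and the sketch assumes that the path, author name, and committer name contain no commas; the subject is the last field, so any commas inside it are re-joined. The sample record is the oldest commit from the output above.

```javascript
// Sketch: merge each ",&"-terminated record with the name-status line after it.
// parseLog and its field names are hypothetical, chosen for this example only.
function parseLog(stdout) {
  // Drop the blank lines that separate commits.
  const lines = stdout.split('\n').filter((l) => l.trim() !== '');
  const records = [];
  for (let i = 0; i < lines.length; i++) {
    if (!lines[i].endsWith(',&')) continue; // not the first line of a record
    // Strip the ",&" marker, then split on the commas from the format string.
    const fields = lines[i].slice(0, -2).split(',');
    const [entity, commit, parents, author, authorDate,
      committer, commitDate, ...subjectParts] = fields;
    // The next line looks like "M<TAB>path"; keep only the status letter(s).
    const status = (lines[i + 1] || '').split('\t')[0];
    records.push({
      entity,
      commit,
      parents: parents ? parents.split(' ') : [], // %P is space-separated
      author,
      authorDate,
      committer,
      commitDate,
      subject: subjectParts.join(','), // the subject itself may contain commas
      status,
    });
  }
  return records;
}

const path = 'data/source/opendap-org/opendap/src/grddl.xsl';
const sampleLog =
  path + ',203ac7ad04e04c2c045f4ff77f1ba8f8ff6571be,' +
  'fa857627cb173e16c31f99804e128164cc8b8f02,Tim L,2013-12-23 00:00:42 -0500,' +
  'Tim L,2013-12-23 00:00:42 -0500,stub for transforming SVN Log XML into PROV,&\n' +
  'A\t' + path + '\n\n';
console.log(parseLog(sampleLog));
```

Splitting on commas is fragile by design here; a separator that cannot occur in names or subjects would be more robust, but that would be a change to the format string rather than a description of it.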

The git log documentation describes the trailing path argument as follows:

[--] <path>... Show only commits that are enough to explain how the files that match the specified paths came to be. See "History Simplification" below for details and other simplification modes. Paths may need to be prefixed with -- to separate them from options or the revision range, when confusion arises.
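Putting the format string and the -- separator together, the per-file command can be sketched as an argument list. buildLogArgs is a hypothetical helper that mirrors the example command shown earlier; it is not copied from git2provConverter.js. Escaping % in the path as %% is my own addition, so that a path containing % is not read as a format placeholder.

```javascript
// Sketch: assemble the git log arguments for one file.
// buildLogArgs is a hypothetical helper, not part of git2provConverter.js.
function buildLogArgs(currentEntity, options) {
  const commitHash = options.shortHashes ? '%h' : '%H';
  const parentHash = options.shortHashes ? '%p' : '%P';
  // Assumption: escape "%" so the path is printed literally by --pretty=format.
  const literalPath = currentEntity.replace(/%/g, '%%');
  const format = [literalPath, commitHash, parentHash,
    '%an', '%ad', '%cn', '%cd', '%s', '&'].join(',');
  return [
    '--no-pager', 'log', '--date=iso', '--name-status',
    '--pretty=format:' + format,
    '--', // separates the path from options and revisions
    currentEntity,
  ];
}

console.log(buildLogArgs('data/source/opendap-org/opendap/src/grddl.xsl', { shortHashes: false }));
```

An array like this can be handed to Node's child_process.execFile('git', args, callback), which avoids shell quoting problems with unusual paths.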

An illustration of its modeling is in provenanceweb/github/provenanceweb/data/source/github-com-provbench/meta/version/git/manual/git2prov.ttl.graffle

git2prov.sh

I wrote some early thoughts on trying to use git2prov's modeling and how it could be improved.

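I'm re-implementing git2provConverter.js as a shell script.

It removes the blank lines from the git log output, then combines each remaining pair of lines (the formatted record and its name-status line) into one.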

To see the breakdown of the log columns, see git2prov.sh#L59.

Examples

GitHub offers its repositories as Git and SVN, so we can compare the same series of revisions as they appear via git2provConverter.js, git2prov, and svn2prov.

Future work

  • Include the md5 of the files?
    • mloberg shows how to search for a file based on MD5.
    • Prizms nodes use a convention for files by hash. It is something like http://localhost/id/file/99495fd61d17ba400b2eb0dd1054cadc
    • DSNameFactory uses MD5 of the file contents and the MD5 of the file path, e.g. http://localhost/id/file/87ab47b0c76dc099ef9a89fda9c599e2/at/9ba649aeabb1953e3b2ee0647f36b84e
    • PRONOM's eparams choose a file URI convention, too.
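
To make the hash-based conventions above concrete, here is a minimal sketch of minting a file URI from the MD5 of the contents and the MD5 of the path, in the shape of the DSNameFactory example. fileUri and md5Hex are hypothetical helpers, and the order of the two hashes (contents first, path after /at/) is an assumption read off the description above, not confirmed against DSNameFactory itself.

```javascript
const crypto = require('crypto');

// Hex-encoded MD5 of a string or Buffer, using Node's built-in crypto module.
function md5Hex(data) {
  return crypto.createHash('md5').update(data).digest('hex');
}

// Hypothetical helper: <base>/id/file/<md5 of contents>/at/<md5 of path>.
// The ordering of the two hashes is assumed, not taken from DSNameFactory.
function fileUri(base, contents, filePath) {
  return base + '/id/file/' + md5Hex(contents) + '/at/' + md5Hex(filePath);
}

// Illustrative inputs only; in practice the contents would be read from the file.
console.log(fileUri('http://localhost', 'example contents', 'data/example.txt'));
```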