A minimal data versioning tool in Golang.
TL;DR:
- keeps a manifest file for each data asset, which can be crawled and stored in a Mastro catalogue
- is based on the `commons.abstract.sources` package
- supports multiple connectors
- is meant to be run either locally or by a workflow manager (e.g. Argo Workflows)
mvc relies on a provider for the chosen storage backend to version files. A provider implements the following interface:
```go
type MvcProvider interface {
	InitConnection(cfg *conf.Config) (MvcProvider, error)
	InitDataset(cmd *InitCmd)
	NewVersion(cmd *NewCmd)
	Add(cmd *AddCmd)
	AllVersions(cmd *VersionsCmd)
	LatestVersion(cmd *LatestCmd)
	OverwriteVersion(cmd *OverwriteCmd)
	DeleteVersion(cmd *DeleteCmd)
}
```
The mvc provider instantiates a Mastro connector within its `InitConnection` function, as specified in the commons module.
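How `InitConnection` picks a connector is defined by each provider and the commons module. As a rough sketch, assuming a registry keyed by the backend `type` field of the configuration, the lookup could look like the following; `factories`, `newConnector`, `Connector` and `s3Connector` are hypothetical names, not the commons API.

```go
package main

import (
	"fmt"
	"sort"
)

// Connector is a hypothetical, minimal view of a storage connector.
type Connector interface {
	Name() string
}

type s3Connector struct{ endpoint, bucket string }

func (c s3Connector) Name() string { return "s3://" + c.endpoint + "/" + c.bucket }

// factories maps a backend type to a constructor that reads its settings.
var factories = map[string]func(settings map[string]string) (Connector, error){
	"s3": func(settings map[string]string) (Connector, error) {
		ep, ok := settings["endpoint"]
		if !ok {
			return nil, fmt.Errorf("s3 backend: missing setting %q", "endpoint")
		}
		return s3Connector{endpoint: ep, bucket: settings["bucket"]}, nil
	},
}

// newConnector is what an InitConnection implementation might delegate to.
func newConnector(backendType string, settings map[string]string) (Connector, error) {
	f, ok := factories[backendType]
	if !ok {
		known := make([]string, 0, len(factories))
		for k := range factories {
			known = append(known, k)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("unsupported backend type %q (known: %v)", backendType, known)
	}
	return f(settings)
}

func main() {
	c, err := newConnector("s3", map[string]string{"endpoint": "play.min.io", "bucket": ""})
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Name()) // s3://play.min.io/
	_, err = newConnector("ftp", nil)
	fmt.Println(err) // unsupported backend type "ftp" (known: [s3])
}
```

A registry like this lets new backends be added without touching the command-line code, which only needs the backend type from the configuration.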
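For mvc to work, a Mastro configuration file of type `mvc` must be provided and referenced through the `MVC_CONFIG` environment variable. Otherwise, even printing the help fails:
```
./mvc -h
required key MVC_CONFIG missing value
```
Set the variable to point at the configuration file, e.g.:
```bash
export MVC_CONFIG=$PWD/conf/example_s3.yml
```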
where `example_s3.yml` points to the public MinIO playground:
```yaml
type: mvc
backend:
  name: public-minio-s3
  type: s3
  settings:
    region: us-east-1
    endpoint: play.min.io
    access-key-id: Q3AM3UQ867SPQQA43P2F
    secret-access-key: zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
    use-ssl: true
    bucket: ""
```
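To show what such a file is expected to contain, here is a sketch of structs mirroring the YAML layout plus a validation step. The struct names, the `validate` function and the list of required S3 settings are assumptions; decoding the YAML itself would need a third-party library and is left out. `bucket` is not treated as required because the example leaves it empty.

```go
package main

import "fmt"

// Config and Backend mirror the YAML layout of example_s3.yml (hypothetical names).
type Config struct {
	Type    string  // must be "mvc"
	Backend Backend // storage used for versioning
}

type Backend struct {
	Name     string            // e.g. public-minio-s3
	Type     string            // e.g. s3
	Settings map[string]string // connector-specific key/value pairs
}

// requiredSettings is an assumed list of mandatory settings per backend type.
var requiredSettings = map[string][]string{
	"s3": {"region", "endpoint", "access-key-id", "secret-access-key"},
}

// validate checks the config type, the backend identity and the settings this
// sketch considers mandatory, reporting every missing key at once.
func validate(c Config) error {
	if c.Type != "mvc" {
		return fmt.Errorf("config type is %q, want %q", c.Type, "mvc")
	}
	if c.Backend.Name == "" {
		return fmt.Errorf("backend name is required")
	}
	required, known := requiredSettings[c.Backend.Type]
	if !known {
		return fmt.Errorf("unsupported backend type %q", c.Backend.Type)
	}
	var missing []string
	for _, key := range required {
		if c.Backend.Settings[key] == "" {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("backend %s: missing settings %v", c.Backend.Name, missing)
	}
	return nil
}

func main() {
	cfg := Config{
		Type: "mvc",
		Backend: Backend{
			Name: "public-minio-s3",
			Type: "s3",
			Settings: map[string]string{
				"region":            "us-east-1",
				"endpoint":          "play.min.io",
				"access-key-id":     "<access-key-id>",
				"secret-access-key": "<secret-access-key>",
				"use-ssl":           "true",
				"bucket":            "",
			},
		},
	}
	if err := validate(cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config OK:", cfg.Type, cfg.Backend.Name) // config OK: mvc public-minio-s3
}
```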
Let us now list the available versions for the path `abcde`:
```
./mvc versions -d abcde
2021/06/10 14:44:58 Successfully loaded config mvc public-minio-s3
2021/06/10 14:44:58 Successfully validated data source definition
2021/06/10 14:44:58 Using provided region us-east-1
[1623324009]
```
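The version ID appears to be a Unix timestamp in seconds. Assuming so, the hypothetical helper `versionTime` decodes it; `1623324009` corresponds to 2021-06-10 11:20:09 UTC, the same day as the log lines (which are printed in local time).

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// versionTime interprets a version ID as Unix seconds and renders it in UTC.
func versionTime(id string) (string, error) {
	secs, err := strconv.ParseInt(id, 10, 64)
	if err != nil {
		return "", fmt.Errorf("version %q is not a Unix timestamp: %w", id, err)
	}
	return time.Unix(secs, 0).UTC().Format(time.RFC3339), nil
}

func main() {
	t, err := versionTime("1623324009")
	if err != nil {
		panic(err)
	}
	fmt.Println(t) // 2021-06-10T11:20:09Z
}
```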
The available commands are:
- `mvc init -d $PATH`: initializes the local metadata file (i.e. the manifest) for an asset located at `$PATH`
- `mvc init -d $PATH -f $MANIFESTPATH`: uploads the manifest file located at `$MANIFESTPATH` to `$PATH`
- `mvc new -d $PATH`: creates a new version at `$PATH` and returns its full path
- `mvc versions -d $PATH`: retrieves all available versions at `$PATH` and shows their metadata
- `mvc latest -d $PATH`: retrieves the latest version at `$PATH`
- `mvc delete -d $PATH -v $VERSION`: deletes the specified version and updates the metadata
- `mvc add -l $LOCALPATH -d $PATH`: adds `$LOCALPATH` to the remote `$PATH` at the current latest version, including its SHA-256 checksum in the version metadata
- `mvc overwrite -d $PATH -v $VERSION -l $LOCALPATH`: overwrites the existing version `$VERSION` at `$PATH` with `$LOCALPATH` and overwrites its metadata
- `mvc check -l $LOCALPATH`: computes the SHA-256 checksum of the entire folder at `$LOCALPATH`