- Build the pipeline
- Push the results
- Pull and reproduce
dvc run -d src/download_data.py -o data/raw/store47-2016.csv python src/download_data.py
dvc run -d data/raw/store47-2016.csv -d src/splitter.py -o data/splitter/train.csv -o data/splitter/validation.csv python src/splitter.py
dvc run -d data/splitter/train.csv -d data/splitter/validation.csv -d src/decision_tree.py -o data/decision_tree/model.pkl -M results/score.txt python src/decision_tree.py
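The three commands above form a chain because each stage's `-d` dependencies include the `-o` outputs of the stage before it. The sketch below illustrates that idea only; it is not DVC's implementation. The stage names (`download_data`, `splitter`, `decision_tree`) are hypothetical labels chosen for this example, and the file paths are copied from the commands.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies (-d) and outputs (-o / -M) copied from the three dvc run commands.
# The stage names are hypothetical labels for this illustration.
STAGES = {
    "download_data": {
        "deps": ["src/download_data.py"],
        "outs": ["data/raw/store47-2016.csv"],
    },
    "splitter": {
        "deps": ["data/raw/store47-2016.csv", "src/splitter.py"],
        "outs": ["data/splitter/train.csv", "data/splitter/validation.csv"],
    },
    "decision_tree": {
        "deps": [
            "data/splitter/train.csv",
            "data/splitter/validation.csv",
            "src/decision_tree.py",
        ],
        "outs": ["data/decision_tree/model.pkl", "results/score.txt"],
    },
}


def execution_order(stages):
    """Return stage names ordered so each stage comes after the stages it depends on."""
    # Map every output file to the stage that produces it.
    producer = {out: name for name, stage in stages.items() for out in stage["outs"]}
    # A stage depends on whichever stages produce its input files.
    # Source files such as src/splitter.py have no producer and are skipped.
    graph = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STAGES))
```

Because the stages form a single chain, the only valid order is download, then split, then train.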
First, push code changes to GitHub as usual, for instance:
git commit -am "Change model to be more awesome"
git push origin master
Next, push the data files tracked by DVC to remote storage in the cloud:
dvc push
That's it! Now anyone with access can fetch this repository and use dvc to replicate and build on your work.
First, clone this Git repo, or pull the latest changes if you already have a copy:
git pull origin master --rebase
Next, pull from the cloud with dvc:
dvc pull
Finally, to reproduce the entire pipeline, simply run:
dvc repro model.pkl.dvc
Here, model.pkl.dvc is the stage file for the last output in the DVC pipeline. Reproducing it re-runs every stage it depends on whose inputs have changed.
To change the model, edit src/decision_tree.py as you see fit. You can then retrain it by re-running the pipeline with dvc repro model.pkl.dvc.
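Editing a dependency triggers a re-run because the stage's inputs no longer match what was recorded the last time it ran. The sketch below illustrates that kind of change detection under simple assumptions: it compares an MD5 hash of each file's contents against a previously stored value. The helper names `file_md5` and `changed_deps` are hypothetical, and DVC's real bookkeeping is more involved than this.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    """Hash a file's contents so that any edit produces a different value."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def changed_deps(deps, recorded):
    """Return the dependencies whose current hash differs from the recorded one."""
    return [dep for dep in deps if recorded.get(dep) != file_md5(dep)]


# Demo on a throwaway file standing in for src/decision_tree.py.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "decision_tree.py"
    script.write_text("max_depth = 3\n")

    # Record the hash, as if the stage had just been run.
    recorded = {str(script): file_md5(script)}
    print(changed_deps([str(script)], recorded))  # no edits yet, so nothing is reported

    # Edit the script; it is now reported as changed, so the stage would need a re-run.
    script.write_text("max_depth = 5\n")
    print(changed_deps([str(script)], recorded))
```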
Once the model has been trained, build and run the Docker image:
docker build . -t ci-workshop
docker run -d -p 5005:5005 ci-workshop
You can view the app at http://localhost:5005
Note: allocate about 8 GB of memory and 2 CPUs to Docker when running docker build.
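Since docker run -d returns before the app has finished starting, it can help to wait until the port answers before opening the page. The following is a minimal sketch using only the standard library. The function name `wait_for_app` and its default timeout and interval are assumptions made for this example; only the URL comes from the instructions above.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url="http://localhost:5005", timeout=30.0, interval=1.0):
    """Poll url until the server sends any HTTP response or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the app is not ready yet.
            time.sleep(interval)
    return False
```

Calling `wait_for_app()` right after docker run returns `True` once the container responds, or `False` if nothing answers within the timeout.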
docker pull TBD
docker run -d -p 5005:5005 TBD
Push changes to the repository