Documents data processing and calculations as prescribed by Verra methodologies.
This repo assumes the use of conda to simplify installing GDAL.
- Python 3.9
- make
- conda
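To confirm the prerequisites are available before proceeding, a quick check like the following can help (exact versions and output vary by machine; conda is covered by the Miniconda install below if it is missing):

```bash
# Sanity-check the prerequisites; output varies by install.
python --version   # should report 3.9.x
make --version
conda --version    # skip if conda is not installed yet
```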
Run these steps the very first time you set up the project on a machine to create a local Python environment for this project.
- Install Miniconda for your environment if you don't have it yet.

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
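After the installer finishes, the current shell may not see `conda` yet; opening a fresh terminal or re-sourcing the shell profile (file name varies by shell) usually fixes that:

```bash
source ~/.bashrc   # or ~/.zshrc; alternatively, open a new terminal
conda --version    # confirm conda is now on PATH
```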
- Create a local conda env and activate it. This will create a conda env folder in your project directory.

```bash
make conda-env
conda activate ./env
```
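For reference, the `conda-env` target presumably wraps a prefix-based environment creation along these lines (a sketch only; check the Makefile for the real recipe):

```bash
# Hypothetical equivalent of `make conda-env`; the actual target may differ.
conda create --prefix ./env python=3.9 --yes
```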
- Run the one-time set-up make command.

```bash
make setup
```
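What `make setup` does exactly is defined in the Makefile; a plausible sketch, assuming it installs the Python dependencies and the git hooks, is:

```bash
# Hypothetical sketch of the setup steps; consult the Makefile for the real ones.
pip install -r requirements.txt
pre-commit install
```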
Pre-commit is necessary to run linters and generate `.py` files from `.ipynb` notebooks whenever you perform `git commit`.
- Install pre-commit.
- Run `pre-commit install` to set up the git hook scripts.
- Verify that pre-commit runs after committing in git.
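You can also trigger the hooks manually at any time, without making a commit, using the standard pre-commit CLI:

```bash
# Run every configured hook against all files in the repo.
pre-commit run --all-files
```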
To run automated tests, simply run `make test`.
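If you want more control than the make target offers, and assuming the suite is pytest-based (an assumption; check the Makefile), tests can also be invoked directly:

```bash
# Hypothetical direct invocation; the `tests/` path is an assumption.
pytest -v tests/
```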
Over the course of development, you will likely introduce new library dependencies. This repo uses pip-tools to manage the Python dependencies.

There are two main files involved:

- `requirements.in` - contains high-level requirements; this is what we should edit when adding/removing libraries
- `requirements.txt` - contains the exact list of Python libraries (including dependencies of the main libraries) your environment needs to follow to run the repo code; compiled from `requirements.in`
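For illustration, here is how the two files relate; the library names below are examples only, not the repo's actual dependencies:

```
# requirements.in — hand-edited, loose constraints
pandas>=1.3

# requirements.txt — compiled by pip-tools, exact pins incl. transitive deps
numpy==1.24.2           # via pandas
pandas==1.5.3           # via -r requirements.in
python-dateutil==2.8.2  # via pandas
pytz==2023.3            # via pandas
```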
When you add new Python libs, please do the following (a full example follows this list):

- Add the library to the `requirements.in` file. You may optionally pin the version if you need a particular version of the library.
- Run `make requirements` to compile a new version of the `requirements.txt` file and update your Python env.
- Commit both the `requirements.in` and `requirements.txt` files so other devs can get the updated list of project requirements.
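Putting the three steps together, a typical session looks like this (the library name is a placeholder):

```bash
# Hypothetical end-to-end flow for adding a dependency.
echo "geopandas" >> requirements.in
make requirements
git add requirements.in requirements.txt
git commit -m "Add geopandas to project requirements"
```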
Note: When you are the one updating your Python env to follow library changes from other devs (reflected through an updated `requirements.txt` file), simply run:

```bash
pip-sync requirements.txt
```
Outline the necessary file structure before running the notebooks. You can create a file tree here.
Within your local copy of the copied template, create the `data/` folder with the following structure prior to running the notebooks (change `datasetX` and `fileX` as applicable):
```
data/
├── 02_dataset_alignment/
│   ├── dataset1/
│   │   └── file1
│   └── dataset2/
│       └── file2
└── 03_analytics/
```
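A quick way to scaffold this locally (dataset and file names are placeholders to replace):

```bash
# Create the expected skeleton; swap datasetX/fileX for the real names.
mkdir -p data/02_dataset_alignment/dataset1 \
         data/02_dataset_alignment/dataset2 \
         data/03_analytics
touch data/02_dataset_alignment/dataset1/file1 \
      data/02_dataset_alignment/dataset2/file2
```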
Mention file sources.
Within the project GCP (add link), make sure the following folders and files are present (change structure as applicable):

```
project-gcp/
├── 02_dataset1/
├── 02_dataset2/
└── 03_analytics/
```
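Assuming "project GCP" refers to a Cloud Storage bucket (an assumption; the bucket name below is a placeholder), the structure can be verified with gsutil:

```bash
# List the expected top-level folders; replace the bucket name accordingly.
gsutil ls gs://project-gcp/
gsutil ls gs://project-gcp/02_dataset1/ gs://project-gcp/02_dataset2/
```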
Mention table sources if any are needed prior to running the notebooks.