This repository is a copier template that can be used to quickly seed a modern data stack project. The instructions vary depending on whether the repo is hosted on GitHub or Azure DevOps, so we make some distinctions below.
This repo consists of:
- A dbt project.
- pre-commit checks for enforcing code quality.
- A documentation skeleton using mkdocs-material.
- GitHub Actions for running quality checks in continuous integration (CI).
Start with a Python environment. Install `copier` and `poetry`:
python -m pip install copier poetry
Create a directory into which the project will be rendered:
mkdir <your-project-name>
cd <your-project-name>
git init
Create a new project using the copier command-line tool, with...
HTTPS:
copier copy https://github.com/cagov/caldata-infrastructure-template .
OR with SSH:
copier copy git@github.com:cagov/caldata-infrastructure-template.git .
If the repository is hosted on Azure DevOps, install Git Credential Manager (with Homebrew on a Mac; on Windows it should be included by default with your Git installation). Then run the following three commands:
brew install git-credential-manager
git remote add origin <Azure DevOps repo url e.g. https://caldata-sandbox@dev.azure.com/caldata-sandbox/mdsa-test/_git/mdsa-test>
copier copy https://github.com/cagov/caldata-infrastructure-template .
This will ask you a series of questions, the answers to which will be used to populate the project.
Once the project is rendered, commit the generated files to the git repository:
git add .
git commit -m "Initial commit"
Finally, install the Python dependencies and commit the `poetry.lock`:
poetry install
git add poetry.lock
git commit -m "Add poetry.lock"
For Azure DevOps repos you'll follow the instructions here.
For GitHub repos you'll follow the instructions here.
Projects generated from our infrastructure template need read access to the Snowflake account in order to do two things from GitHub Actions:
- Verify that dbt models in branches compile and pass linter checks
- Generate dbt docs upon merge to `main`.
The Terraform configurations deployed above create two service accounts for GitHub Actions: a production one for docs and a dev one for CI checks.
This repository assumes two service accounts in Snowflake for usage with GitHub Actions.
Set up key pairs for the two GitHub Actions service accounts (`GITHUB_ACTIONS_SVC_USER_DEV` and `GITHUB_ACTIONS_SVC_USER_PRD`) following the instructions given here.
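The linked instructions are not reproduced in this document. As a rough sketch only, assuming the service accounts use Snowflake's RSA key-pair authentication and that `openssl` is installed, an unencrypted key pair for one account could be generated as follows. The `generate_keypair` helper and the output file names are hypothetical and not part of the template.

```bash
#!/usr/bin/env bash
# Sketch: generate an RSA key pair for one service account.
# Assumption: openssl is available; the helper and file names are illustrative.
generate_keypair() {
  local name="$1"          # base name for the key files, e.g. the service user
  local outdir="${2:-.}"   # directory to write the key files into

  # Private key in unencrypted PKCS#8 PEM format.
  openssl genrsa 2048 \
    | openssl pkcs8 -topk8 -inform PEM -nocrypt -out "${outdir}/${name}.p8"

  # Matching public key, derived from the private key.
  openssl rsa -in "${outdir}/${name}.p8" -pubout -out "${outdir}/${name}.pub"

  # Keep the private key readable by the current user only.
  chmod 600 "${outdir}/${name}.p8"
}

# Example (not run here):
#   generate_keypair github_actions_svc_user_dev ./keys
```

The public key would then be registered on the corresponding Snowflake user, and the private key stored as a repository secret. Where the linked instructions differ from this sketch, follow them instead.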
In order for the service accounts to be able to connect to your Snowflake account, you need to configure secrets in GitHub Actions. From the repository page, go to "Settings", then to "Secrets and variables", then to "Actions".
Add the following repository secrets:
| Variable | Value |
|---|---|
| SNOWFLAKE_ACCOUNT | new account locator |
| SNOWFLAKE_USER_DEV | GITHUB_ACTIONS_SVC_USER_DEV |
| SNOWFLAKE_USER_PRD | GITHUB_ACTIONS_SVC_USER_PRD |
| SNOWFLAKE_PRIVATE_KEY_DEV | dev service account private key |
| SNOWFLAKE_PRIVATE_KEY_PRD | prd service account private key |
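
Once the secrets are configured, a workflow step can confirm that all five are visible before running dbt. The sketch below assumes the workflow exposes each secret to the job as an environment variable of the same name, which the workflow file must do explicitly. The `check_snowflake_secrets` helper is hypothetical and not part of the template.

```bash
#!/usr/bin/env bash
# Sketch: fail early if any expected Snowflake secret is missing.
# Assumption: each secret is exported as an environment variable of the same name.
check_snowflake_secrets() {
  local missing=0
  local name
  for name in \
    SNOWFLAKE_ACCOUNT \
    SNOWFLAKE_USER_DEV \
    SNOWFLAKE_USER_PRD \
    SNOWFLAKE_PRIVATE_KEY_DEV \
    SNOWFLAKE_PRIVATE_KEY_PRD
  do
    # printenv prints the value of an exported variable, or nothing if it is unset.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing secret: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_snowflake_secrets` at the start of a job returns a non-zero status and names each missing variable on standard error, so a misconfigured repository fails with a clear message rather than a Snowflake connection error.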
The repository must have GitHub Pages enabled in order for the documentation to deploy and be viewable.
- From the repository page, go to "Settings", then to "Pages".
- Under "GitHub Pages visibility" select "Private" (unless the project is public!).
- Under "Build and deployment" select "Deploy from a branch" and choose "gh-pages" as your branch.
Continuous integration for this template creates a new project from the template,
then verifies that the pre-commit
checks pass.
Future versions might do additional checks
(e.g., running sample dbt models or orchestration DAGs).
To run the tests locally, change to the parent directory of the template, then run:
./caldata-infrastructure-template/ci/test.sh