Main workflow template for jumpstarting DE projects. Consists of the following files:
- Makefile (`workflow.mk`) to auto-import setups using the templates below
- pre-commit config, linters, and PR templates (`.github/`)
- Readme and `.gitignore`
The DE Workflow template is designed to be the main repository for creating Data Engineering templates. The system is designed using a multi-repo setup where developers can mix and match multiple templates together to fit the needs of their project.
graph LR
DWT[de-workflow-template]
DDT[dwt-dagster-template]
DAT[dwt-airflow-template]
DTT[dwt-terraform-template]
DS[dbt-starter]
DCT[dwt-ci-template]
GS[github-starter]
subgraph Data Orchestrator
DDT
DAT
end
subgraph Infrastructure as Code
DTT
end
subgraph CI/CD
DCT
end
subgraph Data Transformation
DS
end
subgraph Documentation
GS
end
GS--make readme-template-->DWT
DDT--make dagster-->DWT
DAT--make airflow-->DWT
DTT--make gcp-terraform-->DWT
DTT--make aws-terraform-->DWT
DCT--make cloudbuild-->DWT
DCT--make codepipeline-->DWT
DS--make dbt-->DWT
The system revolves around using `Makefile` to run scripts that set up the templates automatically. Ideally, a working project can be created by running a few `make` commands that build the template from scratch.
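As a rough illustration of chaining these commands, the sketch below strings together the targets documented in this README for a Dagster project on GCP with Cloud Build and dbt. The helper names `run` and `bootstrap_dagster_gcp` and the `DRY_RUN` variable are hypothetical and not part of `workflow.mk`; by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: chains make targets listed in this README.
# DRY_RUN is a hypothetical switch; 1 (default) prints commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: orchestrator and Terraform first, then CI, then dbt.
bootstrap_dagster_gcp() {
    run make dagster -f workflow.mk &&
    run make gcp-terraform -f workflow.mk &&
    run make cloudbuild cloud-platform=gcp orchestrator=dagster -f workflow.mk &&
    run make dbt -f workflow.mk
}

bootstrap_dagster_gcp
```

Setting `DRY_RUN=0` would run the targets for real, which assumes `workflow.mk` is present in the current directory.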
- Create a new repo using this template (or click this link)
- Ensure that these are installed on your system
  - direnv
  - git
- Choose from the available setup commands below
- Update `README.md`: `make readme-template -f workflow.mk`
- Use the primary make command for the template you need, listed below.
- Follow the instructions on how to initialize the template, found at the end of each section.
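The prerequisite step in the list above can be checked with a short script. This is a minimal sketch; `missing_tools` is a hypothetical helper name, and it only tests whether `direnv` and `git` are on `PATH`, not which versions are installed.

```shell
#!/bin/sh
# Hypothetical helper: prints each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing="$(missing_tools direnv git)"
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on a single line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```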
Dagster
Note: the commands below will create a `dagster/` directory.
# initialize a Dagster setup
make dagster -f workflow.mk
Airflow
Note: the commands below will create an `airflow/` directory. (NEED CONFIRMATION - Dev Notes)
# initialize an Airflow setup
make airflow -f workflow.mk
# initialize an Airflow setup w/ DAG Builder
make airflow -f workflow.mk add_dag_builder=1
For further instructions, go to the Dagster Template Repository or the Airflow Template Repository.
More details on Airflow DAG Builder.
Terraform
Note: the commands below will create a `terraform/` directory.
# For GCP setups,
make gcp-terraform -f workflow.mk
# For AWS setups,
make aws-terraform -f workflow.mk
Then, follow `terraform/README.md` for the initial Terraform setup.
IMPORTANT NOTE: The Cloud Build and CodePipeline templates require that their respective Terraform template and your selected orchestrator template have already been installed.
Note: the commands below will create a `ci/` directory and will create or append files in the `terraform` folder.
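The ordering requirement in the note above could be guarded with a small check before running a CI target. This sketch assumes that the presence of the `terraform/` directory and the orchestrator directory (`dagster/` or `airflow/`) is a good enough sign that those templates were installed; `ci_ready` is a hypothetical helper, not something provided by `workflow.mk`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when both terraform/ and the chosen
# orchestrator directory exist under the given project root.
ci_ready() {
    root="$1"
    orchestrator="$2"   # "airflow" or "dagster"
    [ -d "$root/terraform" ] && [ -d "$root/$orchestrator" ]
}

if ci_ready . dagster; then
    echo "OK to run: make cloudbuild cloud-platform=gcp orchestrator=dagster -f workflow.mk"
else
    echo "Install the Terraform and Dagster templates first."
fi
```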
Cloud Build as CI
# DO THIS FIRST
make gcp-terraform -f workflow.mk
# for Airflow Project
make cloudbuild cloud-platform=gcp orchestrator=airflow -f workflow.mk
# for Dagster Project
make cloudbuild cloud-platform=gcp orchestrator=dagster -f workflow.mk
Then, follow the instructions found in the Cloud Build README to set up the triggers.
CodePipeline as CI
Note: The CodePipeline template is currently only available for Dagster projects.
# DO THIS FIRST
make aws-terraform -f workflow.mk
# for Dagster Project
make codepipeline cloud-platform=aws orchestrator=dagster -f workflow.mk
Then, follow the instructions found in the CodePipeline README.
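The supported CI combinations described in the two sections above (Cloud Build for Airflow or Dagster on GCP, CodePipeline for Dagster on AWS) can be summarized in a small lookup. `ci_command` is a hypothetical helper that only prints the matching command from this README and rejects anything else; it does not run `make`.

```shell
#!/bin/sh
# Hypothetical helper: prints the documented make command for a
# cloud/orchestrator pair, or fails for combinations this README does not list.
ci_command() {
    cloud="$1"
    orchestrator="$2"
    case "$cloud:$orchestrator" in
        gcp:airflow|gcp:dagster) target=cloudbuild ;;
        aws:dagster)             target=codepipeline ;;
        *)
            echo "unsupported combination: $cloud + $orchestrator" >&2
            return 1
            ;;
    esac
    echo "make $target cloud-platform=$cloud orchestrator=$orchestrator -f workflow.mk"
}

ci_command gcp airflow
ci_command aws dagster
```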
Note: the command below will create a `dbt/` directory.
make dbt -f workflow.mk
Then, follow the instructions found in the dbt-starter README to set up the dbt adapter and environment configurations.
Once done with setting up the project, you can choose to remove the following files from the project directory.
rm workflow.mk
rm terraform.mk
rm ci.mk
rm terraform/README.md
rm -rf terraform/docs/
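Not every project will have all of these files, for example `ci.mk` if no CI template was installed (an assumption based on its name). A tolerant variant of the cleanup above might look like the following sketch; `cleanup_scaffolding` is a hypothetical helper, and `rm -f` / `rm -rf` are used so that missing paths are skipped silently.

```shell
#!/bin/sh
# Hypothetical helper: removes the scaffolding files listed above from the
# given project root (defaults to the current directory).
cleanup_scaffolding() {
    root="${1:-.}"
    rm -f "$root/workflow.mk" "$root/terraform.mk" "$root/ci.mk" \
          "$root/terraform/README.md"
    rm -rf "$root/terraform/docs"
}

# Usage (uncomment to run against the current project directory):
# cleanup_scaffolding .
```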
These are the repositories for the underlying templates used by the DE Workflow template.
Data orchestrator templates
Infrastructure as code templates
CI/CD templates
Data transformation templates
Documentation templates