Contains dbt models for transformations in the DWH
- Make sure you have python installed: Python 3.8 and above. You can get this from https://www.python.org/downloads/release/python-380/
- Be sure to check this: https://docs.getdbt.com/faqs/Core/install-python-compatibility matrix to see the compatibility of various python versions with the available dbt versions
- For Windows preferably use Git Bash as your terminal. Download the git package here https://gitforwindows.org/ (It will include git bash)
- Make sure you have a PostgreSQL client to interact with your PostgreSQL database e.g PgAdmin, DBeaver, Azure Data Studio etc
-
Clone the repo from GitHub and cd to the root folder`
-
Create a python virtual environment by running:
python -m venv <name_of_environemt>
(e.g.python venv venv
) -
Activate virtual environment by running:
source venv/Scripts/activate
-
Once virtual environment is activated make sure you are using the latest pip by running:
python -m pip install --upgrade pip
-
Install dbt adapter for postgres by running:
pip install psycopg2-binary==2.9.6
and thenpython -m pip install dbt-core dbt-postgres
- Make sure that the libraries install succesfully without any errors
-
After installing run version check to confirm dbt is installed in your virtual environment
dbt --version
-
Create a
.env
file on the root folder and paste the following environment variables (make sure there is no space between):export DBT_USER_DEV=<postgres user> export DBT_PASSWORD_DEV=<postgres password> export DBT_DATABASE_DEV=<database to build models on> export DBT_SERVER_DEV=<server ip address> export DBT_SCHEMA=<default schema to build models on> export DBT_PROFILES_DIR=./profiles/
For DBT_SCHEMA make sure you have a schema in the development Postgres instance that you will use to build your models & datasets. Ideally call it dbt_<name_of_dev>
-
Run
source .env
to load your environment variables. -
Make sure you have the config file
profiles.yml
inside the profiles folder with the following configarations:
palladium_kenya_analytics:
target: dev
outputs:
dev:
type: postgres
host: "{{ env_var('DBT_SERVER_DEV') }}"
database: "{{ env_var('DBT_DATABASE_DEV') }}"
schema: "{{ env_var('DBT_SCHEMA')}}"
port: "{{ env_var('DBT_PORT_DEV') | int }}"
user: "{{env_var('DBT_USER_DEV')}}"
password: "{{ env_var('DBT_PASSWORD_DEV') }}"
threads: 4
dbt deps
- pulls the most recent version of the dependencies listed in your packages.yml from gitdbt compile
- generates executable SQL from sourcedbt run
- runs all models in the models folderdbt run --select <model_name>
- runs a specified single model e.gdbt run --select stg_healthcare_data
dbt run --select <path/to/my/models>
- runs all models in a specified directory e.gdbt run --select dimensional_layer
dbt seed
- loads csv files (typically not for large files)dbt test
- runs tests against your models and seedsdbt build
- builds models, tests, seeds and snapshots based on dependanciesdbt docs generate
- generates your project's documentationdbt docs serve
- starts a webserver on port 8000 to serve your documentation locally