Skip to content

Latest commit

 

History

History
71 lines (60 loc) · 3.62 KB

README.md

File metadata and controls

71 lines (60 loc) · 3.62 KB

dbt-palladium-kenya

Contains dbt models for transformations in the DWH

Setting up locally

Requirements

  1. Make sure you have python installed: Python 3.8 and above. You can get this from https://www.python.org/downloads/release/python-380/. Recommended you install with the 'Install for all users' option
  2. For Windows preferably use Git Bash as your terminal. Download the git package here https://gitforwindows.org/ (It will include git bash)
  3. Make sure you have a PostgreSQL client to interact with your PostgreSQL database e.g PgAdmin, DBeaver, Azure Data Studio etc

Steps

  • Clone the repo from GitHub and cd to the root folder`

  • Create a python virtual environment by running: python -m venv <name_of_environemt> (e.g. python -m venv venv)

  • Activate virtual environment by running: source venv/Scripts/activate

  • Once virtual environment is activated make sure you are using the latest pip by running: python -m pip install --upgrade pip

  • Install dbt adapter for postgres by running: pip install psycopg2-binary==2.9.6 and then python -m pip install dbt-core dbt-postgres

    • Make sure that the libraries install succesfully without any errors
  • After installing run version check to confirm dbt is installed in your virtual environment dbt --version

  • Create a .env file on the root folder and paste the following environment variables. Replace the < strings > with the actual values(make sure there is no space between the equal sign):

    export DBT_USER_DEV=<postgres user>
    export DBT_PASSWORD_DEV=<postgres password>
    export DBT_DATABASE_DEV=<database to build models on>
    export DBT_SERVER_DEV=<server ip address>
    export DBT_SCHEMA=<default schema to build models on>
    export DBT_PORT_DEV=<port number>
    export DBT_PROFILES_DIR=./profiles/
    

    For DBT_SCHEMA make sure you have a schema in the development PostgreSQL instance that you will use to build your models & datasets. Ideally call it dbt_<name_of_dev>

  • Run source .env to load your environment variables.

  • Make sure you have a config file profiles.yml inside the profiles folder with the following configarations:

palladium_kenya_analytics:
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_SERVER_DEV') }}"
      database: "{{ env_var('DBT_DATABASE_DEV') }}"
      schema: "{{ env_var('DBT_SCHEMA')}}"
      port: "{{ env_var('DBT_PORT_DEV') | int }}"
      user:  "{{env_var('DBT_USER_DEV')}}"
      password: "{{ env_var('DBT_PASSWORD_DEV') }}"
      threads: 4

Common commands to interact with dbt

  • dbt deps - pulls the most recent version of the dependencies listed in your packages.yml from git
  • dbt compile - generates executable SQL from source
  • dbt run - runs all models in the models folder
  • dbt run --select <model_name> - runs a specified single model e.g dbt run --select stg_healthcare_data
  • dbt run --select <path/to/my/models> - runs all models in a specified directory e.g dbt run --select dimensional_layer
  • dbt seed - loads csv files (typically not for large files)
  • dbt test - runs tests against your models and seeds
  • dbt build - builds models, tests, seeds and snapshots based on dependancies
  • dbt docs generate - generates your project's documentation
  • dbt docs serve - starts a webserver on port 8000 to serve your documentation locally

For more info on commands see here: https://docs.getdbt.com/reference/dbt-commands