Data Engineering project investigates the relationship between suicide rates and the availability of mental health services (number of professionals and beds for care) in conjunction with GDP.

abakumova/data-engineering-group-project

Data Engineering (LTAT.02.007)

Suicide Rate Investigation Project

This project investigates the relationship between suicide rates and the availability of mental health services (number of professionals and beds for care) in conjunction with GDP.

Project Overview

  • Topic: Analysis of how mental health service availability (number of professionals and beds) and GDP correlate with suicide rates.
  • Goal: To understand if and how mental health infrastructure and economic factors influence suicide rates.

Prerequisites

  • Docker: Ensure Docker is installed on your system.
  • Docker Compose: Ensure Docker Compose is installed and available in your PATH.
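A quick way to confirm both prerequisites is to look for the executables on your PATH. The sketch below assumes the tools are invoked as docker and docker-compose, matching the commands used in the setup steps; the helper name missing_tools is hypothetical and not part of this repository.

```python
import shutil

# Executable names assumed from the commands used in this README.
REQUIRED_TOOLS = ["docker", "docker-compose"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If your installation only provides Compose as a Docker plugin (invoked as docker compose), the docker-compose lookup will report it as missing even though Compose is available.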

Setup Instructions

  1. Clone the repository.
  2. Initialize Airflow: docker-compose up airflow-init
  3. Start all services in the background: docker-compose up -d
  4. Access the Airflow web interface at http://localhost:8080

     Use the following default credentials to log in:

       • Username: airflow
       • Password: airflow

  5. Access the MinIO interface at http://localhost:9001/login

     Use the following default credentials to log in:

       • Username: minioadmin
       • Password: minioadmin

     Create a bucket named warehouse.

  6. Access the Streamlit + GeoPandas interface at http://localhost:8501

  7. Access the LLM Query Interface for Data Engineering Course at http://localhost:8009/

       • Set your OpenAI API key (llm\app.py - openai.api_key)
       • cd llm
       • uvicorn app:app --port 8089 (this command serves the app on port 8089; use that port in the URL if you start it this way)
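Once the containers are up, it can be handy to verify that each interface answers before opening it in a browser. The following is a minimal sketch using only the Python standard library; the URLs are the ones listed in the setup steps, while the function names is_reachable and check_services are hypothetical helpers, not part of this repository.

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the setup steps in this README.
SERVICES = {
    "Airflow": "http://localhost:8080",
    "MinIO": "http://localhost:9001/login",
    "Streamlit": "http://localhost:8501",
    "LLM Query Interface": "http://localhost:8009/",
}

# Local services should be contacted directly, so ignore any proxy settings.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at `url`, even with an error status."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example 401 or 404), so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed response.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its URL is reachable."""
    return {name: is_reachable(url) for name, url in services.items()}
```

Calling check_services() returns a dictionary such as {"Airflow": True, ...}; a False entry usually means the corresponding container has not finished starting or is mapped to a different port.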

Datasets

  1. Suicide Rate

  2. Beds for Mental Health

  3. Human Resources for Mental Health

  4. GDP Data

  5. Mental Health

    • Mental illnesses prevalence: Dataset

Research Questions

  • Is a higher number of beds for mental health patients associated with lower suicide rates?
  • Is there a correlation between GDP and mental health service availability?
  • Does the number of mental health professionals influence suicide rates? Are some roles more impactful than others?
  • Further exploratory questions may arise during data analysis.
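Several of these questions come down to measuring the association between two per-country series. As a minimal sketch, the Pearson correlation coefficient can be computed in plain Python as shown here. The numbers are made-up placeholders chosen only to exercise the function, not values from the project's datasets, and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, NOT real data: perfectly linear by construction.
beds_per_100k = [10.0, 20.0, 30.0, 40.0, 50.0]
suicide_rate_per_100k = [14.0, 12.0, 10.0, 8.0, 6.0]

r = pearson(beds_per_100k, suicide_rate_per_100k)
print(f"Pearson r = {r:.2f}")
```

A coefficient near -1 or 1 indicates a strong linear association, but it does not by itself show that beds or staffing levels influence suicide rates; confounders such as GDP would need to be considered separately.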

Technology Stack

  • Data Orchestration: Apache Airflow
  • Database: DuckDB
  • Data Transformation: dbt
  • Versioning: Apache Iceberg
  • Visualization: Streamlit (for dashboards) and GeoPandas (for geospatial visualizations)
  • Additional: LLM interface, where natural-language questions are transformed into SQL queries and executed in DuckDB against the project datasets
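The question-to-SQL flow of the LLM component can be sketched roughly as follows. This is an illustration under several assumptions, not the code in llm\app.py: the standard-library sqlite3 module stands in for DuckDB so the example runs without extra dependencies, generate_sql is a hypothetical stub replacing the OpenAI call, and the suicide_rates table, its columns, and its rows are placeholders.

```python
import sqlite3


def generate_sql(question):
    """Hypothetical stand-in for the LLM call; returns a canned query."""
    return (
        "SELECT country, rate FROM suicide_rates "
        "WHERE year = 2019 ORDER BY rate DESC LIMIT 3"
    )


def is_read_only(sql):
    """Conservatively accept a single SELECT statement and nothing else."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def answer(question, conn, to_sql=generate_sql):
    """Translate a question to SQL, validate it, and return the result rows."""
    sql = to_sql(question)
    if not is_read_only(sql):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


# In-memory database with placeholder rows (not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suicide_rates (country TEXT, year INTEGER, rate REAL)")
conn.executemany(
    "INSERT INTO suicide_rates VALUES (?, ?, ?)",
    [("A", 2019, 10.0), ("B", 2019, 12.5), ("C", 2018, 9.0)],
)

rows = answer("Which countries had the highest rate in 2019?", conn)
print(rows)
```

Validating generated SQL before execution is a design choice worth keeping in any variant of this flow, since model output is untrusted input; the check here is deliberately strict and would also reject legitimate queries that begin with WITH.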

Star Schema

Below is our star schema diagram:

[Figure: Star Schema diagram]

UML Data Schema link
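For readers unfamiliar with the pattern, a star schema places measures in a central fact table that references surrounding dimension tables. The sketch below is a generic illustration, not the project's actual schema: every table and column name is hypothetical, the inserted row is a placeholder, and sqlite3 is used in place of DuckDB so the example runs with the standard library alone.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed by two dimensions.
SCHEMA = """
CREATE TABLE dim_country (
    country_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE dim_year (
    year_id INTEGER PRIMARY KEY,
    year    INTEGER NOT NULL
);
CREATE TABLE fact_mental_health (
    country_id             INTEGER NOT NULL REFERENCES dim_country(country_id),
    year_id                INTEGER NOT NULL REFERENCES dim_year(year_id),
    suicide_rate           REAL,
    beds_per_100k          REAL,
    professionals_per_100k REAL,
    gdp_per_capita         REAL,
    PRIMARY KEY (country_id, year_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows (not real data).
conn.execute("INSERT INTO dim_country VALUES (1, 'Country A')")
conn.execute("INSERT INTO dim_year VALUES (1, 2019)")
conn.execute(
    "INSERT INTO fact_mental_health VALUES (1, 1, 10.0, 25.0, 40.0, 30000.0)"
)

# A typical analytical query joins the fact table to its dimensions.
row = conn.execute(
    """
    SELECT c.name, y.year, f.suicide_rate, f.beds_per_100k
    FROM fact_mental_health AS f
    JOIN dim_country AS c USING (country_id)
    JOIN dim_year AS y USING (year_id)
    """
).fetchone()
print(row)
```

Keeping all measures in one fact table keyed by country and year makes it straightforward to compare suicide rates, service availability, and GDP for the same observation in a single query.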
