Skip to content

Data analysis in a fictiction health plan database using SQL, with 20 scripts focusing in KPIs and operation indicators.

Notifications You must be signed in to change notification settings

gioferracini/Health-Plan-Data-Analysis-using-SQL

Repository files navigation

šŸ„ Brazilian Health Plan — SQL Analytics Project

šŸ“Š Overview

This project simulates a Brazilian Health Plan environment, analyzing key metrics through 20 SQL queries. The goal is to derive actionable insights from a fictional database created using Python with synthetic data generation. All data are fictitious and intended for portfolio and learning purposes only.


🧠 Key Skills Demonstrated

  • SQL (CTEs, joins, aggregations, ranking functions)
  • Data modeling & healthcare KPIs
  • Analytical thinking and storytelling
  • Realistic database simulation with Brazilian health plan structure

šŸ—‚ļø Database Structure

The project uses the following core tables (in .csv format):

  • beneficiarios.csv – Health plan beneficiaries
  • planos.csv – Health plans
  • prestadores.csv – Healthcare providers
  • atendimentos1.csv, atendimentos2.csv, atendimentos3.csv – Patient visits (split due to file size)

The full database simulates records from 2007 to 2024, following realistic patterns of a Brazilian healthcare operator.


šŸ“Œ Analytical Scripts

Grouped by topic for better clarity:

šŸ“ˆ Evolution & Volume

  • Evolution of active beneficiaries by year
  • Annual evolution of appointments
  • Total billing per year
  • Most common procedures performed

šŸ‘„ Beneficiary Profile

  • Distribution by age group
  • Distribution by gender (volume and cost)
  • Distribution by city and state
  • Beneficiaries with the highest cost
  • Average appointments per beneficiary
  • Average tenure of beneficiaries in the plan

šŸ’¼ Plan & Provider Performance

  • Count of beneficiaries per health plan
  • Cost comparison across plans (volume and value)
  • Ticket average per appointment
  • Providers with highest appointment volume
  • Top specialties by volume
  • Top providers by specialty
  • Reutilization by provider
  • Reutilization by specialty

šŸ“† Temporal Analysis

  • Appointment frequency by day of week
  • Average length of stay in the plan (in days, weeks, months, years)

šŸ—ƒļø Files

/1 - Evolution and Volume of Assistance – 5 SQL scripts (.sql format)

/2 - Profile and Behavior of Beneficiaries – 6 SQL scripts (.sql format)

/3 - Performance and Production of Providers – 5 SQL scripts (.sql format)

/4 - Financial and Plans – 3 SQL scripts (.sql format)

/5 - Administrative Information – 1 SQL script (.sql format)

/Creating Database – Database tables in CSV

README.md – Project documentation


šŸš€ How to Use

You can import the .csv files into any relational database (like MySQL, PostgreSQL, SQLite) and execute the scripts directly.


šŸ“Œ Important Notes

This dataset was entirely generated with Python and does not reflect real patient data.

Some scripts use CTEs (WITH), window functions like ROW_NUMBER(), and date functions like DATEDIFF().


šŸ“¬ Contact


⭐ If you found this analysis useful or insightful, feel free to star the repository!

About

Data analysis in a fictiction health plan database using SQL, with 20 scripts focusing in KPIs and operation indicators.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published