Air Quality Pipeline

This project extracts, transforms, and loads (ETL) air quality data into a Supabase PostgreSQL database. The data is processed to provide insights into air quality across various cities. Extraction, transformation, and automation are written in Python, with Supabase PostgreSQL as the backend database.

Features

  • Data Extraction: Fetches air quality data from external APIs.
  • Data Transformation: Cleans and transforms the data to match database schema requirements.
  • Database Integration: Loads processed data into Supabase PostgreSQL.
  • Automation: The pipeline can be scheduled to run periodically to ensure up-to-date data ingestion.

Tech Stack

  • Language: Python
  • Database: Supabase PostgreSQL
  • API: AirVisual (IQAir)
  • Automation: Airflow (optional), Cron jobs, or custom scheduling scripts.

Setup Guide

Prerequisites

Ensure you have the following:

  • Python 3.x
  • A Supabase account with a PostgreSQL database
  • An AirVisual API key for air quality data

High-Level Architecture

(High-level architecture diagram)

Low-Level Design

  1. Sign in to the IQAir website and create an API key for the Community plan from the dashboard.
  2. Write the extraction code following the AirVisual API documentation.
  3. Fetch all states first, then the cities in each state, as JSON to determine which states and cities have data available.
  4. Remove cities (and states) whose city request fails.
  5. After running this code, you will have all state and city names in JSON format.
  6. Create a `constants.py` file to load the environment variables, i.e., `API_KEY` and the database URI (see the sketch below).

First, create a .env file that contains two main parameters:

  • API_KEY
  • DATABASE_URI for PostgreSQL
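
A minimal `constants.py` sketch, assuming python-dotenv is used to read the `.env` file (the repo's actual loader may differ):

```python
# constants.py -- a minimal sketch, assuming python-dotenv reads the .env file.
# Variable names follow the README; the repo's actual loader may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # loads key=value pairs from the local .env file

API_KEY = os.environ["API_KEY"]            # AirVisual API key
DATABASE_URI = os.environ["DATABASE_URI"]  # Supabase PostgreSQL connection URI
```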

Run the `states.py` file and store the result as `test_data.json`. This fetches every city where air quality monitoring stations are present.
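
A sketch of what that extraction might look like, built on the documented AirVisual v2 `/states` and `/cities` endpoints; the target country, sleep interval, and structure are assumptions, not the repo's exact code:

```python
# states.py -- sketch of the extraction step against the documented
# AirVisual v2 endpoints; COUNTRY and the sleep interval are assumptions.
import json
import time

import requests

from constants import API_KEY

BASE_URL = "http://api.airvisual.com/v2"
COUNTRY = "India"  # assumption: replace with the country you target


def fetch_states():
    """Return all state names the API reports for COUNTRY."""
    resp = requests.get(f"{BASE_URL}/states",
                        params={"country": COUNTRY, "key": API_KEY})
    resp.raise_for_status()
    return [item["state"] for item in resp.json()["data"]]


def fetch_cities(state):
    """Return the cities in a state, or [] if the request fails (step 4)."""
    resp = requests.get(f"{BASE_URL}/cities",
                        params={"state": state, "country": COUNTRY,
                                "key": API_KEY})
    payload = resp.json() if resp.ok else {}
    if payload.get("status") != "success":
        return []
    return [item["city"] for item in payload["data"]]


if __name__ == "__main__":
    result = {}
    for state in fetch_states():
        cities = fetch_cities(state)
        if cities:
            result[state] = cities
        time.sleep(12)  # stay under the Community plan's rate limit
    with open("test_data.json", "w") as f:
        json.dump(result, f, indent=2)
```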

Run the `main.py` file to execute your pipeline.
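
As a rough sketch of the transform/load step, assuming pandas and SQLAlchemy are used (the table and column names below are illustrative, not taken from the repo):

```python
# main.py -- sketch of the transform/load step, assuming pandas + SQLAlchemy.
# The table name "air_quality_raw" is illustrative, not taken from the repo.
import json
import time

import pandas as pd
import requests
from sqlalchemy import create_engine

from constants import API_KEY, DATABASE_URI

COUNTRY = "India"  # assumption, as in states.py


def fetch_city_pollution(city, state):
    """Fetch current pollution readings for one city from /v2/city."""
    resp = requests.get("http://api.airvisual.com/v2/city",
                        params={"city": city, "state": state,
                                "country": COUNTRY, "key": API_KEY})
    resp.raise_for_status()
    return resp.json()["data"]["current"]["pollution"]


if __name__ == "__main__":
    with open("test_data.json") as f:
        targets = json.load(f)  # {state: [city, ...]} produced by states.py

    rows = []
    for state, cities in targets.items():
        for city in cities:
            pollution = fetch_city_pollution(city, state)
            rows.append({"city": city, "state": state,
                         "aqi_us": pollution["aqius"],
                         "measured_at": pollution["ts"]})
            time.sleep(12)  # respect the Community plan's rate limit

    # DATABASE_URI must use the postgresql:// scheme for SQLAlchemy.
    engine = create_engine(DATABASE_URI)
    pd.DataFrame(rows).to_sql("air_quality_raw", engine,
                              if_exists="append", index=False)
```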

To orchestrate the pipeline, you can use any orchestration tool (Airflow, cron, or a custom script) to capture daily changes.
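
For a simple daily schedule without Airflow, a single cron entry is enough; the path and time below are placeholders:

```
0 6 * * * cd /path/to/Air-Quality-Pipeline && python main.py >> pipeline.log 2>&1
```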

Dimension & Fact Tables

(Dimension and fact table diagram)
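
The diagram above is the authoritative layout. Purely as an illustration, a star schema for this data could be created along these lines; every table and column name here is a hypothetical assumption:

```python
# schema.py -- hypothetical star-schema DDL; the real layout is the diagram
# above. All names below are illustrative assumptions.
from sqlalchemy import create_engine, text

from constants import DATABASE_URI

DDL = """
CREATE TABLE IF NOT EXISTS dim_city (
    city_id SERIAL PRIMARY KEY,
    city    TEXT NOT NULL,
    state   TEXT NOT NULL,
    UNIQUE (city, state)
);

CREATE TABLE IF NOT EXISTS fact_air_quality (
    fact_id     BIGSERIAL PRIMARY KEY,
    city_id     INT REFERENCES dim_city (city_id),
    aqi_us      INT,
    measured_at TIMESTAMPTZ
);
"""

if __name__ == "__main__":
    engine = create_engine(DATABASE_URI)
    with engine.begin() as conn:
        conn.execute(text(DDL))
```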
