This project extracts, transforms, and loads (ETL) air quality data into a Supabase PostgreSQL database, where it is processed to provide insights into air quality across various cities. The pipeline is built in Python, which handles extraction, transformation, and automation, with Supabase PostgreSQL as the backend database.
- Data Extraction: Fetches air quality data from external APIs.
- Data Transformation: Cleans and transforms the data to match database schema requirements.
- Database Integration: Loads processed data into Supabase PostgreSQL.
- Automation: The pipeline can be scheduled to run periodically to ensure up-to-date data ingestion.
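At a high level, the pipeline chains these stages together. Below is a minimal sketch of that flow; the function names, the `air_quality` table, and the choice of the `nearest_city` endpoint are illustrative assumptions, not the repo's actual code:

```python
# Minimal ETL sketch; names and endpoint choices are illustrative assumptions.
import requests
import pandas as pd
from sqlalchemy import create_engine

def extract(api_key: str) -> dict:
    # Fetch raw air quality data from the AirVisual API.
    resp = requests.get(
        "https://api.airvisual.com/v2/nearest_city",
        params={"key": api_key},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def transform(raw: dict) -> pd.DataFrame:
    # Flatten the nested JSON payload into a tabular shape for the database.
    data = raw["data"]
    pollution = data["current"]["pollution"]
    return pd.DataFrame([{
        "city": data["city"],
        "state": data["state"],
        "country": data["country"],
        "aqi_us": pollution["aqius"],
        "measured_at": pollution["ts"],
    }])

def load(df: pd.DataFrame, database_uri: str) -> None:
    # Append transformed rows to a Postgres table (table name is an assumption).
    engine = create_engine(database_uri)
    df.to_sql("air_quality", engine, if_exists="append", index=False)

if __name__ == "__main__":
    load(transform(extract("your-api-key")), "postgresql://user:pass@host:5432/db")
```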
- Language: Python
- Database: Supabase PostgreSQL
- APIs: AirVisual (IQAir)
- Automation: Airflow (optional), Cron jobs, or custom scheduling scripts.
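For the custom-script option, the scheduler can be as small as a loop. A minimal sketch, assuming a `run_pipeline()` entry point (a hypothetical name standing in for the ETL steps above):

```python
import time

def run_pipeline() -> None:
    # Placeholder for the extract -> transform -> load steps sketched above.
    ...

if __name__ == "__main__":
    # Naive scheduler: run once an hour. A cron job or an Airflow DAG would
    # replace this loop in production.
    while True:
        run_pipeline()
        time.sleep(60 * 60)
```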
Prerequisites

Ensure you have the following installed:
- Python 3.x
- A Supabase account with a PostgreSQL database
- An AirVisual API key for air quality data
- Sign in to the IQAir website and go to the dashboard to create an API key for the Community plan.
- Write the extraction code using the AirVisual API documentation; a sketch follows this list.
- First, fetch all states, then the cities within each state, as JSON to determine which states and cities have data available.
- Remove from each state's list any city whose request fails.
- After running this code, you will have all state and city names in JSON format.
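A minimal sketch of this discovery step, using the documented `/v2/states` and `/v2/cities` endpoints; the country, output filename, and rate-limit pause are assumptions to adjust for your setup:

```python
import json
import time
import requests

API_KEY = "your-airvisual-api-key"  # placeholder; load from env in real code
BASE_URL = "https://api.airvisual.com/v2"
COUNTRY = "USA"  # assumed country

def fetch_states(country: str) -> list[str]:
    # List every state the API reports for the country.
    resp = requests.get(f"{BASE_URL}/states",
                        params={"country": country, "key": API_KEY}, timeout=30)
    resp.raise_for_status()
    return [item["state"] for item in resp.json()["data"]]

def fetch_cities(state: str, country: str) -> list[str]:
    # List cities for a state; an empty list marks a failed response so the
    # state's cities can be dropped, matching the cleanup step above.
    resp = requests.get(f"{BASE_URL}/cities",
                        params={"state": state, "country": country, "key": API_KEY},
                        timeout=30)
    if resp.status_code != 200:
        return []
    payload = resp.json()
    if payload.get("status") != "success":
        return []
    return [item["city"] for item in payload["data"]]

if __name__ == "__main__":
    locations = {}
    for state in fetch_states(COUNTRY):
        cities = fetch_cities(state, COUNTRY)
        if cities:  # keep only states with at least one responsive city
            locations[state] = cities
        time.sleep(12)  # pause between calls; adjust to your plan's rate limit
    with open("locations.json", "w") as f:
        json.dump(locations, f, indent=2)
```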
- Create a `constants.py` file that reads the required environment variables (sketched below):
  - `API_KEY` for the AirVisual API
  - `DATABASE_URI` for the Supabase PostgreSQL connection
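A minimal `constants.py` sketch, assuming both values are set as environment variables (the helper name `_require` is illustrative):

```python
import os

def _require(name: str) -> str:
    # Fail fast at import time if a required setting is missing.
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

API_KEY = _require("API_KEY")            # AirVisual API key
DATABASE_URI = _require("DATABASE_URI")  # Supabase PostgreSQL connection string
```

Other modules can then import these settings with `from constants import API_KEY, DATABASE_URI`.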