BigQuery pipeline is an end-to-end batch data pipeline that runs weekly. It ingests the latest weather forecast data from the MetaWeather API, loads it into Google BigQuery, and finally surfaces weather insights through a dashboard.
The pipeline was built using Python, Pandas, the BigQuery API, the Heroku CLI, and Google Data Studio.
Data pipeline
- Extract: call the location endpoint to get a city's current weather and its 5-day forecast
- Transform: apply transformations such as renaming columns, changing data types, and generating a new date dimension table from the date column
- Load: load the final weather data and the generated dimension table into BigQuery
- Historical data: get the history of weather forecasts between two selected dates using the location day endpoint
- API cities: get the cities available to query from the MetaWeather API using the location search endpoint
- Logging: implement a simple custom logger using the loguru library
- Scheduling: run the pipeline weekly, since the MetaWeather API provides forecasts over a 6-day window
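The date dimension generated in the Transform step could look roughly like the sketch below. It is a minimal, dependency-free illustration, not the pipeline's actual code: the function name `build_date_dimension` and every column name are hypothetical, and the real pipeline does this with Pandas. A list of dicts like the one returned here can be turned into a DataFrame with `pandas.DataFrame(rows)` before loading.

```python
from datetime import date


def build_date_dimension(dates):
    """Return one row per distinct date with common calendar attributes.

    The column names below are hypothetical; they illustrate the idea
    rather than reproduce the pipeline's actual dimension schema.
    """
    rows = []
    for d in sorted(set(dates)):
        rows.append({
            "date_id": int(d.strftime("%Y%m%d")),  # surrogate key, e.g. 20220304
            "date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day": d.day,
            "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "is_weekend": d.isoweekday() >= 6,
        })
    return rows


if __name__ == "__main__":
    # Duplicate dates in the forecast data collapse to one dimension row each.
    forecast_dates = [date(2022, 3, 5), date(2022, 3, 4), date(2022, 3, 5)]
    for row in build_date_dimension(forecast_dates):
        print(row)
```

Deduplicating and sorting the dates keeps the dimension table at one row per calendar day, so the fact table can join to it on the date key.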
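The Historical data step needs one location day request per date between the two selected dates. The sketch below only builds those request URLs; it assumes the endpoint path has the shape `/api/location/<woeid>/<year>/<month>/<day>/` under `https://www.metaweather.com/api/location`, and the helper names `daterange` and `location_day_urls` are hypothetical. Fetching each URL and collecting the returned forecasts is left out.

```python
from datetime import date, timedelta

# Assumed base URL and path shape for MetaWeather's "location day" endpoint:
#   /api/location/<woeid>/<year>/<month>/<day>/
BASE_URL = "https://www.metaweather.com/api/location"


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def location_day_urls(woeid, start, end):
    """Build one location-day URL per date between start and end (inclusive)."""
    if end < start:
        raise ValueError("end date must not be before start date")
    return [
        f"{BASE_URL}/{woeid}/{d.year}/{d.month}/{d.day}/"
        for d in daterange(start, end)
    ]


if __name__ == "__main__":
    # 44418 is used here only as an example WOEID.
    for url in location_day_urls(44418, date(2022, 2, 27), date(2022, 3, 1)):
        print(url)
```

Using `timedelta` arithmetic rather than incrementing day numbers by hand handles month and year boundaries correctly (here February 28 rolls over to March 1).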
First, set up your Google service account permissions and create the required tables using the BigQuery UI.
Then clone this repo, install the dependencies, and run the pipeline:
pip install -r requirements.txt
python main.py