The Covid-19 Prediction Model is a tool for forecasting the spread and impact of Covid-19 from historical data using statistical and machine learning techniques. It combines multiple data sources, including Covid-19 case data and mobility data, to forecast new cases and surface trends in the pandemic. The project aims to help policymakers, healthcare professionals, and the general public understand and respond to the Covid-19 crisis.
- Python: Core programming language for data processing and model training.
- C++: For efficient data processing and handling large datasets.
- SQLite: Database management for storing and querying data.
- Pandas: Data manipulation and analysis.
- Scikit-learn: Machine learning library for building predictive models.
- Matplotlib & Seaborn: Data visualization.
- CMake: Cross-platform build system.
- Google Test: Unit testing framework for C++.
- Dill: For model serialization in Python.
- Jupyter Notebook: For interactive data analysis and visualization.
- Fetch and preprocess Covid-19 and mobility data from multiple sources.
- Integrate and clean data, ensuring consistency and accuracy.
- Create various date-based, lag, and rolling average features to enhance model performance.
- Train and evaluate machine learning models to predict new Covid-19 cases.
- Visualize actual vs. predicted cases, residuals, and other key metrics to interpret model performance.
- Generate detailed reports and visualizations for data exploration and model results.
- Support for user-defined country data extraction and analysis.
- Ensure you have git installed for cloning repositories.
- Ensure you have CMake installed and added to your system's PATH.
- Clone the Repository:
git clone https://github.com/yourusername/Covid19_Prediction_Model.git
cd Covid19_Prediction_Model
- Install CMake:
  - Download CMake from the official CMake download page.
  - Add the CMake binary path (e.g., C:\Program Files\CMake\bin) to your environment variables.
- Clone SQLiteCpp:
cd external
git clone https://github.com/SRombauts/SQLiteCpp.git
- Modify SQLiteCpp CMakeLists.txt:
  - Open CMakeLists.txt in the external/SQLiteCpp folder.
  - Change line 388 from:
option(SQLITECPP_RUN_CPPLINT "Run cpplint.py tool for Google C++ StyleGuide." ON)
to:
option(SQLITECPP_RUN_CPPLINT "Run cpplint.py tool for Google C++ StyleGuide." OFF)
- Build the Project:
cd ..
mkdir build
cd build
cmake ..
cmake --build . --config Release
- Run the Application:
cd Release
Covid19_Prediction.exe
Install the required Python libraries (sqlite3 is part of Python's standard library and is not installed through pip):
pip install pandas numpy scikit-learn matplotlib seaborn dill joblib notebook
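As a minimal sketch of how case data might be stored in and queried from SQLite with pandas, the example below uses an in-memory database. The table name `covid_cases`, the column names, and the sample rows are hypothetical and are not taken from the project's actual schema.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows; the real schema used by the project may differ.
cases = pd.DataFrame(
    {
        "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
        "country": ["Vietnam", "Vietnam", "Germany"],
        "new_cases": [10, 12, 9000],
    }
)

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
cases.to_sql("covid_cases", conn, index=False)

# Parameterized query: select one country's rows in date order.
vietnam = pd.read_sql_query(
    "SELECT date, new_cases FROM covid_cases WHERE country = ? ORDER BY date",
    conn,
    params=("Vietnam",),
)
conn.close()
```

Passing the country as a query parameter rather than formatting it into the SQL string avoids quoting problems with user-supplied names.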
- Fetch Data
python scripts/fetch_data.py
This script fetches COVID-19 and mobility data. Note that this may take 10 to 20 minutes.
- Migrate Data
python scripts/migrate_data.py
This script migrates COVID-19 and mobility data for a specified country from the raw datasets to processed CSV files.
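The country extraction described above could look roughly like the following. This is a sketch under assumptions, not the contents of `migrate_data.py`: the helper `extract_country`, the column names, and the inline CSV text are all hypothetical.

```python
import io

import pandas as pd


def extract_country(raw: pd.DataFrame, country: str) -> pd.DataFrame:
    """Return the rows for one country, sorted by date (hypothetical helper)."""
    # Case-insensitive match so "vietnam" and "Vietnam" select the same rows.
    mask = raw["country"].str.casefold() == country.casefold()
    return raw.loc[mask].sort_values("date").reset_index(drop=True)


# Inline CSV text stands in for a raw dataset file.
raw_csv = io.StringIO(
    "date,country,new_cases\n"
    "2021-01-02,Vietnam,12\n"
    "2021-01-01,Vietnam,10\n"
    "2021-01-01,Germany,9000\n"
)
raw = pd.read_csv(raw_csv, parse_dates=["date"])
processed = extract_country(raw, "vietnam")

# The real script writes processed CSV files; here the CSV text stays in memory.
processed_csv = processed.to_csv(index=False)
```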
- Build the project
cd ..
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd Release
Covid19_Prediction.exe
These commands configure, build, and run the C++ project.
- Process Data
python scripts/data_processing.py
This script processes the COVID-19 and mobility data for a specific country provided by the user.
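To illustrate what integrating and cleaning the two sources might involve, here is a small pandas sketch. The column names (`new_cases`, `workplace_mobility`) and the cleaning choices (dropping duplicate dates, linear interpolation of gaps) are assumptions for illustration; `data_processing.py` may handle these differently.

```python
import pandas as pd

# Hypothetical case data with one missing value and one duplicated date.
cases = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"]
        ),
        "new_cases": [10.0, None, 15.0, 15.0],
    }
)
# Hypothetical mobility data for the same dates.
mobility = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
        "workplace_mobility": [-20.0, -18.0, -25.0],
    }
)

# Drop duplicated dates, then join the two sources on the date column.
cases = cases.drop_duplicates(subset="date")
merged = cases.merge(mobility, on="date", how="inner").sort_values("date")

# Fill the missing count by linear interpolation between neighbouring days.
merged["new_cases"] = merged["new_cases"].interpolate()
```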
- Perform EDA
python scripts/eda_visualization.py
This script performs Exploratory Data Analysis on the processed data.
- Feature Engineering
python scripts/feature_engineering.py
This script performs feature engineering on the processed data.
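The date-based, lag, and rolling-average features this project relies on can be sketched with pandas as follows. The function `add_features`, the column names, and the chosen windows (1-day and 7-day lags, 7-day rolling mean) are illustrative assumptions rather than the exact features built by `feature_engineering.py`.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add date, lag, and rolling-average features (hypothetical helper)."""
    out = df.sort_values("date").copy()
    # Date-based features.
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out["date"].dt.month
    # Lag features: the case count 1 and 7 days earlier.
    out["new_cases_lag_1"] = out["new_cases"].shift(1)
    out["new_cases_lag_7"] = out["new_cases"].shift(7)
    # Shift before rolling so the average only uses days before the current one.
    out["new_cases_roll_7"] = out["new_cases"].shift(1).rolling(window=7).mean()
    # Rows without a full history contain NaN and are dropped.
    return out.dropna().reset_index(drop=True)


daily = pd.DataFrame(
    {
        "date": pd.date_range("2021-01-01", periods=10, freq="D"),
        "new_cases": range(10, 20),
    }
)
features = add_features(daily)
```

Shifting by one day before taking the rolling mean keeps the current day's target value out of its own features, which would otherwise leak information into the model.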
- Split Data
python scripts/split_data.py
This script splits the data into training and testing sets.
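The README does not say how the split is made. For time-ordered data such as daily case counts, one common choice is a chronological split, where the most recent rows form the test set. The sketch below assumes that approach; `chronological_split` and the 20% test fraction are hypothetical.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, test_fraction: float = 0.2):
    """Split by time: earliest rows for training, latest rows for testing."""
    ordered = df.sort_values("date").reset_index(drop=True)
    n_test = int(round(len(ordered) * test_fraction))
    split_at = len(ordered) - n_test
    return ordered.iloc[:split_at], ordered.iloc[split_at:]


daily = pd.DataFrame(
    {
        "date": pd.date_range("2021-01-01", periods=10, freq="D"),
        "new_cases": range(10, 20),
    }
)
train, test = chronological_split(daily)
```

Unlike a random shuffle, this keeps every test date later than every training date, so the evaluation resembles forecasting unseen future days.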
- Model Training
python scripts/model_training.py
This script trains the machine learning model.
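As a rough sketch of training and then serializing a model, the example below fits a scikit-learn regressor on synthetic data and round-trips it through a byte string. `LinearRegression` is only a placeholder, since the README does not name the model used by `model_training.py`; the synthetic features are likewise hypothetical. Dill is used when installed, with the standard-library `pickle` as a fallback because both expose the same `dumps`/`loads` calls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

try:
    import dill as serializer  # the README lists Dill for model serialization
except ImportError:
    import pickle as serializer  # fallback so the sketch runs without Dill

# Synthetic stand-in for engineered features (e.g. a lag and a rolling mean).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(50, 2))
y_train = 2.0 * X_train[:, 0] + 0.5 * X_train[:, 1] + 3.0

# Placeholder model; the project's actual estimator may differ.
model = LinearRegression().fit(X_train, y_train)

# Serialize to bytes and restore, as would be done when saving to disk.
blob = serializer.dumps(model)
restored = serializer.loads(blob)
```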
- Model Evaluation
python scripts/model_evaluation.py
This script evaluates the performance of the trained model.
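The README does not list which metrics the evaluation reports. A minimal sketch with metrics commonly used for regression (MAE, RMSE, and R²), computed on made-up values, might look like this:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted daily case counts.
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([110.0, 115.0, 95.0, 100.0])

mae = mean_absolute_error(y_true, y_pred)            # average absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # penalizes large errors more
r2 = r2_score(y_true, y_pred)                        # share of variance explained
residuals = y_true - y_pred                          # input for residual plots
```

MAE and RMSE are expressed in cases per day, which makes them easier to interpret alongside plots of actual vs. predicted cases than a unitless score alone.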
- Interpret Predictions
cd notebooks
jupyter notebook
Open interpret_predictions.ipynb in Jupyter Notebook to visualize and interpret the model's predictions.
This project is licensed under the MIT License - see the LICENSE file for details.
For any inquiries, please contact tiendat041202@gmail.com.
Made with ❤️ by Dat Pham