To run the project locally:
- Clone the repository:
  git clone https://github.com/iamdebasishdas123/SageMaker_Flight_Prediction
- Install dependencies:
  pip install -r requirements.txt
- Run the Streamlit app:
  streamlit run app.py
This document provides an overview of the Flight Price Prediction project, including the dataset, project files, model training, and prediction process. The project aims to predict flight prices based on various features such as airline, date of journey, source, destination, departure time, arrival time, duration, total stops, and additional information. The prediction model is built using XGBoost and deployed as a web application using Streamlit.
Flight_Price_Prediction/
├── .gitignore
├── README.md
├── app.py
├── aws-xgboost-model
├── XGB-model
├── preprocessor.joblib
├── requirements.txt
├── data/
│ ├── flight_price.csv
│ ├── test.csv
│ ├── train.csv
│ └── val.csv
├── Preprocess file/
│ ├── test-pre.csv
│ ├── train-pre.csv
│ └── val-pre.csv
├── notebooks/
│ ├── AWS-Model-training.ipynb
│ ├── Data_Cleaning.ipynb
│ ├── EDA.ipynb
│ ├── Feature_engineering.ipynb
│ ├── local-Model-training.ipynb
│ └── train-pre.csv
├── utils/
│ └── eda_helper_functions.py
└── .git/
- app.py: The main application file for the Streamlit web app.
- preprocessor.joblib: The saved preprocessor object used for transforming the input data.
- XGB-model: The XGBoost model trained on the local machine for price prediction.
- aws-xgboost-model: The XGBoost model trained on AWS SageMaker.
- requirements.txt: The list of dependencies required to run the project.
- notebooks/: Contains Jupyter notebooks for data cleaning, exploratory data analysis (EDA), feature engineering, and model training.
The model training process involves several steps, as documented in the Jupyter notebooks:
- Data Cleaning (Data_Cleaning.ipynb):
  - Handling missing values
  - Correcting data types
  - Removing duplicates
- Exploratory Data Analysis (EDA.ipynb):
  - Visualizing the distribution of features
  - Identifying correlations between features and the target variable
  - Detecting outliers
- Feature Engineering (Feature_engineering.ipynb):
  - Creating new features from existing ones (e.g., extracting date and time components)
  - Encoding categorical variables
  - Scaling numerical features
- Model Training:
  - Local Model Training (local-Model-training.ipynb): Training the model on the local machine.
  - AWS Model Training (AWS-Model-training.ipynb): Training the model on AWS SageMaker for better performance and scalability; a minimal, hedged training sketch follows this list.
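The AWS notebook itself is not reproduced here, but a minimal sketch of a SageMaker training job with the built-in XGBoost container might look like the following; the bucket name, prefix, instance type, and hyperparameters are placeholder assumptions rather than the project's actual settings.

```python
# Illustrative sketch of a SageMaker built-in XGBoost training job.
# Bucket, prefix, instance type, and hyperparameters are assumptions.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()      # IAM role of the notebook instance
bucket = "my-flight-price-bucket"          # hypothetical S3 bucket
prefix = "flight-price"

# Built-in XGBoost container image for the current region
image_uri = sagemaker.image_uris.retrieve(
    "xgboost", session.boto_region_name, version="1.5-1"
)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path=f"s3://{bucket}/{prefix}/output",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=200)

# Preprocessed CSVs uploaded to S3 beforehand (target in the first column, no header)
train_input = TrainingInput(f"s3://{bucket}/{prefix}/train-pre.csv", content_type="text/csv")
val_input = TrainingInput(f"s3://{bucket}/{prefix}/val-pre.csv", content_type="text/csv")

estimator.fit({"train": train_input, "validation": val_input})
```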
The preprocessing pipeline is defined using scikit-learn and feature-engine transformers. It includes steps for handling categorical and numerical features, as well as feature selection, and is saved as preprocessor.joblib.
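The exact transformer stack lives in the notebooks; as a rough sketch, a scikit-learn pipeline built from feature-engine transformers could look like the example below, where the column names, tolerances, and specific transformers are assumptions rather than the project's actual configuration.

```python
# Rough sketch of a preprocessing pipeline in the spirit of preprocessor.joblib.
# Column names and transformer choices are assumptions.
import joblib
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from feature_engine.encoding import OneHotEncoder, RareLabelEncoder
from feature_engine.selection import DropConstantFeatures
from feature_engine.wrappers import SklearnTransformerWrapper

categorical = ["Airline", "Source", "Destination"]   # hypothetical column names
numerical = ["Duration_minutes", "Total_Stops"]      # hypothetical column names

preprocessor = Pipeline(steps=[
    # Fold rarely seen categories (e.g. infrequent airlines) into one label
    ("rare", RareLabelEncoder(tol=0.01, n_categories=2, variables=categorical)),
    # One-hot encode the categorical columns
    ("encode", OneHotEncoder(variables=categorical)),
    # Scale numerical columns with a wrapped scikit-learn transformer
    ("scale", SklearnTransformerWrapper(transformer=StandardScaler(), variables=numerical)),
    # Simple feature selection: drop columns that take a single constant value
    ("select", DropConstantFeatures()),
])

# Fitted on the training DataFrame and persisted for the app to reuse:
# preprocessor.fit(train_df)
# joblib.dump(preprocessor, "preprocessor.joblib")
```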
The XGBoost model is trained on the preprocessed data. The locally trained model is saved as XGB-model, and the model trained on AWS SageMaker is saved as aws-xgboost-model.
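For reference, a minimal local training step might look like this; the hyperparameters, the target column name ("Price"), and the assumption that XGB-model is a joblib dump are illustrative, not taken from the notebook.

```python
# Illustrative sketch of local XGBoost training on the preprocessed data.
# Hyperparameters, column names, and the serialization format are assumptions.
import joblib
import pandas as pd
from xgboost import XGBRegressor

train = pd.read_csv("Preprocess file/train-pre.csv")
val = pd.read_csv("Preprocess file/val-pre.csv")

X_train, y_train = train.drop(columns=["Price"]), train["Price"]   # "Price" is an assumed target name
X_val, y_val = val.drop(columns=["Price"]), val["Price"]

model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

joblib.dump(model, "XGB-model")   # assuming the model file is a joblib dump
```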
The web application is built using Streamlit and allows users to input flight details to get a price prediction.
When the user inputs the flight details and clicks the "Predict" button, the app performs the following steps (sketched in code after this list):
- Loads the saved preprocessor and model.
- Transforms the input data using the preprocessor.
- Predicts the flight price using the XGBoost model.
- Displays the predicted price.
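A minimal sketch of that flow, with hypothetical widget labels and column names (the real app.py may differ), could be:

```python
# Minimal sketch of the prediction flow in app.py.
# Widget labels, options, and column names are assumptions.
import joblib
import pandas as pd
import streamlit as st

preprocessor = joblib.load("preprocessor.joblib")
model = joblib.load("XGB-model")            # assuming a joblib-serialized model

st.title("Flight Price Prediction")
airline = st.selectbox("Airline", ["Air India", "IndiGo", "Jet Airways"])   # hypothetical options
source = st.selectbox("Source", ["Delhi", "Kolkata", "Mumbai"])
destination = st.selectbox("Destination", ["Kolkata", "Delhi", "Cochin"])
total_stops = st.number_input("Total stops", min_value=0, max_value=4, value=0)

if st.button("Predict"):
    # 1. Collect the inputs into a one-row DataFrame
    row = pd.DataFrame([{
        "Airline": airline,
        "Source": source,
        "Destination": destination,
        "Total_Stops": total_stops,
    }])
    # 2. Transform with the saved preprocessor, 3. predict, 4. display
    features = preprocessor.transform(row)
    price = model.predict(features)[0]
    st.success(f"Predicted price: {price:,.0f} INR")
```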
Example prediction:
- Route: Delhi to Kolkata
- Airline: Air India
- Actual Price: 5300 INR
- Predicted Price: 5920 INR
- Flight Details: Non-stop, duration 2h 35min
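On this example the absolute error is 620 INR (5920 - 5300), roughly 11.7% of the actual fare.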
The Flight Price Prediction model provides a quick, user-friendly way to estimate flight prices from features such as airline, route, stops, and timing. The web application allows for easy interaction and fast predictions, making it a useful tool for travelers and analysts alike.