A Sequential classification model trained on a large real-world bank loan dataset from Kaggle to predict loan approval.
Customer financial and loan data 📥 Dataset for Bank Loan Prediction
- EDA & Cleaning: Handled nulls, outliers, patterns
- Preprocessing: Encoded categoricals, scaled features
- Model: Sequential NN with 32,768 parameters, ReLU layers, sigmoid output
- Training: Binary cross-entropy + Adam + early stopping
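
The training setup above pairs a sigmoid output with binary cross-entropy and stops once the validation loss stops improving. The following is a minimal pure-Python sketch of those two ideas. The function names, the `patience` value, and all sample numbers are illustrative assumptions and are not taken from the notebook.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: all epochs ran


# Hypothetical logits for three applicants and their true labels.
probs = [sigmoid(z) for z in (2.0, -1.0, 0.0)]
loss = binary_cross_entropy([1, 0, 1], probs)

# Hypothetical validation-loss curve: best at epoch 2, then three worse epochs.
stop_at = early_stopping_epoch(
    [0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.40], patience=3
)
print(f"loss={loss:.4f}, stopped at epoch {stop_at}")
```

In this made-up curve the final epoch would have improved on the best loss, but training has already stopped by then. That is the trade-off a patience setting controls.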
- Loan Status: Indicates whether the loan was approved or denied (target variable for classification).
- Current Loan Amount: The total amount the client is requesting or currently owes on the loan.
- Term: The length of the loan repayment period, usually in months (e.g., 36 or 60 months).
- Credit Score: A numerical value showing the client's creditworthiness based on past credit behavior.
- Annual Income: The client's yearly earnings, used to assess their ability to repay the loan.
- Years in Current Job: The duration of employment at the current job, indicating job stability.
- Home Ownership: Shows if the client rents, owns, or has a mortgage, reflecting financial stability.
- Purpose: The reason for taking the loan (e.g., debt consolidation, medical, education, etc.).
- Monthly Debt: The total monthly payments for debts (excluding the new loan), used in debt-to-income ratio.
- Number of Open Accounts: Total number of active credit accounts (like loans or credit cards) the client currently has.
- Current Credit Balance: The outstanding amount the client owes across all credit accounts.
- Bankruptcies: Number of times the client has declared bankruptcy, signaling past financial distress.
- Tax Liens: Number of legal claims by the government on the client's property due to unpaid taxes.
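
The Monthly Debt entry mentions the debt-to-income ratio. As a rough illustration, the ratio can be derived from Monthly Debt and Annual Income as shown below. The helper name and the sample applicant are hypothetical, and the notebook may or may not compute this as an explicit feature.

```python
def debt_to_income(monthly_debt, annual_income):
    """Ratio of monthly debt payments to gross monthly income."""
    if annual_income <= 0:
        raise ValueError("annual_income must be positive")
    monthly_income = annual_income / 12.0
    return monthly_debt / monthly_income


# Hypothetical applicant: 1,500 in monthly debt payments, 60,000 annual income.
dti = debt_to_income(1500.0, 60000.0)
print(f"DTI = {dti:.2f}")
```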
- LoanPrediction.ipynb – Full pipeline
- Kaggle Notebook - Full Explanation
Achieved strong classification performance of 80% on unseen test data with a well-generalized and compact model.
A Sequential classification model trained on a large real-world airline dataset to predict whether a flight will be delayed by 15+ minutes or canceled.
Historical flight data enriched with weather, airport, and carrier features.
2019 airline delays and cancellations
🎯 Target: DEP_DEL15 – Binary indicator for delay >15 minutes or cancellation.
- EDA & Cleaning: Handled missing values, outliers, anomalies
- Preprocessing:
  - Encoded categorical features
  - Applied feature scaling
  - Log-transformed skewed features
- Custom Class Weights: Balanced training for imbalanced label distribution
- Model:
  - Deep Neural Network with multiple Dense layers, Dropout, and BatchNormalization
  - Total parameters: ~35K
  - Loss: Binary cross-entropy
  - Optimizer: Adam
  - Metrics: Accuracy, Precision, Recall, AUC
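
Class weights for an imbalanced binary target are often derived from label counts. The sketch below uses the common "balanced" heuristic, `n_samples / (n_classes * class_count)`. The notebook's custom weights may have been chosen differently, and the label distribution here is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label distribution: 80 on-time flights (0), 20 delayed (1).
labels = [0] * 80 + [1] * 20
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this shape, mapping class index to weight, is what Keras accepts through the `class_weight` argument of `Model.fit`, so errors on the rarer delayed class contribute more to the loss.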
- MONTH: Month of flight
- DAY_OF_WEEK: Day of the week
- DEP_TIME_BLK: Departure time block (e.g., 0600–0659)
- DISTANCE_GROUP: Distance bucket of the flight
- SEGMENT_NUMBER: Flight segment order
- CONCURRENT_FLIGHTS: Concurrent departing flights in same time block
- NUMBER_OF_SEATS: Seats available on aircraft
- CARRIER_NAME: Airline carrier
- AIRPORT_FLIGHTS_MONTH: Avg. airport traffic
- AIRLINE_FLIGHTS_MONTH: Avg. flights operated by airline
- AIRLINE_AIRPORT_FLIGHTS_MONTH: Combined airport-airline traffic
- PLANE_AGE: Age of aircraft
- LATITUDE, LONGITUDE: Geolocation of airport
- PRCP, SNOW, SNWD, TMAX, AWND: Weather conditions (precipitation, snow, temp, wind)
- LoanPrediction.ipynb – Full pipeline
- Kaggle Notebook - Full Explanation
Achieved ~65% accuracy on unseen test data using a custom-weighted loss function and tuned architecture.
The model shows strong generalization for real-world flight scheduling and operations forecasting.
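
Because the delay label is imbalanced, accuracy is tracked alongside precision and recall. The sketch below shows how the three metrics are computed from predictions. The labels and predictions are made up for illustration and are not the project's results.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # delays caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # on-time, predicted on-time
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed delays
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical test set: 3 delayed flights (1) and 7 on-time flights (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.7 while precision is 0.5, which is why a single accuracy figure gives only part of the picture on an imbalanced target.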
A machine learning regression model trained on the well-known California housing dataset to predict house prices based on key demographic and geographic features.
California housing dataset from sklearn.datasets, derived from the 1990 U.S. Census.
📥 California Housing Dataset Documentation
- EDA & Cleaning: Inspected and visualized data distribution, identified and handled outliers
- Preprocessing: Scaled features using StandardScaler
- Model: Trained a Linear Regression model and a tree-based regressor (e.g., RandomForestRegressor)
- Training: Used mean_squared_error as the loss metric and evaluated performance using MAE, RMSE, and R² score
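
The scaling step can be illustrated without any dependencies. The sketch below mirrors what StandardScaler does for one feature column: subtract the mean and divide by the population standard deviation. The helper names and sample values are hypothetical. Fitting the statistics on the training split only is a common practice assumed here, not something the notebook is confirmed to do.

```python
import statistics


def fit_standardizer(column):
    """Learn the mean and population standard deviation of one feature column."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return mean, (std if std > 0 else 1.0)  # guard against constant columns


def standardize(column, mean, std):
    """Apply previously learned statistics to a column of values."""
    return [(x - mean) / std for x in column]


# Hypothetical training and test values for a single feature.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mean, std = fit_standardizer(train)  # statistics come from training data only
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
print(train_scaled, test_scaled)
```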
- MedInc: Median income in the block group (normalized)
- HouseAge: Median age of houses in the area
- AveRooms: Average number of rooms per household
- AveBedrms: Average number of bedrooms per household
- Population: Total population in the block group
- AveOccup: Average number of household members
- Latitude: Geographical latitude
- Longitude: Geographical longitude
- 🎯 Price: Median house value (target variable, scaled)
- HousePricePrediction.ipynb – Full code pipeline with training, evaluation, and visualizations
- Colab Notebook – Explained end-to-end
Achieved strong regression performance on test data:
- MAE: ~0.1582
- RMSE: ~0.2072
- R² Score: ~0.82
Demonstrated good generalization and accurate predictions with well-preprocessed features and effective model choice.
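
For reference, the three reported metrics follow directly from the prediction errors. The sketch below is a dependency-free version of those formulas on made-up values. The function name and data are illustrative, and the numbers are unrelated to the results above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.2f}")
```

MAE and RMSE are expressed in the units of the target, so with a scaled target they should be read relative to that scale rather than as dollar amounts.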
CreditCardFraudDetection.ipynb
This project aims to detect fraudulent transactions using both Machine Learning (Random Forest) and Deep Learning (Neural Networks) approaches. The focus is on classifying transactions as fraudulent or genuine using six key financial features.
- Live Colab workflow – Full pipeline