My Latest Projects

This repository contains a collection of my latest data science and machine learning projects. Each project highlights specific techniques, tools, and technologies used to solve real-world problems and derive actionable insights.

Customer Segmentation and Market Basket Analysis for E-commerce Retail
Market Price Prediction
Movie Genre Classification
Predictive Modeling for Disease Diagnosis
Credit Card Transactions Fraud Detection
NLP Newsgroups Classification and Deployment
Cab Industry Analysis: Data Exploration, Hypothesis Testing, and Strategic Recommendations
New York Housing Market Analysis and Price Prediction
Gym Members Calories Prediction with CatBoost
Air Quality Prediction and Deployment

Projects

Customer Segmentation and Market Basket Analysis for E-commerce Retail

Description: Led a data-driven project focused on customer segmentation, sales trend analysis, and market basket analysis using a dataset from a UK-based online retailer. The project involved in-depth exploration of customer purchasing patterns, segmentation based on Recency, Frequency, Monetary (RFM) analysis, and discovery of product associations using the Apriori algorithm. The outcomes provided valuable insights for enhancing marketing strategies, product placement, and inventory management.
Technologies Used: Python, pandas, Seaborn, Scikit-learn, NetworkX, mlxtend
Techniques: RFM-T Segmentation, Market Basket Analysis, K-Means Clustering, Apriori Algorithm
Key Impact:
- Identified customer segments for targeted marketing and retention strategies.
- Discovered high-confidence product association rules for effective cross-selling and product bundling.
- Provided actionable insights for marketing campaigns, product placement, and inventory management strategies.
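Both the RFM step and the rule metrics behind the Apriori search can be sketched in plain Python. This is a minimal illustration, not the notebook's code. The `(customer_id, invoice_id, invoice_date, amount)` tuple layout, the snapshot date, and the toy baskets are all hypothetical. In the project itself, mlxtend's `apriori` and `association_rules` compute support, confidence, and lift at scale. The resulting RFM values are typically scaled before K-Means clustering.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (customer_id, invoice_id, invoice_date, amount)
transactions = [
    ("C1", "I1", date(2011, 12, 1), 20.0),
    ("C1", "I2", date(2011, 12, 8), 35.0),
    ("C2", "I3", date(2011, 10, 15), 120.0),
    ("C3", "I4", date(2011, 12, 5), 15.0),
]

def rfm_table(rows, snapshot):
    """Return {customer: (recency_days, frequency, monetary)} relative to snapshot."""
    last_seen = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for cust, inv, day, amount in rows:
        last_seen[cust] = max(last_seen.get(cust, day), day)  # most recent purchase
        invoices[cust].add(inv)                              # distinct invoices = frequency
        spend[cust] += amount                                # total spend = monetary value
    return {
        c: ((snapshot - last_seen[c]).days, len(invoices[c]), spend[c])
        for c in last_seen
    }

def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= b for b in baskets)         # baskets containing the antecedent
    n_c = sum(c <= b for b in baskets)         # baskets containing the consequent
    n_ac = sum((a | c) <= b for b in baskets)  # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

rfm = rfm_table(transactions, snapshot=date(2011, 12, 10))
print(rfm["C1"])  # (2, 2, 55.0)

baskets = [{"tea", "cup"}, {"tea", "cup", "saucer"}, {"tea"}, {"cup"}]
support, confidence, lift = rule_metrics(baskets, {"tea"}, {"cup"})
```

A lift below 1, as in this toy example, means the two items co-occur less often than independence would predict. That is why rules are usually filtered on lift as well as confidence.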

Market Price Prediction

Description: Developed a robust time series forecasting model for market analysis, focusing on predicting the quantity and prices of commodities based on historical data. The project involved data preprocessing, exploratory data analysis, feature engineering, model selection, training, and evaluation. Several models were tested, including ARIMA, SARIMA, Prophet, and LSTM, with LSTM models showing significant promise, especially in price forecasting.
Technologies Used: Python, Pandas, NumPy, ARIMA, SARIMA, Prophet, LSTM
Key Impact:
- Achieved high accuracy in forecasting commodity prices using the LSTM model.
- Contributed to optimizing inventory management and pricing strategies.
- Provided actionable insights for market analysis.
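LSTM forecasters are usually trained on fixed-length windows of past values. Below is a minimal NumPy sketch of that framing. It assumes a univariate price series and the `(samples, timesteps, features)` input shape that Keras-style LSTM layers expect. `make_windows`, `lookback`, and `horizon` are hypothetical names, and the notebook may prepare its data differently.

```python
import numpy as np

def make_windows(series, lookback, horizon=1):
    """Turn a 1-D series into (samples, lookback, 1) inputs and (samples, horizon) targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lookback/horizon")
    # Each input is `lookback` consecutive values; the target is the next `horizon` values.
    X = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X[..., np.newaxis], y  # add a trailing feature axis for the LSTM

prices = np.arange(10, dtype=float)  # stand-in for a commodity price series
X, y = make_windows(prices, lookback=3)
print(X.shape, y.shape)  # (7, 3, 1) (7, 1)
```

Splitting the resulting windows chronologically, rather than shuffling them, keeps future prices out of the training set.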

Movie Genre Classification

Description: Developed a comprehensive machine learning pipeline to classify movie genres based on descriptions using models such as Logistic Regression, SVM, Random Forest, and XGBoost. Explored feature extraction techniques, including TF-IDF, Word2Vec, and GloVe embeddings. The approach involved preprocessing, model training, evaluation, and deployment.
Technologies Used: Python, Pandas, NumPy, Scikit-learn, XGBoost, Word2Vec, GloVe
Key Impact:
- Achieved an accuracy of 0.58 with the SVM model using TF-IDF features.
- Compared feature-extraction techniques (TF-IDF, Word2Vec, and GloVe embeddings) for text classification.
- Provided a foundation for recommendation systems.
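The TF-IDF weighting behind the best-performing features can be shown in a few lines. This sketch uses the textbook definition, term frequency times ln(N / document frequency), on hypothetical plot snippets. scikit-learn's `TfidfVectorizer` is one possible tool here; that is an assumption, since the source only lists Scikit-learn. Its defaults use a smoothed IDF and L2-normalized rows, so its numbers will differ from this sketch.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (tf = count/len, idf = ln(N/df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

plots = [
    "a detective hunts a killer",
    "a family comedy about a wedding",
    "a detective comedy",
]
vecs = tfidf(plots)
```

Terms that occur in every description, such as "a" here, get an IDF of zero and therefore carry no weight. Rarer, genre-specific words like "killer" are weighted up.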

Predictive Modeling for Disease Diagnosis

Description: Built predictive models to classify individuals into diseased or non-diseased categories based on health attributes. The project aimed to assist healthcare professionals in early detection and personalized patient care.
Technologies Used: Python, Pandas, Scikit-learn, XGBoost, SHAP
Key Impact:
- Achieved 99.5% accuracy with the XGBoost model.
- Provided a reliable tool for early disease detection, enhancing patient outcomes.
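When one class dominates, accuracy alone can hide missed diagnoses, so sensitivity and specificity are commonly reported alongside it. The source reports only accuracy. The function below is a generic sketch with hypothetical labels (1 = diseased, 0 = non-diseased), not the project's evaluation code.

```python
def diagnosis_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on diseased=1) and specificity (recall on non-diseased=0)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
metrics = diagnosis_metrics(y_true, y_pred)
```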

Credit Card Transactions Fraud Detection

Description: Developed machine learning models to detect fraudulent credit card transactions. The project involved data preprocessing, feature engineering, and extensive exploratory data analysis (EDA).
Technologies Used: Python, Scikit-learn, XGBoost, RandomForest, SMOTE
Key Impact:
- Built a well-balanced fraud detection system with RandomForest and XGBoost models.
- Improved precision and recall for fraud detection.
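SMOTE balances the classes by synthesizing new minority (fraud) rows rather than duplicating existing ones. Each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbors. The NumPy sketch below shows that idea; `smote` and the toy data are hypothetical. In practice the project more likely used `imblearn.over_sampling.SMOTE` and its `fit_resample` method.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by interpolating between minority samples
    and one of their k nearest minority-class neighbors (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise distances within the minority class; a point is not its own neighbor.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    k = min(k, len(minority) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(minority), size=n_new)             # random seed samples
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]      # one neighbor each
    gap = rng.random((n_new, 1))                                  # position along the segment
    return minority[base] + gap * (minority[chosen] - minority[base])

fraud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(fraud, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Oversampling should be applied to the training split only, so that the test set keeps the real class ratio and precision and recall are measured honestly.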

NLP Newsgroups Classification and Deployment

Description: Developed a robust document classification system using the 20 Newsgroups dataset. The system classifies documents into the dataset's 20 topic categories, with applications in spam filtering and sentiment analysis.
Technologies Used: Python, Scikit-learn, SpaCy, NLTK
Key Impact:
- Achieved an F1-score of 0.83 and ROC-AUC score of 0.987.
- Successfully deployed the model for real-time classification.
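For a multi-class task such as 20 Newsgroups, ROC-AUC is usually computed one-vs-rest and then averaged over classes. The sketch below uses the probabilistic definition: the chance that a random positive scores above a random negative. The scores are hypothetical. Whether the reported 0.987 used macro averaging is an assumption. scikit-learn's `roc_auc_score` with `multi_class="ovr"` performs the same calculation more efficiently.

```python
def binary_auc(labels, scores):
    """ROC-AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(y_true, proba, classes):
    """Average one-vs-rest AUC; proba[i][j] is the score of classes[j] for sample i."""
    aucs = []
    for j, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        aucs.append(binary_auc(labels, [row[j] for row in proba]))
    return sum(aucs) / len(aucs)

classes = ["comp.graphics", "rec.autos", "sci.space"]
y_true = ["comp.graphics", "rec.autos", "sci.space", "sci.space"]
proba = [
    [0.6, 0.1, 0.3],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
    [0.5, 0.3, 0.2],
]
auc = macro_ovr_auc(y_true, proba, classes)
```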

Cab Industry Analysis: Data Exploration, Hypothesis Testing, and Strategic Recommendations

Description: Analyzed U.S. cab industry data to identify the most suitable company for investment. The project focused on customer usage patterns, market dynamics, and profitability trends.
Technologies Used: Python, Pandas, Statsmodels
Key Impact:
- Provided strategic recommendations for investment based on market dynamics.
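Comparing two cab companies on a metric such as profit per trip is a typical two-sample hypothesis test. Below is a minimal sketch of Welch's t-test, which does not assume equal variances, using only the standard library. The sample values are hypothetical. `statsmodels.stats.weightstats.ttest_ind(a, b, usevar="unequal")` and `scipy.stats.ttest_ind(a, b, equal_var=False)` also return the p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a), sample variance (n - 1)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

company_a = [10, 12, 14]         # hypothetical profit per trip
company_b = [8, 9, 10, 11, 12]
t, df = welch_t(company_a, company_b)
```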

New York Housing Market Analysis and Price Prediction

Description: Developed a machine learning pipeline for predicting housing prices in New York. Included data collection, exploratory data analysis, model training, and deployment.
Technologies Used: Python, XGBoost, Flask
Key Impact:
- Achieved an R² score of 0.775 for housing price predictions.
- Delivered a functional web app for real-time price prediction.
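R² compares the model's squared errors with those of a baseline that always predicts the mean price. A score of 0.775 therefore means the model explains about 77.5% of the price variance in the data it was measured on. A minimal sketch with hypothetical prices; `sklearn.metrics.r2_score` gives the same result.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared errors
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # mean-baseline squared errors
    return 1 - ss_res / ss_tot

prices = [500_000, 750_000, 1_200_000, 900_000]
predicted = [550_000, 700_000, 1_100_000, 950_000]
score = r2_score(prices, predicted)
```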

Gym Members Calories Prediction with CatBoost

Description: This project predicts the number of calories burned by gym members during exercise sessions based on health and activity features. The model was trained with CatBoost, served as a web service via FastAPI, and containerized with Docker. The project emphasizes real-time predictions for personalized fitness planning and progress tracking.
Technologies Used:
- Programming Languages: Python
- Libraries and Frameworks: CatBoost, FastAPI, SHAP, Pandas, NumPy
- Deployment Tools: Docker, Uvicorn
Techniques:
- Data Handling: RFE, Feature Engineering, Data Preprocessing
- Model Training: CatBoost with hyperparameter tuning (Optuna)
Key Impact:
- Achieved a low RMSE of 8.13, indicating high prediction accuracy.
- Deployed a scalable web service for real-time calorie predictions.
- Enhanced personalized fitness tracking and provided actionable insights for gym members.
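Recursive feature elimination (RFE) repeatedly fits a model and drops the weakest feature until the desired number remains. The sketch below uses an ordinary least-squares model from NumPy as a stand-in for CatBoost. It ranks features by absolute coefficient on standardized inputs. The data are synthetic and `rfe_linear` is a hypothetical helper. scikit-learn's `RFE` class implements the same loop for any estimator that exposes `coef_` or `feature_importances_`.

```python
import numpy as np

def rfe_linear(X, y, n_keep):
    """Recursive feature elimination: refit, drop the feature with the smallest |coef|, repeat."""
    X = np.asarray(X, dtype=float)
    # Standardize so coefficient magnitudes are comparable across features.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        design = np.column_stack([X[:, remaining], np.ones(len(X))])  # add an intercept column
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        weakest = int(np.argmin(np.abs(coef[:-1])))  # ignore the intercept
        remaining.pop(weakest)
    return remaining

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] + 2 * X[:, 2] + 0.01 * rng.normal(size=200)
print(rfe_linear(X, y, n_keep=2))  # [0, 2]
```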

Air Quality Prediction and Deployment

Description: Developed and deployed a machine learning-based system to predict air quality levels using a dataset of environmental and demographic metrics. The project included extensive data preprocessing, exploratory data analysis, model selection, and hyperparameter tuning. The final solution was deployed as a web service using FastAPI, Docker, and Kubernetes, with integrated monitoring via Prometheus and Grafana. The deployed application provides real-time air quality predictions, enabling actionable insights for governments, industries, and individuals to mitigate the effects of air pollution.
Technologies Used: Python, pandas, Seaborn, Scikit-learn, CatBoost, XGBoost, LightGBM, FastAPI, Docker, Kubernetes, Prometheus, Grafana, Render
Techniques: Class Imbalance Handling, Weighted Metrics (Weighted F1-Score), Feature Engineering, Optuna Hyperparameter Tuning, Containerization, Cloud Deployment, Monitoring
Key Impact:
- Achieved a high Weighted F1-Score of 0.9578 using the CatBoost model, demonstrating its effectiveness in handling imbalanced datasets and predicting critical air quality levels.
- Identified key environmental factors like Carbon Monoxide (CO) and proximity to industrial areas as major contributors to poor air quality.
- Successfully deployed the application in a production environment, offering an interactive API for real-time air quality predictions.
- Integrated monitoring tools (Prometheus and Grafana) for tracking service performance and usage metrics, ensuring reliability and transparency.
- Provided actionable insights to stakeholders for improving public health and environmental policies.
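Weighted F1 averages the per-class F1 scores with weights proportional to each class's support, so frequent classes count more than rare ones. Macro F1, by contrast, would weight every class equally. A minimal sketch with hypothetical class labels; `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")` computes the same quantity.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        score += f1 * n_cls / total
    return score

y_true = ["Good"] * 4 + ["Moderate"] * 2 + ["Poor"] * 2
y_pred = ["Good", "Good", "Good", "Moderate", "Moderate", "Moderate", "Poor", "Good"]
wf1 = weighted_f1(y_true, y_pred)
```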

kostas696/My_Latest_Projects
