This repository contains my implementation of the Classification and Evaluation assignment for the Introduction to Machine Learning (COMP90049) course at The University of Melbourne. The assignment involves applying several classifiers to the Adult and Student datasets, evaluating the results, and answering conceptual questions.
git clone https://github.com/KIAND-glitch/Classification-Models.git
- Adult: Predict income based on personal attributes.
- Student: Predict final grade based on various attributes.
Attributes are outlined in readme.txt within the datasets.
- Read data into a pandas DataFrame.
- Handle missing values and use one-hot encoding.
- Implement equal-width binning for numerical features.
- Evaluate Zero-R, One-R, and Weighted Random models.
- Discuss differences between baseline models and datasets.
- Inspect feature selection in One-R.
- Analyze convergence of error rate in Weighted Random.
- Implement Gaussian, Bernoulli, and Categorical Naive Bayes.
- Compare Naive Bayes models against baseline.
- Identify the best-performing NB classifier for each dataset.
- Discuss assumptions of Gaussian NB.
- Train/test K-Nearest Neighbor models with Euclidean distance.
- Compare weighted and majority KNN models.
- Compute micro/macro-averaged precision for specific models.
- Discuss ethical problems in using the Categorical Naive Bayes classifier.
- Remove ethically problematic features, train NB classifiers, and compare performance changes.
- Discuss fairness implications of removing problematic features.
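
To illustrate two of the steps above (the Zero-R baseline and micro/macro-averaged precision), here is a minimal sketch in plain Python. It is not the code from this repository: the function names `zero_r_predict` and `precision_scores`, the toy labels, and the convention of scoring a never-predicted class as 0.0 precision are all assumptions made for illustration.

```python
from collections import Counter


def zero_r_predict(train_labels, n):
    """Zero-R: always predict the most frequent label seen in training."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n


def precision_scores(y_true, y_pred):
    """Return (micro, macro) precision over all classes in y_true or y_pred.

    Assumption: a class that is never predicted gets precision 0.0.
    """
    classes = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # true positives per predicted class
    fp = Counter()  # false positives per predicted class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[pred] += 1
        else:
            fp[pred] += 1

    # Micro: pool the counts of every class, then divide once.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    micro = total_tp / (total_tp + total_fp)

    # Macro: compute precision per class, then take the unweighted mean.
    per_class = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        for c in classes
    ]
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Toy labels (illustrative only, not taken from the actual datasets).
y_train = ["<=50K", "<=50K", "<=50K", ">50K"]
y_true = ["<=50K", "<=50K", ">50K", "<=50K"]

y_pred = zero_r_predict(y_train, len(y_true))
micro, macro = precision_scores(y_true, y_pred)
print(f"micro precision: {micro}, macro precision: {macro}")
```

On imbalanced labels, Zero-R can score well on micro-averaged precision while macro-averaged precision is pulled down by the class it never predicts, which is one reason to report both.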