This project compares the performance of linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-Nearest Neighbors (KNN) on simulated data. The study highlights how variation in data characteristics influences model fit and prediction accuracy. The data are generated within the notebook, so no external data is required.
- simulated_classification_study.ipynb: Main Jupyter notebook containing code and explanations.
- simulated_classification_study.html: Exported HTML version of the notebook.
- simulated_classification_study.pdf: Exported PDF version of the notebook.
- figures/: Contains exported plots, mainly for reference; all key figures are already embedded in the notebook outputs.
- LDA, QDA, and KNN fitted to simulated datasets for classification testing.
- Confusion matrices used to evaluate model performance.
- Discussion of bias-variance tradeoff in classification methods.
- Comparison of advantages and disadvantages of each model across datasets.
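The workflow listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the class means, covariance matrices, sample size, train/test split, and the choice of `n_neighbors=5` are all assumptions made for the example.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Simulate two Gaussian classes with different covariance matrices
# (illustrative parameters, not taken from the notebook).
rng = np.random.default_rng(0)
n = 500
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n)
X1 = rng.multivariate_normal([1.5, 1.5], [[1.5, -0.4], [-0.4, 0.8]], size=n)
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

# Hold out 30% of the observations for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),  # k=5 is an arbitrary choice
}

# Fit each model and record its test confusion matrix and accuracy.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (confusion_matrix(y_test, pred), accuracy_score(y_test, pred))
    print(name)
    print(results[name][0])
    print(f"test accuracy: {results[name][1]:.3f}")
```

Because the two simulated classes have unequal covariances here, QDA's assumptions match the data more closely than LDA's. Changing the covariances, the sample size, or `n_neighbors` is one way to observe the bias-variance tradeoff between the three methods.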
- Python (3.10.16 recommended)
- Jupyter Notebook / Jupyter Lab
- Python packages: numpy, matplotlib, scikit-learn, ISLP
You can install the required packages using:
pip install numpy matplotlib scikit-learn ISLP
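To confirm the installation before opening the notebook, a short check like the one below may help. It is only a sketch: the helper name `installed_versions` is hypothetical, and the package list simply mirrors the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip above.
REQUIRED = ["numpy", "matplotlib", "scikit-learn", "ISLP"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```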
- Clone or download this repository.
- Open simulated_classification_study.ipynb in Jupyter Notebook or Jupyter Lab.
- Run all cells to reproduce results, figures, and exported HTML/PDF outputs.