Exploration of model performance, bias-variance tradeoffs, and dataset effects on classification accuracy with fully reproducible simulated data. This project was done in Python.

alan-c-lin/simulated_classification_study

Simulated Classification Study: Comparing LDA, QDA, and KNN

This project compares how linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-Nearest Neighbors (KNN) perform on simulated data. The study highlights how changes in data characteristics affect model fit and prediction accuracy. The data is generated within the notebook, so no external data is required.
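As a rough illustration of this kind of comparison, the sketch below simulates a two-class Gaussian dataset and fits all three classifiers with scikit-learn. The data-generating setup (means, covariances, sample sizes), the seed, `k=5`, and the helper names `simulate_two_class` and `compare_models` are assumptions made for this example; they are not taken from the notebook.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def simulate_two_class(n_per_class=500, seed=0):
    """Draw two Gaussian classes (hypothetical setup, not the notebook's)."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the data reproducible
    # Assumed: the classes differ in both mean and covariance, so the
    # Bayes decision boundary is quadratic rather than linear.
    cov0 = np.array([[1.0, 0.5], [0.5, 1.0]])
    cov1 = np.array([[1.0, -0.5], [-0.5, 1.0]])
    X0 = rng.multivariate_normal([0.0, 0.0], cov0, size=n_per_class)
    X1 = rng.multivariate_normal([1.5, 1.5], cov1, size=n_per_class)
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def compare_models(X, y, seed=0):
    """Fit LDA, QDA, and KNN on a train split; return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "QDA": QuadraticDiscriminantAnalysis(),
        "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


X, y = simulate_two_class()
accuracies = compare_models(X, y)
for name, acc in accuracies.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Changing the assumed covariances (for example, making them equal across classes) is one way to see how data characteristics shift the ranking of the three methods.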

Project Structure

  • simulated_classification_study.ipynb — Main Jupyter notebook containing code and explanations.
  • simulated_classification_study.html — Exported HTML version of the notebook.
  • simulated_classification_study.pdf — Exported PDF version of the notebook.
  • figures/ — Contains exported plots mainly for reference; all key figures are already embedded in the outputs.

Key Points

  • LDA, QDA, and KNN are fit to simulated datasets and compared on classification performance.
  • Confusion matrices are used to evaluate model performance.
  • The bias-variance tradeoff in classification methods is discussed.
  • The advantages and disadvantages of each model are compared across datasets.
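The points above about confusion matrices and the bias-variance tradeoff can be sketched with KNN, where small `k` gives a flexible, high-variance fit and large `k` gives a smoother, higher-bias fit. This is a minimal sketch: the `make_moons` data is a stand-in rather than the notebook's simulated dataset, and the grid of `k` values and the helper `knn_errors` are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumed): two noisy interleaving half-circles.
X, y = make_moons(n_samples=600, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)


def knn_errors(k):
    """Return (training error, test error) for KNN with k neighbors."""
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_err = 1.0 - model.score(X_train, y_train)
    test_err = 1.0 - model.score(X_test, y_test)
    return train_err, test_err


# Sweep k from very flexible (k=1) to very smooth (k=100).
results = {k: knn_errors(k) for k in (1, 5, 25, 100)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:>3}: train error = {train_err:.3f}, test error = {test_err:.3f}")

# Confusion matrix for one fitted model:
# rows are true classes, columns are predicted classes.
knn5 = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
cm = confusion_matrix(y_test, knn5.predict(X_test))
print(cm)
```

With `k=1` each training point is its own nearest neighbor, so training error is zero even though test error is not; the gap between the two is a simple indicator of variance.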

Requirements

  • Python (3.10.16 recommended)
  • Jupyter Notebook / Jupyter Lab
  • Python packages: numpy, matplotlib, scikit-learn, ISLP

You can install the required packages using:

pip install numpy matplotlib scikit-learn ISLP
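After installing, a quick way to confirm the packages are visible to the interpreter is to query their installed versions. The snippet below is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and a value of `None` means the package was not found.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the requirements above.
REQUIRED = ("numpy", "matplotlib", "scikit-learn", "ISLP")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```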

How to Use

  1. Clone or download this repository.
  2. Open simulated_classification_study.ipynb in Jupyter Notebook or Jupyter Lab.
  3. Run all cells to reproduce results, figures, and exported HTML/PDF outputs.
