This study evaluates and optimizes the k-Nearest Neighbors (k-NN) classifier, a machine learning algorithm that can find patterns in large, high-dimensional datasets.

ZahirAhmadChaudhry/Optimizing-k-Nearest-Neighbors-Classifier

🚀 Machine Learning Project: Optimizing k-Nearest Neighbors Classifier 🤖

In this project, we optimize the k-Nearest Neighbors (k-NN) classifier, a machine learning algorithm that can find patterns in large, high-dimensional datasets. We use a waveform dataset with 5000 instances and 21 features as a case study. The project has four parts: tuning k for a k-NN classifier by cross-validation, reducing the training set with data reduction algorithms to lower the computational complexity, comparing the two methods studied in class for speeding up the 1-NN computation against a brute-force 1-NN algorithm, and artificially imbalancing the training data to analyze the impact on accuracy.
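As an illustration of the cross-validation step, here is a minimal sketch of selecting k with scikit-learn's `GridSearchCV`. It is not the notebooks' actual code: the data is a synthetic stand-in generated with `make_classification` (21 features to match the waveform data; the three classes, the sample size, the 5-fold split, and the odd-k grid from 1 to 31 are all assumptions made for the example).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the waveform data (assumption): 21 features, 3 classes.
# In the project itself, the dataset from the data folder would be loaded instead.
X, y = make_classification(
    n_samples=1000,
    n_features=21,
    n_informative=10,
    n_classes=3,
    random_state=0,
)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Candidate values of k; odd values are a common choice (assumption).
param_grid = {"n_neighbors": list(range(1, 32, 2))}

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid=param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

best_k = search.best_params_["n_neighbors"]
best_score = search.best_score_  # mean cross-validated accuracy for best_k
print(f"best k = {best_k}, mean CV accuracy = {best_score:.3f}")
```

The per-k scores are available in `search.cv_results_["mean_test_score"]`, which is convenient for plotting accuracy against k.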

🎯 Objectives

  • Tune k for a k-NN classifier by cross-validation
  • Reduce complexity by running data reduction algorithms on the training set
  • Compare the two methods studied in class for speeding up the 1-NN computation against a brute-force 1-NN algorithm
  • Artificially imbalance the training data and analyze the impact on accuracy
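The last objective can be sketched as follows: undersample one class in the training set and watch how test accuracy changes. This is only an illustration under stated assumptions, not the notebooks' method. The data is a synthetic stand-in, `undersample_class` is a hypothetical helper written for this example, and the choices of k = 5, class 0 as the minority class, and the keep fractions are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the waveform data (assumption): 21 features, 3 classes.
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=10, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def undersample_class(X, y, target_class, keep_fraction, rng):
    """Hypothetical helper: keep only a fraction of the rows of one class."""
    idx_target = np.flatnonzero(y == target_class)
    idx_other = np.flatnonzero(y != target_class)
    n_keep = max(1, int(round(keep_fraction * idx_target.size)))
    kept = rng.choice(idx_target, size=n_keep, replace=False)
    idx = np.concatenate([idx_other, kept])
    return X[idx], y[idx]


rng = np.random.default_rng(0)
results = {}
for frac in (1.0, 0.5, 0.1):
    # Only the training set is imbalanced; the test set stays untouched.
    X_imb, y_imb = undersample_class(X_train, y_train, 0, frac, rng)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_imb, y_imb)
    y_pred = clf.predict(X_test)
    results[frac] = (
        accuracy_score(y_test, y_pred),
        balanced_accuracy_score(y_test, y_pred),
    )
    print(f"keep {frac:.0%} of class 0: "
          f"accuracy={results[frac][0]:.3f}, balanced={results[frac][1]:.3f}")
```

Balanced accuracy is reported next to plain accuracy because it averages per-class recall, which makes a drop on the undersampled class easier to see.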

📁 Project Structure

  • data: Contains the waveform dataset used in the project.
  • notebooks: Jupyter Notebooks for implementing the tasks.
  • reports: Contains a detailed project report.

📊 Usage

  • Navigate to the notebooks folder.
  • Run the Jupyter Notebooks for implementing the tasks.
  • Customize the implementation as needed for your exploration.

🌐 Additional Information:

  • Optimal Bayes classification rate = 86% accuracy

Tech Stack

🚀 Languages:

  • Python

📊 Data Analysis and Visualization:

  • Pandas: Data manipulation and analysis
  • Matplotlib: Creating static, interactive, and animated visualizations
  • Seaborn: Statistical data visualization
  • NumPy: Numerical operations

🤖 Machine Learning:

  • Scikit-Learn: Machine learning models and tools

📚 Other Tools and Frameworks:

  • Jupyter Notebooks: Interactive computing and data exploration

🔧 Version Control:

  • Git: Version control system

💻 Development Environment:

  • Visual Studio Code

🔧 Deployment

To deploy this project, run:

  git clone https://github.com/ZahirAhmadChaudhry/Optimizing-k-Nearest-Neighbors-Classifier.git
  cd Optimizing-k-Nearest-Neighbors-Classifier
  pip install -r requirements.txt

🤝 Contributing:

Contributions are always welcome!

  • Fork the repository.
  • Create a new branch: git checkout -b feature-branch
  • Make changes and commit: git commit -m "Description of changes"
  • Push changes to your fork: git push origin feature-branch
  • Create a pull request.

📰 Project Report:

For a detailed report on the project, analysis, and conclusions, refer to the Project Report.

📄 License

This project is licensed under the MIT License. See LICENSE for details.
