CS420 Machine Learning Final Project

Classification on a modified MNIST dataset. Our LocNet achieves an accuracy of 99.90%.

Contents

Requirements

Model List

---- Traditional Methods
    ---- Naive Bayes (18.81%)
    ---- Decision Tree (50.94%)
    ---- Random Forest (87.61%)
    ---- K-Nearest Neighbors (88.63%)
    ---- Support Vector Machine (87.07%)
---- Deep Learning Methods
    ---- FC Baseline (90.11%)
    ---- CNN Baseline (99.47%)
    ---- PointNet (91.02%)
    ---- SegNet (99.40%)
    ---- LocNet (99.90%)

Prepare Data

Download the datasets from jbox and move them into the mnist/ folder. The folder structure should look like this:

---- mnist/
    ---- mnist_train/
    ---- mnist_test/
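
A quick sanity check (not part of the repository) to confirm the layout is in place; the paths simply mirror the tree above:

import sys
from pathlib import Path

# Hypothetical check: confirm the expected mnist/ layout before running any script.
root = Path("mnist")
for split in ("mnist_train", "mnist_test"):
    if not (root / split).is_dir():
        sys.exit(f"expected folder is missing: {root / split}")
print("mnist/ layout looks correct")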

Traditional Methods

Naive Bayes

cd traditional_methods/NaiveBayes/
python Bayes.py
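
Bayes.py contains the repository's own implementation; the following is only a rough sketch of the same idea with scikit-learn. The load_split helper and the .npy file names are placeholders, since the actual loading code lives in the repository's scripts.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def load_split(folder):
    """Placeholder loader: the real loading code lives in the repository's scripts."""
    images = np.load(f"{folder}/images.npy")   # assumed file names
    labels = np.load(f"{folder}/labels.npy")
    return images.reshape(len(labels), -1), labels

X_train, y_train = load_split("mnist/mnist_train")
X_test, y_test = load_split("mnist/mnist_test")

# Fit a Gaussian Naive Bayes classifier on the flattened images and report test accuracy.
clf = GaussianNB()
clf.fit(X_train, y_train)
print("Naive Bayes test accuracy:", accuracy_score(y_test, clf.predict(X_test)))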

Decision Tree

cd traditional_methods/DecisionTree/
python Tree.py
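
The same sketch pattern applies here with a single decision tree (not the repository's Tree.py; the data are assumed to be loaded as in the Naive Bayes sketch above):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Fit one decision tree on the flattened images and report test accuracy.
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
print("Decision tree test accuracy:", accuracy_score(y_test, tree.predict(X_test)))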

Random Forest

cd traditional_methods/RandomForest/
python ForestBestN.py

These commands will output the performance of random forest with different numbers of decision trees, demonstrated by the following two figures.
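
An illustrative sketch of this kind of sweep with scikit-learn (not ForestBestN.py itself; the tree counts are arbitrary and the data are assumed to be loaded as in the Naive Bayes sketch above):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Sweep the number of trees and record test accuracy for each setting.
for n_trees in (10, 50, 100, 200, 500):
    forest = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1)
    forest.fit(X_train, y_train)
    acc = accuracy_score(y_test, forest.predict(X_test))
    print(f"{n_trees:4d} trees -> {acc:.4f}")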

K-Nearest Neighbors

cd traditional_methods/KNN/
python KNNBestK.py

These commands will output the performance of KNN with different values of K, demonstrated by the following figure.
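
An illustrative sketch of such a K sweep (not KNNBestK.py itself; the K values are arbitrary and the data are assumed to be loaded as above):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Try several values of K and report test accuracy for each.
for k in (1, 3, 5, 7, 9, 15):
    knn = KNeighborsClassifier(n_neighbors=k, n_jobs=-1)
    knn.fit(X_train, y_train)
    print(f"K={k:2d} -> {accuracy_score(y_test, knn.predict(X_test)):.4f}")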

Support Vector Machine

cd traditional_methods/SVM/
python SVMBestDim.py

These commands will output the performance of a linear SVM on data reduced to different dimensions by PCA, demonstrated by the following two figures.
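
An illustrative sketch of the PCA-dimension sweep (not SVMBestDim.py itself; the dimensions are arbitrary and the data are assumed to be loaded as above):

from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Reduce the data to several dimensionalities with PCA before fitting a linear SVM.
for dim in (10, 30, 50, 100, 200):
    model = make_pipeline(PCA(n_components=dim), LinearSVC())
    model.fit(X_train, y_train)
    print(f"PCA dim {dim:3d} -> {accuracy_score(y_test, model.predict(X_test)):.4f}")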

cd traditional_methods/SVM/
python SVMBestKernel.py

These commands will output the performance of SVM with different kernels.
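
An illustrative sketch of such a kernel comparison (not SVMBestKernel.py itself; the data are assumed to be loaded, and possibly PCA-reduced, as above):

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Compare SVM kernels on the same training/test split.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    svm = SVC(kernel=kernel)
    svm.fit(X_train, y_train)
    print(f"{kernel:8s} -> {accuracy_score(y_test, svm.predict(X_test)):.4f}")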

Influence of Modification

For the five traditional models above, running *Preprocess.py in the respective directory gives the results shown in the following table (a sketch of the two preprocessing steps follows the table).

                     Naive Bayes   Decision Tree   Random Forest   K-Nearest Neighbors   SVM
Target dataset       18.81%        50.94%          87.61%          88.63%                87.07%
Keep largest CC      19.73%        55.54%          89.07%          88.73%                88.29%
Shift CC to center   75.90%        92.69%          98.46%          97.55%                96.85%

note: "CC" stands for connected components.
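
The exact preprocessing lives in the *Preprocess.py scripts; the sketch below shows one way to implement the two steps with scipy, assuming grayscale images stored as 2-D NumPy arrays in [0, 1]:

import numpy as np
from scipy import ndimage

def keep_largest_cc(img, threshold=0.5):
    # Zero out everything except the largest connected component of foreground pixels.
    mask = img > threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return img
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    largest = 1 + int(np.argmax(sizes))
    return np.where(labels == largest, img, 0)

def shift_cc_to_center(img):
    # Translate the image so the foreground center of mass sits at the image center.
    if img.sum() == 0:
        return img
    cy, cx = ndimage.center_of_mass(img)
    h, w = img.shape
    return ndimage.shift(img, (h / 2 - cy, w / 2 - cx), order=0)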

Deep Learning Methods

In this section, we implement FC and CNN baselines for classification. Three methods are proposed to improve the performance:

  • PointNet
  • SegNet
  • LocNet

Usage

Each model is in a separate folder under deep_learning_methods/. To train a model, go into the corresponding folder and run train_xxx.py. For example, to train the baseline CNN model:

cd deep_learning_methods/CNN_baseline/
python train_cnn.py
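
The training scripts define the actual models; purely as an illustration (the framework, architecture, and hyperparameters below are assumptions, written in PyTorch), a baseline CNN setup might look like this:

import torch
import torch.nn as nn

# Illustrative baseline CNN classifier; the repository's model may differ.
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),   # makes the head independent of the input size
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# Typical training step (loader would be a DataLoader over the mnist/ data):
# for images, labels in loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()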

Deep Model Performance

           Baseline   Largest CC   CC Centralization   SegNet   LocNet
FC         90.11%     92.46%       99.03%              92.78%   99.28%
CNN        99.47%     99.31%       99.88%              99.40%   99.90%
PointNet   91.02%     -            -                   -        -

note: "CC" stands for connected components.

SegNet

We propose SegNet to automatically denoise the original images using neural networks. The improvement is significant on the FC baseline.

note: "i-th Epoch" stands for the visualization results after training for i epochs.
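
The architecture itself is defined in the repository; purely as an illustration (an assumption, not the actual SegNet), a denoising network of this kind can be sketched as a small convolutional encoder-decoder trained to reconstruct clean digit images:

import torch.nn as nn

# Illustrative only: a tiny encoder-decoder mapping noisy digit images to cleaned ones.
denoiser = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
)
# Trained with e.g. nn.MSELoss() between denoiser(noisy) and the clean target image.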

LocNet

We propose LocNet to automatically localize digits in the original images with tight bounding boxes (BBoxes) using neural networks. Improvements are significant on both the FC and CNN baselines.

note: green boxes are ground truth and red boxes are predictions.
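
Again purely as an illustration (not the actual LocNet), a localization network of this kind can be sketched as a backbone that regresses normalized bounding-box coordinates; the predicted box is then used to crop the digit before it is classified:

import torch
import torch.nn as nn

# Illustrative only: regress a tight bounding box (x1, y1, x2, y2) in normalized coordinates.
class BoxRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.box_head = nn.Linear(32, 4)

    def forward(self, x):
        return torch.sigmoid(self.box_head(self.backbone(x)))

# The predicted box can be trained with an L1/L2 loss against the ground-truth box;
# the cropped, resized digit is then fed to an ordinary classifier.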

Team Member
