This repository demonstrates multiple classification techniques applied to the MNIST dataset—one of the most well-known datasets in machine learning. It is based on Chapter 3 of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.
The project walks through binary, multiclass, multilabel, and multioutput classification, including performance metrics, error analysis, and decision threshold tuning.
- Binary, Multiclass, and Multilabel classification examples
- Confusion matrix, precision-recall, F1 score, and ROC curve
- Decision threshold tuning and precision/recall tradeoff
- Comparison of classifiers: SGD, Random Forest, KNN
- Cross-validation and
StratifiedKFold
evaluation - Error analysis using prediction visuals and score plots
To build a strong foundation in classification techniques by:
- Understanding how classifiers behave under different metrics
- Learning to evaluate, tune, and compare performance
- Analyzing misclassifications to guide model improvements
MNIST Dataset
- Handwritten digits: 70,000 grayscale 28×28 images (0–9)
train_set
: 60,000 imagestest_set
: 10,000 images- Classification task: predict the digit (0 to 9) from pixel data
Task | Description |
---|---|
Binary Classification | Is digit == 5? Using SGDClassifier |
Multiclass Classification | Classify digits 0–9 using OvA and OvO strategies |
Multilabel Classification | Predict multiple labels per image |
Multioutput Classification | Denoising digits using autoencoder-like logic |
- Confusion Matrix
- Precision, Recall, F1 Score
- Cross-validation (
cross_val_score
,StratifiedKFold
) - Decision threshold adjustment
- ROC Curve and AUC Score
- Top errors and visualization of confusion
Model | Purpose |
---|---|
SGDClassifier |
Fast linear classifier for baseline |
KNeighborsClassifier |
Lazy learning with proximity voting |
RandomForestClassifier |
Ensemble-based robust classification |
Threshold tuning logic | Manual control over decision boundaries |
Based on:
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
by Aurélien Géron
This repository is open source under the MIT License.
Created and maintained by RM Villa.