4_25_22: Status Update and Poor Performing Labels

Allen Lau edited this page Apr 25, 2023 · 1 revision

Update on project:

  • Addressed overfitting through data augmentation, with experiments on 50k, 80k, and 100k image datasets. Evaluation on the 100k image dataset shows that the models are no longer overfitting. The augmentation covered rotation, zoom, width/height shift, shear, and brightness; adding noise is a possible further step.

  • The team is working on modeling and evaluation of SVM, random forest, XGBoost, and deep learning models.

  • Next steps: complete the models, evaluate which one performs best, and convert the code into Python scripts for the demo. The baseline models need to be rerun with regularization, and additional evaluation visualizations (confusion matrix, etc.) should be included.
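
As a starting point for the extra evaluation mentioned above, per-label recall and confusion counts can be computed directly from true and predicted labels. The sketch below is only illustrative: the function names `per_label_recall` and `confusion_counts` and the toy labels are assumptions, not part of the project code.

```python
from collections import Counter

def per_label_recall(y_true, y_pred):
    """Fraction of samples of each true label that were predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}

def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))

# Toy labels for illustration only (not project results).
y_true = ["A", "A", "B", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "A", "C"]

recall = per_label_recall(y_true, y_pred)
confusion = confusion_counts(y_true, y_pred)

# Labels sorted from worst to best recall.
worst_first = sorted(recall, key=recall.get)
for label in worst_first:
    print(label, round(recall[label], 3))
```

Sorting labels by recall gives a quick list of the weakest letters, and the confusion counts show which letters they are mistaken for.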

Question:

  • The model is not performing well on certain letters. How do we address this issue?
      ◦ Collect more samples of the specific letters
      ◦ Augment the specific letters
      ◦ Use boosting algorithms, which concentrate on previously misclassified examples and so should focus on poorly performing labels
      ◦ Use feature extraction / engineering to add helpful information
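
The sampling and augmentation options above could be combined roughly as follows. This is a minimal sketch, not the team's implementation: `oversample_labels`, its parameters, and the `fake_augment` placeholder are hypothetical, and samples are represented as strings only to keep the example self-contained.

```python
import random
from collections import defaultdict

def oversample_labels(samples, labels, weak_labels, factor=2, augment=None, seed=0):
    """Add resampled copies of weak labels until each has `factor` times its count.

    `augment`, if given, is applied to every added copy (e.g. rotation or noise).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    out_samples, out_labels = list(samples), list(labels)
    for label in weak_labels:
        pool = by_label.get(label, [])
        if not pool:
            continue  # nothing to resample for this label
        # Draw with replacement from the existing samples of this label.
        extra = rng.choices(pool, k=len(pool) * (factor - 1))
        if augment is not None:
            extra = [augment(sample) for sample in extra]
        out_samples.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_samples, out_labels

# Placeholder augmentation: a real one would transform the image.
def fake_augment(sample):
    return sample + "_aug"

samples = ["a1", "a2", "b1", "b2", "b3", "c1"]
labels = ["A", "A", "B", "B", "B", "C"]
new_samples, new_labels = oversample_labels(
    samples, labels, weak_labels=["A"], factor=3, augment=fake_augment
)
```

Resampling should be applied to the training split only, so that validation and test metrics still reflect the original label distribution.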