This repository contains greyscale scene image classification code for an in-class Kaggle challenge and the NCTU Computer Vision homework. The two datasets differ slightly:
- Kaggle challenge: 3859 grey images with 13 categories (train:2819, test:1040)
- CV HW: 1650 grey images with 15 categories (train:1500, test:150)
- VGG16 (ImageNet pretrained) + 2 FC layers with Dropout
- ResNet50 (ImageNet pretrained) on Keras 2.2.4, where freezing BatchNorm layers is broken
- Image size: 224; VGG16 `preprocess_input` + horizontal flip (on-the-fly data augmentation)
- Trained on a split of the training set (losing some training data to validation)
- Ensemble prediction: 0.899 accuracy on Kaggle
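The preprocessing and flip augmentation above can be sketched in NumPy. This is a hedged sketch: `keras.applications.vgg16.preprocess_input` in its default "caffe" mode converts RGB to BGR and subtracts the ImageNet channel means, and the flip mirrors the width axis; the function names here are illustrative, not from this repo.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by
# keras.applications.vgg16.preprocess_input ("caffe" mode).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess(batch: np.ndarray) -> np.ndarray:
    """batch: array of shape (N, H, W, 3) in RGB, values in 0..255.
    Returns BGR images with the ImageNet channel means subtracted."""
    bgr = batch[..., ::-1].astype(np.float32)  # RGB -> BGR
    return bgr - IMAGENET_BGR_MEANS

def random_horizontal_flip(batch: np.ndarray, rng: np.random.Generator,
                           p: float = 0.5) -> np.ndarray:
    """Mirror each image left-right with probability p (on-the-fly)."""
    out = batch.copy()
    flips = rng.random(len(batch)) < p   # one coin flip per image
    out[flips] = out[flips, :, ::-1]     # reverse the width axis
    return out
```

In Keras the same effect comes from `ImageDataGenerator(preprocessing_function=preprocess_input, horizontal_flip=True)`.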
- ResNet50 (ImageNet pretrained) on TF 2.2 via `classification_models`
- CosineAnnealingScheduler for the learning rate
- Image size: 256; horizontal flip + brightness + zoom + rotation (on-the-fly data augmentation)
- Trained on the whole training set
- Single-model prediction: 0.98 accuracy on the CV HW
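CosineAnnealingScheduler decays the learning rate along a cosine curve from a maximum down to a minimum over a cycle of epochs. A minimal schedule function (the standard SGDR formula; `t_max` and the eta defaults below are illustrative, not values from this repo) that could be plugged into `tf.keras.callbacks.LearningRateScheduler`:

```python
import math

def cosine_annealing_lr(epoch: int, t_max: int,
                        eta_max: float = 1e-3, eta_min: float = 0.0) -> float:
    """Cosine annealing: eta_min + (eta_max - eta_min)/2 * (1 + cos(pi*t/T)).
    `epoch % t_max` makes the schedule restart every t_max epochs (SGDR-style)."""
    t = epoch % t_max
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max))
```

Usage sketch: `tf.keras.callbacks.LearningRateScheduler(lambda e: cosine_annealing_lr(e, t_max=50))`.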
Model | Batch_size | Accuracy | Extra |
---|---|---|---|
EfficientNetB0 | 64 | 0.92 | |
EfficientNetB0 | 64 | 0.906 | noisy-student pretrain |
EfficientNetB1 | 64 | 0.926 | |
EfficientNetB1 | 64 | 0.906 | noisy-student pretrain |
EfficientNetB4 | 16 | 0.92 | |
EfficientNetB4 | 32 | 0.95 | |
EfficientNetB4 | 32 | 0.89 | Freeze 1st Block(Conv+BN+Activation) |
EfficientNetB4 | 32 | 0.9 | Freeze 1~2 Blocks(Conv+BN+Activation) |
EfficientNetB5 | 16 | 0.926 | |
EfficientNetB6 | 16 | 0.9 | Freeze 1st Block(Conv+BN+Activation) |
EfficientNetB6 | 16 | 0.926 | Freeze 1~2 Blocks(Conv+BN+Activation) |
EfficientNetB6 | 16 | 0.94 | Freeze 1~3 Blocks(Conv+BN+Activation) |
EfficientNetB6 | 16 | 0.85 | Freeze 1~4 Blocks(Conv+BN+Activation) |
Freeze the first 12 layers (layers 0~47 in the implementation)
Model | Batch_size | Accuracy | Extra |
---|---|---|---|
ResNet50 | 64 | 0.953 | Generate New Data |
ResNet50 | 64 | 0.966 | on-the-fly |
ResNet50 | 64 | 0.946 | on-the-fly + contrast_pil |
ResNet50 | 64 | 0.98 | on-the-fly + rotation 5 |
ResNet50 | 64 | 0.96 | on-the-fly + rotation 7 |
ResNet50 | 64 | 0.953 | on-the-fly + rotation 10 |
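"On-the-fly" in the table above means drawing fresh random transforms every time a batch is sampled, so the model never sees the exact same image twice, as opposed to augmenting once and saving new images to disk. A NumPy sketch of per-batch flip plus brightness jitter (parameter values are illustrative, not this repo's settings):

```python
import numpy as np

def augment_batch(batch, rng, max_brightness_delta=0.2, flip_p=0.5):
    """Randomly flip and brightness-jitter a batch of (N, H, W, C) images
    in 0..255. Called fresh on every batch, so each epoch sees new variants."""
    out = batch.astype(np.float32).copy()
    flips = rng.random(len(out)) < flip_p
    out[flips] = out[flips, :, ::-1]  # mirror the width axis
    # per-image multiplicative brightness jitter
    scale = 1.0 + rng.uniform(-max_brightness_delta, max_brightness_delta,
                              size=(len(out), 1, 1, 1))
    return np.clip(out * scale, 0.0, 255.0)
```

Keras's `ImageDataGenerator` (with `brightness_range`, `zoom_range`, `rotation_range`) does the same kind of per-batch randomization.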
- BiT-M (pretrained on ImageNet-21k), on-the-fly augmentation
Model | Batch_size | Accuracy | Extra |
---|---|---|---|
R50x1 | 64 | 0.966 | |
R50x3 | 64 | 0.96 | |
R101x1 | 64 | 0.96 | |
R101x3 | 64 | 0.953 | |
- Use ResNet50 with ImageNet pretraining and freeze the first 12 layers
- A larger batch size may help
- Use on-the-fly (random) data augmentation instead of generating new data ahead of time
- Use brightness, zoom, and rotation instead of Equalize and RandomResizedCrop
- Use TF2 if you want to freeze BN layers
- Sparse labels may improve accuracy (final Dense layer without softmax, `class_mode='sparse'`, `loss=SparseCategoricalCrossentropy`)
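With sparse labels the generator yields integer class ids rather than one-hot vectors, and the loss applies log-softmax to the raw logits and reads off the entry at each true class. A NumPy sketch of what `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)` computes over a batch:

```python
import numpy as np

def sparse_categorical_crossentropy(logits, labels):
    """Mean cross-entropy from raw logits (a Dense layer without softmax)
    and integer class labels of shape (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

This is numerically the same loss as one-hot labels with `CategoricalCrossentropy`; the win is skipping the one-hot encoding of the targets.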