Image Classification With Vision Transformer


Vision Transformer, or ViT, approaches image recognition as a sequence modeling problem. By dividing the image into patches and feeding them to the Transformer, the model celebrated in language modeling, ViT shows that a strong built-in spatial inductive bias is not obligatory for image recognition. However, a study shows that providing more spatial cues, i.e., applying a few consecutive convolutions to the image before funneling it to the Transformer, helps ViT learn better. Since ViT is built from Transformer blocks, we can readily extract attention maps that explain what the network attends to. In this project, we examine ViT's performance on the CIFAR-100 dataset. Here, the validation set is fixed to be the same as the CIFAR-100 test set. Online data augmentations, e.g., RandAugment, CutMix, and MixUp, are applied during training, and the learning rate is adjusted following the triangular cyclical policy.
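To illustrate the convolutional-stem idea, here is a minimal PyTorch sketch of a patch embedding that runs a few convolutions before tokenization; the patch size, channel widths, and stem depth are illustrative stand-ins, not this repository's exact configuration.

```python
import torch
import torch.nn as nn

class ConvStemPatchEmbed(nn.Module):
    """Tokenize an image with a small convolutional stem.

    Hypothetical sketch: sizes are illustrative, not the repo's settings.
    """

    def __init__(self, img_size=32, patch_size=4, in_chans=3, embed_dim=192):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # Consecutive convolutions inject spatial cues before tokenization.
        self.stem = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 2, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim // 2, embed_dim, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):                    # x: (B, 3, 32, 32)
        x = self.stem(x)                     # (B, D, 8, 8) feature map
        return x.flatten(2).transpose(1, 2)  # (B, 64, D) patch tokens

tokens = ConvStemPatchEmbed()(torch.randn(1, 3, 32, 32))
print(tokens.shape)  # torch.Size([1, 64, 192]), ready for the encoder
```

The resulting token sequence, plus a [CLS] token and positional embeddings, is what the Transformer encoder consumes.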

Experiment

Walk through training, testing, and running inference for image classification with ViT by jumping to this notebook.
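A minimal sketch of how such a training loop can be wired up with torchvision's v2 transforms and PyTorch's CyclicLR scheduler is shown below; the stand-in model, learning-rate bounds, and step sizes are illustrative assumptions, not the notebook's exact settings.

```python
import torch
from torch.optim.lr_scheduler import CyclicLR
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR100
from torchvision.transforms import v2

# Per-sample augmentation applied while loading CIFAR-100.
train_tf = v2.Compose([
    v2.RandAugment(),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
])
train_loader = DataLoader(
    CIFAR100("data", train=True, download=True, transform=train_tf),
    batch_size=128, shuffle=True,
)

# Batch-level CutMix or MixUp, chosen at random each step.
cutmix_or_mixup = v2.RandomChoice([
    v2.CutMix(num_classes=100),
    v2.MixUp(num_classes=100),
])

model = torch.nn.Sequential(               # stand-in for the actual ViT
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 100)
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
scheduler = CyclicLR(
    optimizer,
    base_lr=1e-4,       # illustrative bounds, not the repo's exact values
    max_lr=1e-2,
    step_size_up=2000,  # iterations to climb from base_lr to max_lr
    mode="triangular",  # linear up, linear down, constant amplitude
)

for images, labels in train_loader:
    images, labels = cutmix_or_mixup(images, labels)
    # cross_entropy accepts the soft labels produced by CutMix/MixUp.
    loss = torch.nn.functional.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()    # CyclicLR is stepped per batch, not per epoch
```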

Result

Quantitative Result

Here are the quantitative results of ViT performance:

Test Metric    Score
-----------    ------
Loss           1.353
Top1-Acc.      64.92%
Top5-Acc.      87.29%
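For reference, top-k accuracy counts a prediction as correct when the true class appears among the model's k highest-scoring classes. A minimal sketch (the function name and signature are hypothetical):

```python
import torch

def topk_accuracy(logits, targets, k=5):
    """Fraction of samples whose true class is among the k largest logits."""
    topk = logits.topk(k, dim=1).indices              # (B, k) class indices
    hits = (topk == targets.unsqueeze(1)).any(dim=1)  # (B,) boolean
    return hits.float().mean().item()

logits = torch.randn(8, 100)              # e.g. a batch of CIFAR-100 scores
targets = torch.randint(0, 100, (8,))
print(topk_accuracy(logits, targets, k=1))  # top-1
print(topk_accuracy(logits, targets, k=5))  # top-5
```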

Accuracy and Loss Curve

[acc_curve] Accuracy curves of ViT on the CIFAR-100 train and validation sets.

[loss_curve] Loss curves of ViT on the CIFAR-100 train and validation sets.

Qualitative Result

The predictions and their corresponding attention maps are shown in the collated image below.

[qualitative] Several prediction results of ViT and their attention maps.
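Below is a sketch of how such a map can be read off a Transformer block, assuming a standard nn.MultiheadAttention layer and an 8x8 patch grid (the repository's model may differ): the [CLS] query's attention row over the patch tokens is reshaped into a 2-D grid.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: one ViT-style self-attention layer. In practice,
# you would read these weights from the trained model's attention blocks.
embed_dim, num_heads, grid = 192, 3, 8
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

tokens = torch.randn(1, grid * grid + 1, embed_dim)  # [CLS] + 64 patches
_, weights = attn(tokens, tokens, tokens,
                  need_weights=True, average_attn_weights=True)

# Row 0 is the [CLS] query: its attention over every token.
cls_attn = weights[0, 0, 1:]             # (64,) drop the [CLS] column
attn_map = cls_attn.reshape(grid, grid)  # 8x8 patch grid
attn_map = attn_map / attn_map.max()     # normalize before overlaying
```

Upsampling this grid to the input resolution and overlaying it on the image gives visualizations like the one above.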

Credit
