Official repository of the ICPR 2024 paper "DoubleViT: Pushing transformers towards the end because of convolutions". The code contains the DoubleViT model, with CIFAR-10 as an example dataset.
Vision transformers have outperformed convolutional networks and now dominate vision tasks. Recent trends indicate a shift towards exploring alternatives to attention mechanisms. We introduce DoubleViT, a model that pushes the attention mechanisms towards the end of the network: the network begins with convolutional layers and concludes with attention mechanisms. The convolutional layers and their depth are determined by the input shape. In this approach, the attention mechanism learns from the outputs of the convolutional layers rather than from raw image patches. This fusion enhances the network's ability to capture better feature representations. The proposed model has fewer parameters than comparable ViTs. We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures. Experimental results demonstrate a remarkable increase in the classification accuracy of the proposed model.
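The official code in this repository defines the exact layer configuration; purely as an illustration of the design described above (a convolutional stem whose depth follows from the input resolution, with attention blocks only at the end, attending over the conv feature map instead of image patches), a minimal PyTorch sketch might look as follows. All class names, hyperparameters, and the 4×4 target feature-map size are assumptions made for the sketch, not values from the paper:

```python
import math
import torch
import torch.nn as nn

class DoubleViTSketch(nn.Module):
    """Illustrative sketch (NOT the official implementation): a conv stem
    whose depth is derived from the input resolution, followed by standard
    transformer encoder blocks that attend over the conv feature map."""

    def __init__(self, img_size=32, in_ch=3, dim=128, num_heads=4,
                 num_tf_layers=4, num_classes=10):
        super().__init__()
        # Stem depth grows with input size: each stage halves the spatial
        # resolution until an (assumed) 4x4 feature map remains.
        depth = int(math.log2(img_size // 4))  # e.g. 32 -> 3 stages
        convs, ch = [], in_ch
        for i in range(depth):
            out_ch = dim if i == depth - 1 else dim // 2
            convs += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                      nn.BatchNorm2d(out_ch), nn.GELU()]
            ch = out_ch
        self.stem = nn.Sequential(*convs)

        # Attention operates on conv outputs, not raw image patches.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_tf_layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        f = self.stem(x)                       # (B, dim, H', W')
        tokens = f.flatten(2).transpose(1, 2)  # (B, H'*W', dim)
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1))   # mean-pool, then classify

model = DoubleViTSketch()
logits = model(torch.randn(2, 3, 32, 32))      # CIFAR-10-sized input
print(logits.shape)                            # torch.Size([2, 10])
```

Because the transformer blocks see a small conv-downsampled token grid rather than a long patch sequence, the attention stages contribute comparatively few parameters, which is consistent with the parameter reduction claimed above.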
If you have used DoubleViT in your research, please cite our work. 🎓
```bibtex
@inproceedings{Mahendran2024doublevit,
  title     = {{DoubleViT}: Pushing transformers towards the end because of convolutions},
  author    = {Narayanan, Mahendran},
  booktitle = {International Conference on Pattern Recognition (ICPR)},
  year      = {2024},
}
```