This is a PyTorch implementation of MobileViT, as specified in "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer" (arXiv, 2021).
👉 Check out CoAtNet if you are interested in other Convolution + Transformer models.
import torch
from mobilevit import mobilevit_xxs

# Dummy batch: one 3-channel 256x256 image
img = torch.randn(1, 3, 256, 256)

# Build the XXS variant and run a forward pass
vit = mobilevit_xxs()
out = vit(img)
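For inference you will usually want eval mode and no gradient tracking. The following is a minimal sketch, not part of this repo: `count_parameters` and `predict` are hypothetical helpers, and the small `nn.Sequential` model is a stand-in so the example runs on its own. It assumes the model returns logits of shape `(batch, num_classes)`; replace the stand-in with `mobilevit_xxs()` from the usage snippet above.

```python
import torch
from torch import nn


def count_parameters(model: nn.Module) -> int:
    """Return the number of trainable parameters in `model`."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


@torch.no_grad()
def predict(model: nn.Module, img: torch.Tensor) -> torch.Tensor:
    """Run a forward pass in eval mode and return the top-1 class index per image."""
    model.eval()  # use running BatchNorm statistics, disable dropout
    logits = model(img)  # assumed shape: (batch, num_classes)
    return logits.argmax(dim=1)


# Hypothetical stand-in classifier so the sketch runs without this repo.
# Swap in `mobilevit_xxs()` to use the real model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(8),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

img = torch.randn(2, 3, 256, 256)
pred = predict(model, img)
n_params = count_parameters(model)
print(pred.shape, n_params)
```

Calling `model.eval()` matters for any network with BatchNorm layers, since training mode would normalize with per-batch statistics instead of the stored running averages.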
@article{mehta2021mobilevit,
title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
author={Mehta, Sachin and Rastegari, Mohammad},
journal={arXiv preprint arXiv:2110.02178},
year={2021}
}
Code adapted from MobileNetV2 and ViT.