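This repository contains an implementation of the Vision Transformer (ViT) model as described in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". The Vision Transformer applies the transformer architecture, originally developed for natural language processing, to image classification: an image is split into fixed-size patches, and the resulting sequence of patches is processed the way a sequence of word tokens would be. The paper reports results competitive with state-of-the-art convolutional networks when the model is pre-trained on large datasets.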
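To make the "16x16 words" idea concrete, here is a minimal pure-Python sketch of the patch-splitting step. It is not taken from this repository's code: the function names `split_into_patches` and `sequence_length` are hypothetical, the image is assumed to be a nested list of shape height x width x channels, and a real implementation would more likely use tensor operations.

```python
def split_into_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Patches are returned in row-major order (left to right, top to bottom).
    Each patch is a flat list of patch_size * patch_size * C values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    # image[row][col] is the list of channel values for one pixel
                    patch.extend(image[row][col])
            patches.append(patch)
    return patches


def sequence_length(image_size, patch_size, class_token=True):
    """Number of tokens the transformer sees for a square image.

    Assumes one extra class token is prepended, as in the ViT paper.
    """
    num_patches = (image_size // patch_size) ** 2
    return num_patches + 1 if class_token else num_patches


if __name__ == "__main__":
    # A tiny 4 x 4 RGB image split into 2 x 2 patches gives 4 patches of 12 values.
    tiny = [[[r, c, 0] for c in range(4)] for r in range(4)]
    print(len(split_into_patches(tiny, 2)))
    # A 224 x 224 image with 16 x 16 patches: 196 patches plus the class token.
    print(sequence_length(224, 16))
```

Under these assumptions, a 224 x 224 RGB image with 16 x 16 patches yields 196 patches of 768 values each. In the paper, each flattened patch is then linearly projected to the model dimension and combined with a position embedding before entering the transformer encoder.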
dusky04/vit-pytorch