This is a simple implementation of ViT (Vision Transformer), a state-of-the-art approach to image classification. It is simply the Transformer applied to the image domain, with slight modifications to the implementation to handle the different data modality.
This project is designed to make training intuitive: you can modify some parameters directly in the code. A later version will add GPU support, which you can already enable yourself by modifying the code. Images must be square, of size 384, to respect the values used in the article. Here is the article: click here
git clone https://github.com/lodjim/VIT
cd VIT
python3 main.py --help
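
To illustrate the "slight modification" needed to feed images to a Transformer, here is a minimal sketch of how a square 384-pixel image can be cut into a sequence of flattened patches. It is not taken from this repository: the `patchify` helper, the patch size of 16, and the single-channel image are all assumptions made for illustration.

```python
def patchify(image, patch_size):
    """Split a square 2D image (a list of rows) into flattened patches.

    Patches are returned in row-major order (left to right, top to bottom).
    Hypothetical helper, not part of this repository.
    """
    side = len(image)
    if any(len(row) != side for row in image):
        raise ValueError("image must be square")
    if side % patch_size != 0:
        raise ValueError("image side must be divisible by patch_size")

    patches = []
    for top in range(0, side, patch_size):
        for left in range(0, side, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


IMAGE_SIZE = 384  # square input size stated in this README
PATCH_SIZE = 16   # assumption: not specified in this README

# Dummy single-channel image filled with zeros (assumption: grayscale).
image = [[0.0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)]
patches = patchify(image, PATCH_SIZE)

# (384 / 16) ** 2 = 576 patches, each with 16 * 16 = 256 values.
print(len(patches), len(patches[0]))
```

Under these assumptions, each flattened patch plays the role of one token in the Transformer's input sequence, which is why the image side must be divisible by the patch size.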