This repository contains an implementation of Vision Transformers (ViT) with token merging, using the bipartite soft matching algorithm from the paper "Token Merging: Your ViT But Faster" (https://arxiv.org/abs/2210.09461). The goal is to increase ViT throughput by merging similar tokens as they pass through the network. Training code is included.
- Vision Transformer (ViT) Implementation: Based on the original ViT architecture.
- Bipartite Soft Matching: Merges similar tokens, reducing the computational load.
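
To illustrate the merging step, here is a minimal, dependency-free sketch of the idea. It is not this repository's code: the function names `cosine` and `bipartite_soft_matching` and the parameter `r` (number of tokens to merge) are hypothetical. It also simplifies the paper's method, which measures similarity on attention keys and weights the average by token size. This sketch compares the token vectors directly and takes a plain average.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def bipartite_soft_matching(tokens, r):
    """Merge up to r tokens (hypothetical helper, not the repository's API).

    tokens: list of feature vectors (lists of floats).
    Returns a new list with min(r, len(A)) fewer tokens.
    """
    # 1. Split the tokens into two sets, A and B, by alternating position.
    a, b = tokens[0::2], tokens[1::2]
    r = min(r, len(a))
    if r <= 0 or not b:
        return [list(t) for t in tokens]

    # 2. For every token in A, find its most similar token in B.
    edges = []
    for i, tok in enumerate(a):
        sims = [cosine(tok, other) for other in b]
        j = max(range(len(b)), key=sims.__getitem__)
        edges.append((sims[j], i, j))

    # 3. Keep only the r most similar (A, B) pairs.
    edges.sort(key=lambda e: e[0], reverse=True)
    merged = edges[:r]
    merged_src = {i for _, i, _ in merged}

    # 4. Average each selected A token into its matched B token.
    sums = [list(tok) for tok in b]
    counts = [1] * len(b)
    for _, i, j in merged:
        sums[j] = [s + x for s, x in zip(sums[j], a[i])]
        counts[j] += 1
    merged_b = [[s / c for s in vec] for vec, c in zip(sums, counts)]

    # 5. Concatenate the unmerged A tokens with the (possibly merged) B tokens.
    kept_a = [list(tok) for i, tok in enumerate(a) if i not in merged_src]
    return kept_a + merged_b
```

Calling `bipartite_soft_matching(tokens, r=1)` on four tokens returns three. Because every A token proposes exactly one match and only the top `r` proposals are kept, the number of tokens removed per call is fixed, which keeps the sequence length predictable from layer to layer.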
To get started, clone the repository and install the necessary dependencies:
git clone https://github.com/Ctrl408/ViT-implementations.git
cd ViT-implementations