[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
A Visual Question Answering model implemented in MindSpore and PyTorch, reimplementing the paper *Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering*. It is our final project for the course DL4NLP at ZJU.
PyTorch implementation of VQA using Stacked Attention Networks: a multimodal architecture that encodes the image with a CNN and the question with an LSTM, then fuses them through stacked attention layers for improved accuracy (54.82%). Includes visualization of the attention layers; a sketch of one attention hop follows below. Uses the VQA v2.0 dataset. Contributions welcome.
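As a rough illustration of the stacked-attention idea this repo implements, here is a minimal sketch of a single attention hop in PyTorch: the question vector attends over image-region features and is refined with the weighted image context. Dimensions and layer names are assumptions for illustration, not the repo's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackedAttention(nn.Module):
    """One attention hop in the style of Stacked Attention Networks.
    d_img/d_q/d_attn are hypothetical; the referenced repo may differ."""
    def __init__(self, d_img=1024, d_q=1024, d_attn=512):
        super().__init__()
        self.w_img = nn.Linear(d_img, d_attn, bias=False)
        self.w_q = nn.Linear(d_q, d_attn)
        self.w_p = nn.Linear(d_attn, 1)

    def forward(self, v_img, u_q):
        # v_img: (batch, regions, d_img) image features; u_q: (batch, d_q) question vector
        h = torch.tanh(self.w_img(v_img) + self.w_q(u_q).unsqueeze(1))
        p = F.softmax(self.w_p(h).squeeze(-1), dim=-1)    # (batch, regions) attention weights
        v_ctx = (p.unsqueeze(-1) * v_img).sum(dim=1)      # weighted image context
        return u_q + v_ctx, p                             # refined query; p can be visualized
```

Stacking two such hops and classifying from the final refined query is the standard SAN recipe; the returned weights `p` are what attention-map visualizations are built from.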
Adaptively fine-tuning transformer-based models for multiple domains and multiple tasks
Grid feature extraction for the ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
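For context, "grid features" means taking the backbone's final convolutional feature map and treating each spatial cell as one visual token, instead of using detected object regions. A minimal sketch with a torchvision ResNet-50 follows; the backbone, input size, and grid size are assumptions, not TRAR's actual extraction pipeline.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Hypothetical grid-feature extractor; TRAR's own pipeline may use a different backbone.
backbone = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
conv_stack = torch.nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc

@torch.no_grad()
def grid_features(images, grid=8):
    """images: (batch, 3, H, W) -> (batch, grid*grid, 2048) grid features."""
    fmap = conv_stack(images)                                  # (batch, 2048, H/32, W/32)
    fmap = torch.nn.functional.adaptive_avg_pool2d(fmap, grid) # pool to a fixed grid
    return fmap.flatten(2).transpose(1, 2)                     # one 2048-d vector per cell

feats = grid_features(torch.randn(2, 3, 448, 448))             # -> (2, 64, 2048)
```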
Leveraging the BLIP Model for Visual Question Answering: A Comparative Analysis on VQA and DAQUAR Datasets
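For readers who want to try BLIP on a VQA example, here is a minimal inference sketch using the Hugging Face `transformers` `BlipProcessor`/`BlipForQuestionAnswering` classes; the checkpoint name and sample image URL are assumptions for illustration, not necessarily the comparison's exact setup.

```python
from PIL import Image
import requests
from transformers import BlipProcessor, BlipForQuestionAnswering

# Hypothetical checkpoint choice; the repo's analysis may use a different one.
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample COCO image
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(image, "How many cats are in the picture?", return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```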