Transformer models are improving at a rapid pace, making it increasingly important to develop methods to explain, reverse-engineer, and visualize their inner workings. In this project, we study the interpretability of transformer models through a series of experiments divided into two parts:
- Visualizing Transformer Attention
  - Results published in the paper AttentionViz: A Global View of Transformer Attention.
- Exploring Induction Heads in BERT
This research was conducted as part of an independent study at the Harvard Insight and Interaction Lab under the mentorship of Professor Martin Wattenberg, Professor Fernanda Viégas, and Catherine Yeh. The full write-up of this project can be found here.