How Does BERT Answer Questions?

What is this about?
Watch how BERT (fine-tuned on QA tasks) transforms tokens to arrive at the right answers. This demo shows how the token representations change across the layers of BERT. We observed that the transformations mostly pass through four phases that relate to the steps of traditional Question Answering pipelines.
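
As a rough illustration of what a layer-wise view of token representations involves, the sketch below projects the token vectors of every layer to 2D so they can be plotted layer by layer. PCA is used here as one common choice of dimensionality reduction. The function name `project_layers_2d`, the array shape, and the random toy data are assumptions made for illustration; real hidden states would come from a fine-tuned BERT model, and this is not the demo's actual code.

```python
import numpy as np

def project_layers_2d(hidden_states):
    """Project each layer's token vectors to 2D with PCA.

    hidden_states: array of shape (num_layers, num_tokens, hidden_dim).
    Returns an array of shape (num_layers, num_tokens, 2).
    """
    hidden_states = np.asarray(hidden_states, dtype=float)
    projected = []
    for layer in hidden_states:
        # Center the tokens of this layer before computing principal axes.
        centered = layer - layer.mean(axis=0, keepdims=True)
        # Rows of vt are the principal directions, sorted by explained variance.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        projected.append(centered @ vt[:2].T)
    return np.stack(projected)

# Toy stand-in for model output (random numbers, not real BERT states):
# 13 layers, 6 tokens, 16 hidden dimensions.
rng = np.random.default_rng(0)
toy_states = rng.normal(size=(13, 6, 16))
coords = project_layers_2d(toy_states)
print(coords.shape)  # (13, 6, 2)
```

Each layer is projected independently, so the axes of two layers are not directly comparable; only the relative positions of tokens within one layer are meaningful.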

The tool demonstrates the findings from our paper: Betty van Aken, Benjamin Winter, Alexander Löser and Felix Gers. How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations. CIKM 2019.
The 4 phases of BERT's transformations
1. Topical / Word Clusters: Identical words and related topics are clustered without considering the current context.
2. Connect Entities with Mentions and Attributes: Tokens are clustered based on their relations within the context.
3. Match Question with Supporting Facts: Relevant parts of the context can be found close to the question tokens.
4. Answer Extraction: The answer tokens are separated from the rest, and the semantic clusters dissolve.
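
One way to make the last phase concrete is to measure, per layer, how far the answer tokens lie from all other tokens. The sketch below is a hypothetical illustration and not the metric used in the paper: `answer_separation` and the hand-made 2D vectors are assumptions, chosen only to show the kind of quantity that should grow once the answer tokens separate from the rest.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def answer_separation(layer_vectors, answer_positions):
    """Euclidean distance between the centroid of the answer tokens
    and the centroid of all remaining tokens in one layer."""
    answer_positions = set(answer_positions)
    answer = [v for i, v in enumerate(layer_vectors) if i in answer_positions]
    rest = [v for i, v in enumerate(layer_vectors) if i not in answer_positions]
    if not answer or not rest:
        raise ValueError("need at least one answer token and one other token")
    return math.dist(centroid(answer), centroid(rest))

# Hand-made toy vectors (not model output): token 3 plays the answer token.
early_layer = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
late_layer = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
answer_positions = [3]

scores = [answer_separation(layer, answer_positions)
          for layer in (early_layer, late_layer)]
print(scores)  # the second score is larger: the answer token has moved away
```

Applied to real hidden states, such a score would be computed once per layer and plotted against the layer index.
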
Datasets: SQuAD [1], HotpotQA [2], bAbI QA [3]

[1] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of EMNLP 2016.

[2] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In Proceedings of EMNLP 2018.

[3] Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2016. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In Proceedings of ICLR 2016.

Contact: Betty van Aken (@betty_v_a) @ DATEXIS