Quantifying Context Mixing in Transformers

This paper introduces Value Zeroing, a new addition to the ever-growing toolbox of techniques for interpreting the inner workings of Transformers, the deep learning architecture underlying ChatGPT and all other state-of-the-art Large Language Models (as well as music, vision and speech applications). Value Zeroing is a variant of a popular class of interpretation techniques sometimes described as ‘explaining by deleting’. Its key design decision is to delete (or rather, zero out) only one specific component of the Transformer, the value vector. The paper shows that the technique performs very well in comparison to alternative methods for measuring the amount of ‘context mixing’, that is, how much each token's representation draws on the other tokens in its context.
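
To make the idea of zeroing out a value vector concrete, here is a minimal sketch in dependency-free Python. It is an illustration under simplifying assumptions, not the authors' implementation: it uses a single attention head with no residual connection, layer normalisation or feed-forward block, it compares outputs with cosine distance, and it normalises each row to sum to 1. All function names (`attention_output`, `value_zeroing_scores`, and so on) are hypothetical, and the paper's actual procedure may differ, for example in which layer outputs are compared.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_output(queries, keys, values):
    """Single-head scaled dot-product attention over a whole sequence."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


def cosine_distance(u, v):
    """1 - cosine similarity, clamped at 0; defined as 1 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0
    return max(0.0, 1.0 - dot / norm)


def value_zeroing_scores(queries, keys, values):
    """scores[i][j]: how much token i's output changes when token j's value is zeroed.

    Queries and keys are left untouched, so the attention weights stay the same;
    only the value vector of token j is replaced by zeros.
    """
    n = len(values)
    original = attention_output(queries, keys, values)
    scores = [[0.0] * n for _ in range(n)]
    for j in range(n):
        ablated_values = [list(v) for v in values]
        ablated_values[j] = [0.0] * len(values[j])
        ablated = attention_output(queries, keys, ablated_values)
        for i in range(n):
            scores[i][j] = cosine_distance(original[i], ablated[i])
    # Assumption: normalise each row so the scores for one token sum to 1.
    for row in scores:
        total = sum(row)
        if total > 0:
            row[:] = [s / total for s in row]
    return scores


# Toy example: three tokens with two-dimensional query, key and value vectors.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in value_zeroing_scores(toy, toy, toy):
    print([round(s, 3) for s in row])
```

Each row of the resulting matrix can be read as an estimate of how strongly one token's output depends on every token in the context. Because the queries and keys are not modified, the attention pattern itself stays intact, which is what distinguishes this from deleting a token from the input altogether.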

Reference: Hosein Mohebbi, Willem Zuidema, Grzegorz Chrupała, Afra Alishahi (2023). Quantifying Context Mixing in Transformers. Proceedings of EACL 2023. https://aclanthology.org/2023.eacl-main.245/
