Quantifying Context Mixing in Transformers

This paper adds Value Zeroing (VZ) to the ever-growing toolbox of techniques for interpreting the inner workings of Transformers – the deep learning model underlying ChatGPT and other state-of-the-art Large Language Models (as well as music, vision and speech applications). The technique is a variant of a popular class of interpretation techniques, sometimes described as ‘explaining by deleting’. VZ, however, makes the fruitful choice to delete (or rather: zero out) only one specific type of component of the Transformer, known as the value vector. The paper shows that the technique performs very well in comparison with alternative methods for measuring the amount of ‘context mixing’, that is, how much each token's representation draws on the other tokens in its context.
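To make the idea concrete, here is a minimal sketch of value zeroing on a toy, single-head self-attention layer in NumPy. It is an illustration under stated assumptions, not the authors' implementation. The layer is reduced to attention plus a residual connection, the weights are random, and the function names (`attention_layer`, `value_zeroing_scores`) are hypothetical. The use of cosine distance and row normalization is also an assumption here, since this summary does not spell out those details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(X, Wq, Wk, Wv, zero_value_of=None):
    # Toy single-head self-attention with a residual connection (assumed setup).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    if zero_value_of is not None:
        V = V.copy()
        V[zero_value_of] = 0.0  # zero out only the value vector of one token
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # attention pattern is untouched
    return X + weights @ V

def cosine_distance(a, b):
    # Row-wise cosine distance between two (n_tokens, d) arrays.
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return 1.0 - num / den

def value_zeroing_scores(X, Wq, Wk, Wv):
    # scores[i, j]: how much token i's output changes when token j's value is zeroed.
    n = X.shape[0]
    original = attention_layer(X, Wq, Wk, Wv)
    scores = np.zeros((n, n))
    for j in range(n):
        altered = attention_layer(X, Wq, Wk, Wv, zero_value_of=j)
        scores[:, j] = cosine_distance(original, altered)
    # Normalize each row so the scores for token i sum to 1 (assumed convention).
    return scores / scores.sum(axis=1, keepdims=True)

# Random toy inputs: 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
n_tokens, d = 4, 8
X = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
scores = value_zeroing_scores(X, Wq, Wk, Wv)
print(np.round(scores, 3))
```

Each row of `scores` can be read as a rough estimate of how strongly one token's output depends on every token in the context. Because only the value vector is zeroed, the queries, keys and attention weights stay exactly as they were in the unmodified layer.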

Reference: Hosein Mohebbi, Willem Zuidema, Grzegorz Chrupała, Afra Alishahi (2023). Quantifying Context Mixing in Transformers. Proceedings of EACL 2023. https://aclanthology.org/2023.eacl-main.245/

