Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?

Paper details

Language models
Bias

In this paper, we studied which stereotypes language models learn about a wide range of social groups (e.g., Asians, homosexuals, white men). To get an overview of the stereotypes that exist within human populations, we first created a stereotype dataset based on search engine autocompletions. We then tested how many of the human stereotypes in our dataset were also present in language models. Using emotion lexicons that map words to the underlying emotions they reflect (e.g., anger, fear, or trust), we further quantified how negatively or positively each language model is biased towards each group overall. Our results show how attitudes towards social groups vary across models and how quickly emotions and stereotypes about a group can change when a language model is exposed to new linguistic experience.
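As a rough illustration of this kind of probing, the sketch below queries a masked language model with a group-based template and maps its top completions to emotions via a word-to-emotion lexicon. This is a minimal sketch, not the paper's actual pipeline: the template, the toy lexicon entries, and the choice of bert-base-uncased are illustrative assumptions, and the real study relies on full emotion lexicons and a much larger set of prompts.

```python
# Minimal sketch (not the authors' code): probe a masked language model for
# attributes it associates with a social group, then map those attribute
# words to emotions via a small hand-made lexicon.
from collections import Counter
from transformers import pipeline

# Any masked LM works here; bert-base-uncased is just an illustrative choice.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Toy word-to-emotion lexicon; these entries are made up for illustration.
emotion_lexicon = {
    "mean": "anger",
    "pretentious": "disgust",
    "dangerous": "fear",
    "kind": "trust",
    "smart": "trust",
}

def emotions_for_group(group, top_k=20):
    """Count the emotions reflected by a model's top completions about a group."""
    predictions = fill_mask(f"Why are {group} so [MASK]?", top_k=top_k)
    counts = Counter()
    for pred in predictions:
        word = pred["token_str"].strip()
        if word in emotion_lexicon:
            counts[emotion_lexicon[word]] += 1
    return counts

if __name__ == "__main__":
    print(emotions_for_group("academics"))
```

Comparing these emotion counts across groups, or across checkpoints of the same model, gives a rough sense of how the overall positivity or negativity of a model's associations differs per group and shifts with further training.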

Reference: Rochelle Choenni, Ekaterina Shutova, and Robert van Rooij. 2021. Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you? In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1477–1491, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
