Can NLP bias measures be trusted?

In this paper, we explore how knowledge from psychological testing theory can be applied to improve the ways we test language models for bias.

This paper is a product of our larger project on improving bias measurement tools. Measuring biases in language models is hard: many researchers have attempted the task, but current bias measurement tools fall short. Instead of proposing yet another tool, we discuss in this paper how bias measurement tools can be better evaluated and developed.

To that end, we discuss two key concepts from psychometrics, the subfield of psychology concerned with good testing: reliability and validity. Reliability is the extent to which the number we get from a measurement tool reflects something systematic; validity is the extent to which the systematic thing we measure is the thing we actually want to measure (e.g., that the number reflects how sexist a language model is, not merely how often it uses the pronoun “she”).

We discuss several different aspects of reliability and validity, and show how each can be applied in the assessment and development of bias measurement tools. For example, just as we would not trust a scale that measures our weight as 80 kilograms today and 25 tomorrow, we argue that researchers should demonstrate that their bias measurement tools have acceptable test-retest reliability (i.e., consistency of measurements across time).
