Project details

Tags: Language models · Bias · Theory-driven

Improving Language Model Bias Measures

Ensuring that language models do not produce biased (e.g., sexist) outputs is an important task. That is why many researchers are developing tools to measure such biases. Instead of building measurement tools themselves, researchers in this project group work on a more fundamental question: “What are the hallmarks of a good tool for measuring bias?”

Questions that we work on include: How can we ensure that what we measure with these tools is meaningful? Do measurements with these tools predict outcomes (e.g., harms) that we care about? Of all the “questions” that we could ask AI systems (to reveal their biases), how do we pick the best ones? And when is the best time to test language models for bias: while they are still being trained, or once they are fully trained?

When working on these questions, members of this project group often look at what other scientific disciplines (like psychology) have learned about developing good measurement tools. By working on these theoretical questions, the group aims to find sound ways of evaluating and improving bias measurement tools.

Papers related to this project

Can NLP bias measures be trusted?