Bias in Artificial Intelligence

“Bias” in the context of AI usually means a (hidden) flaw in the way a computer system makes decisions, causing it to systematically provide undesirable outputs.

More about Bias

As there is a lot to unpack here, we will break this definition down further.

Systematic behaviors

Researchers are usually concerned with assessing ‘systematic behaviors’ of AI systems rather than singular ones, because there is a lot of randomness in how AI systems behave. Consequently, seeing a singular undesirable behavior from an AI system does not necessarily mean that the system is fundamentally flawed. However, if it systematically acts in ways that are undesirable (e.g., a large language model that, at any mention of the word “CEO”, assumes that the CEO is male), this points to a flaw that needs to be addressed.
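For readers who want to see what such a systematic check can look like in practice, here is a minimal sketch. It assumes the Hugging Face transformers library and the public bert-base-uncased model (neither of which is specific to our work) and simply asks, across several templates, whether the model prefers “he” over “she” after a mention of “CEO”.

```python
# A minimal sketch (assumed setup: Hugging Face `transformers` and the public
# bert-base-uncased checkpoint): instead of judging the model from one output,
# we ask it to fill the blank in several templates and count how often it
# prefers "he" over "she" after a mention of "CEO".
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The CEO said that [MASK] would announce the results tomorrow.",
    "Before the meeting, the CEO told us [MASK] had read the report.",
    "The CEO explained why [MASK] rejected the proposal.",
]

male_preferred = 0
for template in templates:
    scores = {out["token_str"]: out["score"]
              for out in fill_mask(template, targets=["he", "she"])}
    if scores.get("he", 0.0) > scores.get("she", 0.0):
        male_preferred += 1

# One template proves nothing; a high fraction across many templates suggests
# a systematic association between "CEO" and male pronouns.
print(f"Model prefers 'he' in {male_preferred} of {len(templates)} templates")
```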

Hidden flaws

Modern AI systems are opaque: we provide them with data (e.g., sentences from newspapers) and a task (e.g., “given the first part of a sentence, predict the next word”) and allow them to come up with their own rules of thumb to accomplish this task (see algorithms). In the end, after such “training”, it is hard to say how exactly the resulting AI system makes decisions. This also makes it hard to trace where undesirable outputs stem from (e.g., which part(s) of a language model make it treat the word “CEO” as male).
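As a toy illustration of that training setup (not of any real system), the sketch below “trains” the simplest possible next-word predictor: a table of word-pair counts built from an invented two-sentence corpus. Its rules come entirely from the data it saw; a modern language model does the same thing with billions of parameters instead of a readable table, which is exactly what makes its decisions hard to trace.

```python
# A toy version of the task "given the previous word, predict the next word".
# The corpus is invented for illustration; the point is that the model's
# "rules" are nothing but statistics of whatever text it was shown.
from collections import Counter, defaultdict

corpus = "the ceo said he would resign . the ceo said he was pleased ."

bigram_counts = defaultdict(Counter)
words = corpus.split()
for previous, current in zip(words, words[1:]):
    bigram_counts[previous][current] += 1

def predict_next(word):
    """Return the most frequently observed follower of `word`."""
    followers = bigram_counts[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("ceo"))   # -> 'said'
print(predict_next("said"))  # -> 'he', purely because the toy corpus says so
```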

There are many ways in which researchers try to address bias in AI. Akin to asking a high schooler to write down all their intermediate calculations for a math problem (rather than only asking them for their final answer), one approach is to create AI systems whose decision-making is less opaque (see explainable AI). Hopefully, we can then understand where and how an undesirable output came about (this is like looking through the high schooler’s intermediate calculations for mistakes).
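One simple way to peek at such intermediate “calculations” is occlusion: remove one word at a time and measure how much the output changes. The sketch below only illustrates that idea; model_score is a made-up dummy standing in for a real classifier, so the example runs on its own.

```python
# Occlusion sketch: score each word by how much removing it changes the
# model's output. `model_score` is a hypothetical dummy (it just reacts to
# the word "CEO") so that this snippet is self-contained; with a real model,
# the same loop shows which words the decision leaned on.
def model_score(sentence):
    return 0.9 if "ceo" in sentence.lower() else 0.5  # dummy stand-in model

def occlusion_importance(sentence):
    """Score each word by the output shift caused by removing it."""
    words = sentence.split()
    baseline = model_score(sentence)
    return {
        word: abs(baseline - model_score(" ".join(words[:i] + words[i + 1:])))
        for i, word in enumerate(words)
    }

print(occlusion_importance("The CEO reviewed the application"))
# The word whose removal shifts the score most ("CEO" here) is the one the
# model leaned on, which hints at where an undesirable output came from.
```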

Another strategy is to look at the internal states of the AI system and try to understand how these relate to biased outcomes. This is akin to interrupting the high schooler during their calculations and asking them what they are currently thinking about. Alternatively, some researchers look at the data used to train the AI system: since the system derives its own rules from what it is trained on, biased training data can lead to biased decision-making. For example, a language model might have (misleadingly) learned that “CEO” is a male word, because almost all CEOs mentioned in its training sentences were male. Finally, some researchers pit AI against AI, training one AI system on a task like “look at this other system’s output and tell us whether or not a human would consider it undesirable”.
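The “look at the training data” strategy can be as simple as counting. The sketch below (with an invented snippet standing in for a real corpus) tallies gendered pronouns that occur near the word “CEO”; a strong skew in such counts is exactly the kind of pattern a model can pick up.

```python
# A minimal data-audit sketch: count gendered pronouns occurring within a few
# tokens of the word "CEO". The sample text is invented; in practice the same
# count would be run over the full training corpus.
import re
from collections import Counter

MALE, FEMALE = {"he", "him", "his"}, {"she", "her", "hers"}

def ceo_pronoun_counts(text, window=10):
    """Count gendered pronouns within `window` tokens of each mention of 'CEO'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == "ceo":
            for neighbour in tokens[max(0, i - window): i + window + 1]:
                if neighbour in MALE:
                    counts["male"] += 1
                elif neighbour in FEMALE:
                    counts["female"] += 1
    return counts

sample = "The CEO announced his plans. Shareholders asked the CEO whether he would resign."
print(ceo_pronoun_counts(sample))  # Counter({'male': 4}): only male pronouns near "CEO"
```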

Undesirable outputs

So AI systems are problematic if they systematically provide undesirable outputs. But what are undesirable outputs? This, too, is an active topic of research. In a sense, whether outputs are desirable or undesirable depends on your moral intuitions and personal sense of fairness (e.g., should a language model assume that a “breast cancer patient” is female, because almost all breast cancer patients are, or should the model generally be agnostic towards sex?). Hence, scientists are exploring people’s moral preferences as well as debating whose preferences should take precedence (e.g., those of society at large, or primarily those of the people directly affected by the AI system in question).

Additionally, researchers are developing new ways of checking for potentially undesirable outputs. For example, instead of checking for undesirable outcomes based on singular characteristics (e.g., whether women and men, or black and white people, are treated the same way), a recent focus has been on checking combinations of characteristics (e.g., whether an AI system treats black women the same way it treats white men).
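A sketch of what such a combined check can look like, assuming pandas and a hypothetical audit log screening_decisions.csv with columns gender, ethnicity and accepted (neither the file nor the column names come from a specific project):

```python
# Checking combinations of characteristics instead of one at a time.
# `screening_decisions.csv` and its column names are illustrative assumptions.
import pandas as pd

decisions = pd.read_csv("screening_decisions.csv")

# Acceptance rate per single characteristic ...
per_gender = decisions.groupby("gender")["accepted"].mean()

# ... versus per combination of characteristics.
per_combination = decisions.groupby(["gender", "ethnicity"])["accepted"].mean()

print(per_gender)       # can look perfectly balanced on its own,
print(per_combination)  # while specific combinations are still treated differently
```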

AI technology increasingly intersects with the lives of ordinary citizens (e.g., AI systems skim applicants’ CVs and handle customer complaints) and makes decisions with a veneer of objectivity. Consequently, we at CERTAIN believe that it is important to improve the ways in which researchers look for bias in AI and to explore which AI behaviors we, as a society, find acceptable.

Projects within this research theme

Bias across borders: towards a social-political understanding of bias in machine learning

The goal of this project is to provide AI researchers with a clear conceptual framework for the notion of bias in machine learning, in the context of algorithmic fairness.
Responsible, Bias, Theory-driven, Ongoing

Inclusive Image and Video Captioning

Current AI systems describing the content of images and videos in natural language often make assumptions about the gender, nationality, or physical appearance of the people in them. In this project, we aim to fix this unwanted behavior by developing more inclusive systems.
Bias, Inclusive, Ongoing

Improving Language Model Bias Measures

Many researchers develop tools for measuring how biased language models are; in this project, we work on improving these tools.
Language models, Bias, Theory-driven, Ongoing

From Learning to Meaning

In this project, we explore whether language models’ generic sentences can teach us something about how people express stereotypes.
Language models, Bias, Ongoing

Papers within this research theme

How robust and reliable can we expect language models to be?
Which stereotypes do search engines come with?
Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?
Can NLP bias measures be trusted?