Which stereotypes do search engines come with?

Paper details


This paper investigates whether the autocompletion suggestions shown by search engines such as Google, DuckDuckGo and Yahoo perpetuate negative stereotypes.

Until a few years ago, Google features such as autosuggestions were described as 'a window into the collective Internet consciousness […], not an attractive scene'. Much has changed since journalists and researchers alike called out offensive autosuggestions such as 'are Jews [evil]'. The search engine now states that it removes 'hateful or prejudicial' content from its autosuggestions.

Stereotyping in autosuggestions (see Figure) is more of a grey area: it is not explicitly covered by Google's policy, yet psychologists link it to us 'picking up' stereotypes that can cement 'oppressive social relationships' in the real world.

In this paper, we investigate whether stereotyping is adequately addressed in autosuggestions by Google, DuckDuckGo and Yahoo. We find that stereotypes against religious groups such as Jews are now thoroughly suppressed by Google, while offensive autosuggestions about older people and women persist, e.g., women are [controlling, clingy, dramatic]. DuckDuckGo moderates most stereotypes in its autosuggestions overall. Yahoo moderates the least and displays autosuggestions not unlike Google's before the introduction of its content moderation policy.
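
To make the setup concrete, here is a minimal sketch of how autosuggestions can be retrieved programmatically for a template query. It relies on the widely documented but unofficial Google suggest endpoint (suggestqueries.google.com), which is an assumption on our part; it is not the paper's own data collection pipeline, whose query templates and engines are described in the full text.

```python
"""Minimal sketch: fetching autocompletion suggestions for a template query."""
import json
import urllib.parse
import urllib.request


def fetch_suggestions(query: str) -> list[str]:
    """Return autocomplete suggestions for `query` from the unofficial
    Google suggest endpoint (format and availability may change)."""
    url = ("https://suggestqueries.google.com/complete/search?client=firefox&q="
           + urllib.parse.quote(query))
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset("utf-8")
        payload = json.loads(response.read().decode(charset))
    # The client=firefox response is a JSON array: [query, [suggestion, ...], ...]
    return payload[1]


if __name__ == "__main__":
    # Template query of the kind used to probe for group stereotypes
    for suggestion in fetch_suggestions("why are women so"):
        print(suggestion)
```

Comparable (equally unofficial) suggest endpoints exist for other engines, which is what makes a cross-engine comparison of moderation practices feasible.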

Considering the risks attached to reinforcing stereotypes in the real world, we suggest that stereotypes should be recognised explicitly in company policies and addressed equally for different demographic groups. We ask whether autosuggest as a service should be optional or disabled by default.

Reference: Alina Leidinger and Richard Rogers. 2023. Which Stereotypes Are Moderated and Under-Moderated in Search Engine Autocompletion?. In 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23), June 12–15, 2023, Chicago, IL, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3593013.3594062

Other papers

Quantifying Context Mixing in Transformers
Reclaiming AI as a theoretical tool for cognitive science
Dealing with semantic underspecification in multimodal NLP
How robust and reliable can we expect language models to be?
Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?
Can NLP bias measures be trusted?