The Impact of Unstated Norms in Bias Analysis of Language Models
by Farnaz Kohankhaki and 4 other authors
Abstract: Bias in large language models (LLMs) takes many forms, from overt discrimination to implicit stereotypes. Counterfactual bias evaluation is a widely used approach to quantifying bias: it measures whether the outcome of a task performed by an LLM is invariant to a change in group membership, and it often relies on template-based probes that explicitly state group membership. In this work, we find that template-based probes can lead to unrealistic bias measurements. For example, LLMs appear to mistakenly cast text associated with the White race as negative at higher rates than text associated with other groups. We hypothesize that this arises artificially from a mismatch between the commonly unstated norms, in the form of markedness, in the pretraining text of LLMs (e.g., Black president vs. president) and the templates used for bias measurement (e.g., Black president vs. White president). These findings highlight the potentially misleading impact of varying group membership through explicit mention in counterfactual bias quantification.
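To make the setup concrete, here is a minimal sketch of counterfactual, template-based probing as the abstract describes it. The template, the group terms, and the use of a Hugging Face sentiment-analysis pipeline are illustrative assumptions, not the paper's actual probes, task, or models; the point is only to show how a probe varies group membership explicitly while the unmarked baseline leaves it unstated.

```python
# Minimal sketch (not the paper's code) of counterfactual, template-based
# bias probing: vary only the group term and compare the task outcome.
from transformers import pipeline

# Hypothetical template and group terms, for illustration only.
TEMPLATE = "The {group} president gave a speech about the economy."
GROUPS = ["Black", "White", "Asian"]

classifier = pipeline("sentiment-analysis")  # default sentiment model

# Unmarked baseline: group membership left unstated, as is typical
# of pretraining text (markedness), unlike template probes.
texts = {"(unmarked)": "The president gave a speech about the economy."}
texts.update({g: TEMPLATE.format(group=g) for g in GROUPS})

for group, text in texts.items():
    result = classifier(text)[0]  # e.g. {'label': 'NEGATIVE', 'score': 0.98}
    print(f"{group:>10}: {result['label']} ({result['score']:.3f})")
```

Under the counterfactual criterion, an unbiased model would score all marked variants identically; the paper's argument is that differences against the explicitly marked "White" variant may reflect the mismatch with unmarked pretraining text rather than genuine bias.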
Submission history
From: David Emerson
[v1] Thu, 4 Apr 2024 14:24:06 UTC (884 KB)
[v2] Sun, 7 Apr 2024 21:55:38 UTC (884 KB)
[v3] Fri, 27 Sep 2024 13:12:23 UTC (1,604 KB)