Stick to your Role! Stability of Personal Values Expressed in Large Language Models



By Grgur Kovač and 4 other authors


Abstract: The standard way to study Large Language Models (LLMs) with benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g. multiple-choice questions). However, due to LLMs' highly context-dependent nature, conclusions from such minimal-context evaluations may provide little information about the model's behavior in deployment (where it will be exposed to many new contexts). We argue that context-dependence (specifically, value stability) should be studied as a specific property of LLMs and used as another dimension of LLM comparison (alongside others such as cognitive abilities, knowledge, or model size). We present a case study on the stability of value expression over different contexts (simulated conversations on different topics), as measured using a standard psychology questionnaire (PVQ) and on behavioral downstream tasks. Reusing methods from psychology, we study Rank-order stability on the population (interpersonal) level and Ipsative stability on the individual (intrapersonal) level. We consider two settings (with and without instructing LLMs to simulate particular personas), two simulated populations, and three downstream tasks. We observe consistent trends in the stability of models and model families: the Mixtral, Mistral, GPT-3.5, and Qwen families are more stable than LLaMa-2 and Phi. The consistency of these trends implies that some models exhibit higher value stability than others, and that stability can be estimated with the set of introduced methodological tools. When instructed to simulate particular personas, LLMs exhibit low Rank-order stability, which further diminishes with conversation length. This highlights the need for future research on LLMs that coherently simulate different personas. This paper provides a foundational step in that direction, and, to our knowledge, it is the first study of value stability in LLMs.
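The two measures named in the abstract can be illustrated with a toy computation. The sketch below is a hypothetical illustration, not the paper's actual pipeline: it assumes per-persona value scores (as a PVQ questionnaire might yield) collected in two different contexts, and computes Rank-order stability by correlating the ordering of personas on each value across contexts, and Ipsative stability by correlating each persona's own value profile across contexts (Spearman correlation in both cases). The persona names, value names, and scores are invented for the example.

```python
# Toy illustration (hypothetical data) of Rank-order and Ipsative stability.

def rankdata(xs):
    # Assign ranks (1 = smallest); tied values receive the average rank.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks.
    ra, rb = rankdata(a), rankdata(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Invented questionnaire scores for three personas on three values,
# measured in two simulated conversation contexts.
ctx_a = {"p1": {"power": 4, "benevolence": 2, "security": 3},
         "p2": {"power": 1, "benevolence": 5, "security": 2},
         "p3": {"power": 3, "benevolence": 3, "security": 4}}
ctx_b = {"p1": {"power": 5, "benevolence": 1, "security": 4},
         "p2": {"power": 2, "benevolence": 4, "security": 3},
         "p3": {"power": 3, "benevolence": 2, "security": 5}}

values = ["power", "benevolence", "security"]
personas = ["p1", "p2", "p3"]

# Rank-order stability (population level): for each value, correlate the
# ordering of personas across the two contexts, then average over values.
rank_order = sum(
    spearman([ctx_a[p][v] for p in personas],
             [ctx_b[p][v] for p in personas])
    for v in values) / len(values)

# Ipsative stability (individual level): for each persona, correlate its
# own value profile across the two contexts, then average over personas.
ipsative = sum(
    spearman([ctx_a[p][v] for v in values],
             [ctx_b[p][v] for v in values])
    for p in personas) / len(personas)

print(round(rank_order, 2), round(ipsative, 2))
```

In this toy data the personas keep nearly the same relative ordering across contexts, so both measures come out close to 1; a model whose expressed values drift with conversation topic would score lower on both.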

Submission history

From: Grgur Kovač
[v1]
Mon, 19 Feb 2024 14:53:01 UTC (11,539 KB)
[v2]
Mon, 29 Apr 2024 17:36:18 UTC (2,908 KB)
[v3]
Tue, 30 Apr 2024 07:09:22 UTC (2,908 KB)
[v4]
Wed, 28 Aug 2024 14:04:05 UTC (3,320 KB)


