Wait, that’s not an option: LLMs Robustness with Incorrect Multiple-Choice Options


Authors: Gracjan Góral, Emilia Wiśnios, Piotr Sankowski, Paweł Budzianowski


Abstract: Decision-making under full alignment requires balancing reasoning and faithfulness, a challenge for large language models (LLMs). This study explores whether LLMs prioritize following instructions over reasoning and truth when given “misleading” instructions, such as “Respond solely with A or B”, even when neither option is correct. We introduce a new metric called “reflective judgment”, which sheds new light on the relationship between the pre-training and post-training alignment schemes. In tasks ranging from basic arithmetic to domain-specific assessments, models like GPT-4o, o1-mini, or Claude 3 Opus adhered to instructions correctly but failed to reflect on the validity of the provided options. In contrast, models from the Llama 3.1 family (8B, 70B, 405B) and the base Qwen2.5 family (7B, 14B, 32B) exhibit improved refusal rates with size, indicating a scaling effect. We also observed that alignment techniques, though intended to enhance reasoning, sometimes weakened the models’ ability to reject incorrect instructions, leading them to follow flawed prompts uncritically. Finally, we conducted a parallel human study, which revealed similar patterns in human behavior and annotations. We highlight how popular RLHF datasets might disrupt either training or evaluation because their annotations exhibit poor reflective judgment.
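The abstract does not say how reflective judgment is scored, so the following is only a rough sketch of one way such a rate could be computed. It assumes every prompt offered only incorrect options labeled A and B, and it counts any reply that is not a bare option label as a rejection. The names `OPTION_LABELS`, `picks_offered_option`, and `reflective_judgment_rate` are hypothetical and do not come from the paper.

```python
from typing import Iterable

# Hypothetical labels offered in the prompt ("Respond solely with A or B").
OPTION_LABELS = {"A", "B"}


def picks_offered_option(response: str) -> bool:
    """Return True when the reply is nothing more than one of the offered labels."""
    # Drop surrounding whitespace and common decoration such as "(a)" or "B.".
    cleaned = response.strip().strip(".()").upper()
    return cleaned in OPTION_LABELS


def reflective_judgment_rate(responses: Iterable[str]) -> float:
    """Fraction of replies that do not simply pick an offered option.

    Assumption: all offered options were incorrect, so declining to pick one
    is treated as the reflective behavior. This is a crude proxy, not the
    scoring procedure used by the authors.
    """
    replies = list(responses)
    if not replies:
        raise ValueError("need at least one response")
    rejected = sum(1 for reply in replies if not picks_offered_option(reply))
    return rejected / len(replies)


# Made-up replies for illustration only.
replies = ["A", "B.", "Neither option is correct: 2 + 2 = 4.", "(a)"]
rate = reflective_judgment_rate(replies)
print(f"reflective judgment rate: {rate:.2f}")
```

A real evaluation would need a more careful classifier for free-text replies, since a model can pick an option inside a longer sentence or hedge without clearly rejecting the options.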

Submission history

From: Gracjan Góral
[v1] Tue, 27 Aug 2024 19:27:43 UTC (1,144 KB)
[v2] Thu, 10 Oct 2024 20:46:36 UTC (4,417 KB)




By stp2y
