SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses

By Dongwei Jiang and 5 other authors

Abstract: Can LLMs consistently improve their previous outputs for better results? For this to hold, LLMs would need to be better at discriminating among previously generated alternatives than at generating initial responses. We explore the validity of this hypothesis in practice. We first formulate a unified framework that allows us to compare the generative and discriminative capability of any model on any task. In our resulting experimental analysis of several open-source and industrial LLMs, we observe that models are not reliably better at discriminating among previously generated alternatives than at generating initial responses. This finding challenges the notion that LLMs can enhance their performance solely through their own judgment.
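
To make the framing concrete, here is a minimal sketch, in Python, of how one might compare a model's generative and discriminative accuracy on the same tasks; the hypothesis above predicts that the second number should exceed the first. This is an illustration under assumed interfaces, not the paper's implementation: `Task`, `DummyModel`, `generate`, and `score` are all hypothetical placeholders.

```python
# A minimal, hypothetical sketch (not the authors' released code): measure
# accuracy when a model generates one answer directly, versus accuracy when
# it must pick the best answer from its own previously generated candidates.
import random
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    reference: str  # gold answer, used only for scoring

class DummyModel:
    """Stand-in for an LLM that answers and scores at random."""

    def generate(self, prompt: str) -> str:
        return random.choice(["A", "B", "C", "D"])

    def score(self, prompt: str, candidate: str) -> float:
        # A real model might instead return the log-probability that
        # `candidate` is a correct answer to `prompt`.
        return random.random()

def generative_accuracy(model, tasks) -> float:
    """Fraction of tasks solved when the model generates one answer."""
    return sum(model.generate(t.prompt) == t.reference for t in tasks) / len(tasks)

def discriminative_accuracy(model, tasks, n_samples: int = 4) -> float:
    """Fraction of tasks solved when the model selects the highest-scoring
    answer among n_samples of its own generations."""
    hits = 0
    for t in tasks:
        candidates = [model.generate(t.prompt) for _ in range(n_samples)]
        best = max(candidates, key=lambda c: model.score(t.prompt, c))
        hits += best == t.reference
    return hits / len(tasks)

if __name__ == "__main__":
    tasks = [Task(prompt=f"question {i}", reference="A") for i in range(1000)]
    model = DummyModel()
    print(f"generative accuracy:     {generative_accuracy(model, tasks):.3f}")
    print(f"discriminative accuracy: {discriminative_accuracy(model, tasks):.3f}")
```

With the random stand-in model, both numbers hover around chance; the paper's finding is that, for real LLMs, the discriminative number does not reliably exceed the generative one either.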

Submission history

From: Dongwei Jiang
[v1] Thu, 4 Apr 2024 20:27:37 UTC (1,478 KB)
[v2] Wed, 4 Sep 2024 02:00:58 UTC (1,487 KB)


