From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge



By Dawei Li and 12 other authors

Abstract: Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). Traditional methods, whether matching-based or embedding-based, often fall short when judging subtle attributes and delivering satisfactory results. Recent advancements in Large Language Models (LLMs) inspire the "LLM-as-a-judge" paradigm, in which LLMs are leveraged to perform scoring, ranking, or selection across various tasks and applications. This paper provides a comprehensive survey of LLM-based judgment and assessment, offering an in-depth overview to advance this emerging field. We begin by giving detailed definitions from both input and output perspectives. We then introduce a comprehensive taxonomy that explores LLM-as-a-judge along three dimensions: what to judge, how to judge, and where to judge. Finally, we compile benchmarks for evaluating LLM-as-a-judge and highlight key challenges and promising directions, aiming to provide valuable insights and inspire future research in this area. A paper list and more resources about LLM-as-a-judge can be found at this https URL and this https URL.
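The scoring and ranking setup the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's protocol: the prompt wording, the 1-10 scale, the `Score: <n>` reply format, and the pluggable `call_llm` function are all assumptions made here for concreteness.

```python
import re

def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble a pointwise scoring prompt for a judge LLM (illustrative wording)."""
    return (
        "You are an impartial judge. Rate the answer to the question "
        "on a scale of 1-10 and reply exactly as 'Score: <n>'.\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )

def parse_score(verdict: str) -> int:
    """Extract the integer score from the judge's free-text reply."""
    match = re.search(r"Score:\s*(\d+)", verdict)
    if match is None:
        raise ValueError(f"no score found in verdict: {verdict!r}")
    return int(match.group(1))

def rank_answers(question: str, answers: list[str], call_llm) -> list[str]:
    """Rank candidate answers by the judge's pointwise scores, best first.

    `call_llm` is any callable mapping a prompt string to the judge's reply;
    in practice it would wrap an actual LLM API call.
    """
    scored = [
        (parse_score(call_llm(build_judge_prompt(question, a))), a)
        for a in answers
    ]
    return [a for _, a in sorted(scored, key=lambda t: t[0], reverse=True)]
```

Selection, the third output form the abstract mentions, falls out of the same sketch: pick the top-ranked element of `rank_answers`. Pairwise comparison prompts (judging two answers at once) are a common alternative to this pointwise scoring.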

Submission history

From: Dawei Li
[v1]
Mon, 25 Nov 2024 17:28:44 UTC (1,298 KB)
[v2]
Tue, 10 Dec 2024 05:24:37 UTC (1,304 KB)
[v3]
Thu, 12 Dec 2024 00:35:03 UTC (1,304 KB)
[v4]
Tue, 24 Dec 2024 21:46:31 UTC (1,311 KB)
[v5]
Mon, 6 Jan 2025 05:53:18 UTC (1,316 KB)


