Benchmarking Large Language Model Uncertainty for Prompt Optimization

By Pei-Fu Guo and 2 other authors

Abstract: Prompt optimization algorithms for Large Language Models (LLMs) excel at multi-step reasoning but still lack effective uncertainty estimation. This paper introduces a benchmark dataset for evaluating uncertainty metrics, focusing on Answer, Correctness, Aleatoric, and Epistemic Uncertainty. Through analysis of models such as GPT-3.5-Turbo and Meta-Llama-3.1-8B-Instruct, we show that current metrics align more closely with Answer Uncertainty, which reflects output confidence and diversity, than with Correctness Uncertainty. This highlights the need for improved metrics that are aware of the optimization objective and can better guide prompt optimization. Our code and dataset are available at this https URL.
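
To make the distinction concrete, here is a minimal sketch of a sampling-based Answer Uncertainty proxy: draw several completions for the same prompt and measure the entropy of the resulting answer distribution. The `generate` callable and the entropy formulation are illustrative assumptions, not the paper's exact benchmark metrics.

```python
from collections import Counter
from math import log
from typing import Callable

def answer_uncertainty(generate: Callable[[str], str],
                       prompt: str,
                       n_samples: int = 20) -> float:
    """Shannon entropy (in nats) of the sampled answer distribution.

    `generate` is a hypothetical stand-in for any stochastic LLM
    sampling call (e.g. temperature > 0). High entropy means the model
    produces diverse answers (high Answer Uncertainty); zero entropy
    means it always returns the same answer.
    """
    # Sample n completions and count how often each distinct answer appears.
    counts = Counter(generate(prompt).strip() for _ in range(n_samples))
    total = sum(counts.values())
    # H = -sum(p * log p) over the empirical answer distribution.
    return -sum((c / total) * log(c / total) for c in counts.values())
```

Note that such a proxy can be zero even when every sampled answer is wrong; measuring Correctness Uncertainty additionally requires ground-truth labels, which is precisely the mismatch between current metrics and the optimization objective that the paper highlights.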

Submission history

From: Yun-Da Tsai
[v1] Mon, 16 Sep 2024 07:13:30 UTC (1,772 KB)
[v2] Wed, 25 Dec 2024 03:12:44 UTC (2,071 KB)


