LUQ: Long-text Uncertainty Quantification for LLMs, by Caiqi Zhang and 3 other authors
Abstract: Large Language Models (LLMs) have demonstrated remarkable capability across a variety of NLP tasks. However, LLMs are also prone to generating nonfactual content. Uncertainty Quantification (UQ) is pivotal for understanding a model's confidence in its generations, thereby aiding in the mitigation of nonfactual outputs. Existing research on UQ predominantly targets short text generation, which typically yields brief, word-limited responses. However, real-world applications frequently require much longer responses. Our study first highlights the limitations of current UQ methods in handling long text generation. We then introduce LUQ and its two variations, a series of novel sampling-based UQ approaches specifically designed for long text. Our findings reveal that LUQ outperforms existing baseline methods in correlating with the model's factuality scores (a negative coefficient of -0.85 observed for Gemini Pro). To further improve the factuality of LLM responses, we propose LUQ-Ensemble, a method that ensembles responses from multiple models and selects the response with the lowest uncertainty. The ensembling method greatly improves response factuality over the best standalone LLM.
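The abstract describes LUQ-Ensemble only at a high level: score each model's uncertainty on a prompt and keep the response from the least-uncertain model. The sketch below is a minimal, hypothetical illustration of that selection step, not the paper's implementation; the `generate` callables and the sampling-based `uncertainty_fn` (standing in for a LUQ-style long-text consistency score) are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

def luq_ensemble(
    prompt: str,
    models: Sequence[Callable[[str], str]],
    uncertainty_fn: Callable[[Callable[[str], str], str], float],
) -> str:
    """Return the response from the model with the lowest uncertainty.

    `uncertainty_fn` is assumed to implement a sampling-based long-text UQ
    score (e.g. a LUQ-style measure): it queries one model several times on
    the same prompt and scores how consistent the sampled long-form answers
    are, with lower scores indicating higher confidence.
    """
    scored: List[Tuple[float, str]] = []
    for generate in models:
        response = generate(prompt)               # one candidate response per model
        score = uncertainty_fn(generate, prompt)  # sampling-based uncertainty of that model
        scored.append((score, response))
    # Ensemble step as described in the abstract: keep the candidate whose
    # model exhibits the lowest uncertainty.
    return min(scored, key=lambda pair: pair[0])[1]
```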
Submission history
From: Caiqi Zhang
[v1] Fri, 29 Mar 2024 16:49:24 UTC (8,902 KB)
[v2] Thu, 11 Jul 2024 14:22:32 UTC (8,886 KB)
[v3] Fri, 4 Oct 2024 09:19:07 UTC (6,843 KB)