Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

stp2y, January 8, 2025


[Submitted on 20 Oct 2024 (v1), last revised 7 Jan 2025 (this version, v3)]

Authors: Shahrad Mohammadzadeh and 4 other authors


Abstract: As large language models (LLMs) are increasingly deployed across various industries, concerns regarding their reliability, particularly due to hallucinations – outputs that are factually inaccurate or irrelevant to user input – have grown. Our research investigates the relationship between the training process and the emergence of hallucinations to address a key gap in existing research that focuses primarily on post hoc detection and mitigation strategies. Using models from the Pythia suite (70M – 12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce Sensitivity Dropout (SenD), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SenD achieves this by deterministically dropping embedding indices with significant variability, referred to as Sensitive Embedding Indices. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at 2x speed. This efficient metric is integrated into our protocol, allowing SenD to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training while also providing an efficient method to improve factual accuracy when adapting LLMs to Wikipedia, Medical, and LegalBench domains.
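The abstract describes SenD only at a high level, so the following is a rough illustration of the general idea of finding embedding indices with high variability and dropping them deterministically. It is not the authors' implementation. It assumes that "variability" means the variance of each embedding coordinate across a few training checkpoints, and that "dropping" means zeroing those coordinates. The function names `sensitive_indices` and `drop_indices`, the parameter `k`, and the toy data are all hypothetical.

```python
from statistics import pvariance


def sensitive_indices(checkpoint_embeddings, k):
    """Return the k coordinates with the highest variance across checkpoints.

    checkpoint_embeddings: list of equal-length vectors, one per checkpoint,
    all computed for the same input (an assumption of this sketch).
    """
    dim = len(checkpoint_embeddings[0])
    variances = [
        pvariance([emb[i] for emb in checkpoint_embeddings]) for i in range(dim)
    ]
    ranked = sorted(range(dim), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:k])


def drop_indices(embedding, indices):
    """Zero the chosen coordinates; no randomness is involved."""
    dropped = set(indices)
    return [0.0 if i in dropped else x for i, x in enumerate(embedding)]


# Toy data: one 4-dimensional embedding recorded at three checkpoints.
history = [
    [0.10, 2.0, -0.5, 0.30],
    [0.11, -1.5, -0.5, 0.90],
    [0.09, 3.0, -0.4, -0.20],
]

idx = sensitive_indices(history, k=2)
masked = drop_indices(history[-1], idx)
print(idx, masked)
```

In this toy data, coordinates 1 and 3 fluctuate most between checkpoints, so they are the ones zeroed. Unlike standard dropout, the selection here depends only on the observed variance, not on a random mask.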

Submission history

From: Juan Guerra
[v1] Sun, 20 Oct 2024 18:18:23 UTC (3,742 KB)
[v2] Sun, 8 Dec 2024 18:42:11 UTC (4,784 KB)
[v3] Tue, 7 Jan 2025 14:56:42 UTC (4,784 KB)

