A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs, by Jake R. Watts and one other author
Abstract: This paper proposes a new method for preventing unsafe or otherwise low-quality large language model (LLM) outputs by leveraging the stochasticity of LLMs. We propose a system in which LLM checkers vote on the acceptability of a generated output; if a threshold of disapproval is reached, the output is regenerated, and this repeats until sufficient checkers approve. We further propose estimators for cost and failure rate and, based on those estimators and on experimental data tailored to the application, an algorithm that achieves a desired failure rate at the least possible cost. We demonstrate that, under these models, failure rate decreases exponentially as a function of cost when voter count and threshold are chosen according to the algorithm, and that the models reasonably estimate the actual performance of such a system in practice, even with limited data.
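To make the voting loop and the cost/failure-rate trade-off concrete, here is a minimal sketch under a deliberately simple, hypothetical model that is not taken from the paper: each generated output is independently unsafe with probability p_bad, and each of n checkers independently votes to reject with probability r_bad on an unsafe output and r_good on a safe one. An output is regenerated when at least k checkers reject. The function names (estimate, simulate, cheapest_config), the parameters, and the unit costs c_gen and c_check are all illustrative assumptions, and the exhaustive search over (n, k) only stands in for the authors' own estimators and optimization algorithm.

```python
import random
from math import comb, inf, nan


def prob_fewer_than(n: int, k: int, p: float) -> float:
    """P(X < k) for X ~ Binomial(n, p): chance that fewer than k of n voters reject."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def estimate(n, k, p_bad, r_bad, r_good, c_gen=1.0, c_check=1.0):
    """Closed-form (failure_rate, expected_cost) under the toy independence model.

    Hypothetical model: an output is accepted when fewer than k of n checkers
    reject it, otherwise it is regenerated. One round costs one generation
    plus n checker calls.
    """
    accept_bad = prob_fewer_than(n, k, r_bad)    # an unsafe output slips through
    accept_good = prob_fewer_than(n, k, r_good)  # a safe output is accepted
    p_accept = p_bad * accept_bad + (1 - p_bad) * accept_good
    if p_accept == 0.0:
        return nan, inf  # nothing is ever accepted, so the loop never ends
    failure_rate = p_bad * accept_bad / p_accept  # P(unsafe | accepted)
    expected_rounds = 1.0 / p_accept              # rounds until acceptance are geometric
    return failure_rate, expected_rounds * (c_gen + n * c_check)


def cheapest_config(target, p_bad, r_bad, r_good, max_voters=15, c_gen=1.0, c_check=1.0):
    """Exhaustively search (n, k) for the lowest expected cost meeting `target`.

    Returns (n, k, expected_cost, failure_rate), or None if nothing qualifies.
    """
    best = None
    for n in range(1, max_voters + 1):
        for k in range(1, n + 1):
            fr, cost = estimate(n, k, p_bad, r_bad, r_good, c_gen, c_check)
            if fr <= target and (best is None or cost < best[2]):
                best = (n, k, cost, fr)
    return best


def simulate(n, k, p_bad, r_bad, r_good, trials=20_000, seed=0):
    """Monte Carlo run of the vote-and-regenerate loop, with unit cost per call.

    Assumes acceptance is possible (e.g. r_bad, r_good < 1) so each loop terminates.
    Returns (observed failure rate, mean number of calls per accepted output).
    """
    rng = random.Random(seed)
    failures, calls = 0, 0
    for _ in range(trials):
        while True:
            bad = rng.random() < p_bad  # draw a hypothetical LLM output
            r = r_bad if bad else r_good
            rejects = sum(rng.random() < r for _ in range(n))  # independent votes
            calls += 1 + n  # one generation plus n checker calls
            if rejects < k:  # below the disapproval threshold: accept
                failures += int(bad)
                break
            # otherwise the threshold was reached: loop and regenerate


    return failures / trials, calls / trials


if __name__ == "__main__":
    print(estimate(3, 1, p_bad=0.1, r_bad=0.8, r_good=0.1))
    print(simulate(3, 1, p_bad=0.1, r_bad=0.8, r_good=0.1))
    print(cheapest_config(1e-3, p_bad=0.1, r_bad=0.8, r_good=0.1))
```

Comparing simulate against estimate for the same (n, k) mirrors, in miniature, the check that a model predicts the behaviour of the running system. In this toy setting, with k = 1 the probability that an unsafe output is accepted is (1 - r_bad)^n, so the failure rate falls geometrically as voters are added, while the expected cost per accepted output grows with n. Real checkers are unlikely to vote independently, so these numbers illustrate the shape of the trade-off rather than the paper's results.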
Submission history
From: Jake Watts
[v1] Wed, 24 Jul 2024 04:27:55 UTC (710 KB)
[v2] Tue, 3 Sep 2024 19:28:39 UTC (710 KB)