A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs

[Submitted on 24 Jul 2024 (v1), last revised 3 Sep 2024 (this version, v2)]

Authors: Jake R. Watts and 1 other author

Abstract: This paper proposes a new method for preventing unsafe or otherwise low-quality large language model (LLM) outputs by leveraging the stochasticity of LLMs. We propose a system in which LLM checkers vote on the acceptability of a generated output; the output is regenerated whenever a threshold of disapproval is reached, until sufficient checkers approve. We further propose estimators for cost and failure rate and, based on those estimators and experimental data tailored to the application, an algorithm that achieves a desired failure rate at the least possible cost. We demonstrate that, under these models, failure rate decreases exponentially as a function of cost when voter count and threshold are chosen according to the algorithm, and that the models reasonably estimate the actual performance of such a system in action, even with limited data.
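
The vote-and-regenerate loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `generate_with_voters`, the parameters `reject_threshold` and `max_attempts`, and the use of plain callables as stand-ins for the LLM generator and LLM checkers are all assumptions made for illustration. The attempt cap in particular is an addition so that the sketch always terminates. The cost and failure-rate estimators and the parameter-selection algorithm are not shown, since the abstract does not specify them.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def generate_with_voters(
    generate: Callable[[], T],
    checkers: Sequence[Callable[[T], bool]],
    reject_threshold: int,
    max_attempts: int = 10,
) -> Tuple[T, int]:
    """Regenerate an output until fewer than `reject_threshold` checkers disapprove.

    `generate` and `checkers` are hypothetical stand-ins for LLM calls.
    Each checker returns True to approve and False to disapprove.
    Returns the accepted output and the number of attempts used.
    """
    if not 1 <= reject_threshold <= len(checkers):
        raise ValueError("reject_threshold must be between 1 and the number of checkers")

    for attempt in range(1, max_attempts + 1):
        output = generate()
        # Count the votes against this output.
        disapprovals = sum(1 for check in checkers if not check(output))
        # Accept only if the disapproval threshold was not reached.
        if disapprovals < reject_threshold:
            return output, attempt

    # Assumption: give up after a fixed number of attempts rather than loop forever.
    raise RuntimeError(f"no output accepted after {max_attempts} attempts")
```

In this sketch, each attempt costs one generation plus one call per checker, so the expected total cost grows with both the voter count and the probability that an output is rejected. That is the trade-off the paper's estimators and selection algorithm are said to address.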