InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback
Haishuo Fang and 2 other authors
Abstract: A crucial requirement for deploying LLM-based agents in real-life applications is robustness against risky or even irreversible mistakes. However, existing research lacks a focus on the preemptive evaluation of reasoning trajectories performed by LLM agents, leaving a gap in ensuring safe and reliable operation. To explore better solutions, this paper introduces InferAct, a novel approach that leverages the belief-reasoning ability of LLMs, grounded in Theory-of-Mind, to proactively detect potential errors before risky actions are executed (e.g., 'buy-now' in automated online trading or web shopping). InferAct acts as a human proxy, detecting unsafe actions and alerting users for intervention, which helps avert irreversible risks in time and enhances the actor agent's decision-making process. Experiments on three widely used tasks demonstrate the effectiveness of InferAct, presenting a novel solution for safely developing LLM agents in environments involving critical decision-making.
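To make the workflow in the abstract concrete, below is a minimal Python sketch of a preemptive-evaluation loop with a human-in-the-loop veto before irreversible actions. All names here (Step, is_irreversible, evaluate_trajectory, run_with_guard) and the prompt wording are illustrative assumptions about the general pattern, not the paper's actual interface or prompts.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    thought: str   # the actor agent's reasoning at this step
    action: str    # the action it proposes to execute

def is_irreversible(action: str) -> bool:
    """Flag actions that cannot be undone once executed (e.g. 'buy-now')."""
    return action.split("[")[0] in {"buy-now", "confirm-purchase", "delete"}

def evaluate_trajectory(trajectory: List[Step],
                        llm: Callable[[str], str]) -> bool:
    """Belief-reasoning check: ask an evaluator LLM to infer what task the
    actor believes it is solving from its trajectory, then judge whether
    the pending action is consistent with it. Returns True if judged safe."""
    transcript = "\n".join(
        f"Thought: {s.thought}\nAction: {s.action}" for s in trajectory
    )
    prompt = (
        "Infer the task the agent is trying to accomplish from its "
        "trajectory, then answer 'safe' or 'unsafe' for the final "
        "pending action.\n\n" + transcript
    )
    return llm(prompt).strip().lower().startswith("safe")

def run_with_guard(
    steps: List[Step],
    execute: Callable[[str], None],
    llm: Callable[[str], str],
    alert_user: Callable[[List[Step]], bool],
) -> None:
    """Execute steps, but pause before any irreversible action: if the
    evaluator flags the trajectory as unsafe, alert the user and proceed
    only on explicit approval."""
    trajectory: List[Step] = []
    for step in steps:
        trajectory.append(step)
        if is_irreversible(step.action) and not evaluate_trajectory(trajectory, llm):
            if not alert_user(trajectory):  # the user vetoes the risky action
                return                      # abort before the mistake happens
        execute(step.action)
```

The key design point mirrored here is that evaluation happens before the risky action is executed rather than after a failed episode, so a wrong 'buy-now' can be intercepted while it is still preventable.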
Submission history
From: Haishuo Fang
[v1] Tue, 16 Jul 2024 15:24:44 UTC (8,557 KB)
[v2] Thu, 17 Oct 2024 11:26:10 UTC (7,856 KB)