[Submitted on 18 Nov 2024 (v1), last revised 20 Nov 2024 (this version, v2)]

Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods

Jai Doshi and Asa Cooper Stickland

Abstract: Large language model unlearning aims to remove harmful information that LLMs have learnt, to prevent their use for malicious purposes. LLMU and RMU have been proposed as two methods for LLM unlearning, achieving impressive results on unlearning benchmarks. We study in detail the efficacy of these methods by evaluating their impact on general model capabilities on…