Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation



By Yunhao Gou and 8 other authors

Abstract: Multimodal large language models (MLLMs) have shown impressive reasoning abilities. However, they are also more vulnerable to jailbreak attacks than their LLM predecessors. We observe that although the pre-aligned LLMs inside MLLMs remain capable of detecting unsafe responses, their safety mechanisms are easily bypassed once image features are introduced. To build robust MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protection approach that exploits the inherent safety awareness of MLLMs: it generates safer responses by adaptively transforming unsafe images into text, thereby activating the intrinsic safety mechanism of the pre-aligned LLMs inside MLLMs. Experiments on five state-of-the-art (SoTA) MLLMs demonstrate that ECSO significantly enhances model safety (e.g., a 37.6% improvement on MM-SafetyBench (SD+OCR) and 71.3% on VLSafe with LLaVA-1.5-7B) while consistently maintaining utility on common MLLM benchmarks. Furthermore, we show that ECSO can serve as a data engine to generate supervised fine-tuning (SFT) data for MLLM alignment without extra human intervention.
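The control flow the abstract describes can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the functions `mllm_answer`, `mllm_is_safe`, `mllm_caption`, and `llm_answer` are stand-ins for real model calls, implemented here as trivial stubs so the pipeline runs end to end.

```python
# Hypothetical sketch of an ECSO-style "eyes closed" pipeline:
# answer normally, let the MLLM judge its own response, and if the
# response is judged unsafe, replace the image with a caption and
# regenerate from text alone. All four helpers are toy stubs.

def mllm_answer(image, query):
    # Stand-in for the MLLM's direct response to (image, query).
    # For demonstration, pretend certain queries yield a harmful draft.
    prefix = "harmful " if "bomb" in query else ""
    return prefix + f"response to '{query}'"

def mllm_is_safe(response):
    # Stand-in for the MLLM's own harmlessness check of its response.
    return "harmful" not in response

def mllm_caption(image):
    # Stand-in for the image-to-text transformation step.
    return f"a textual description of {image}"

def llm_answer(caption, query):
    # Stand-in for the text-only, pre-aligned LLM inside the MLLM,
    # whose intrinsic safety mechanism applies once no image is present.
    return f"safe response to '{query}' about: {caption}"

def ecso(image, query):
    """Training-free protection: self-check the direct answer; if it is
    judged unsafe, 'close the eyes' and regenerate from a caption."""
    response = mllm_answer(image, query)
    if mllm_is_safe(response):
        return response                   # utility path: output unchanged
    caption = mllm_caption(image)         # image -> text transformation
    return llm_answer(caption, query)     # text-only, safety-aligned path
```

Because the fallback only triggers on responses the model itself flags, benign queries are answered exactly as before, which is consistent with the abstract's claim that utility on common benchmarks is maintained.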

Submission history

From: Yunhao Gou [view email]
[v1]
Thu, 14 Mar 2024 17:03:04 UTC (1,031 KB)
[v2]
Fri, 22 Mar 2024 09:07:06 UTC (1,121 KB)
[v3]
Mon, 15 Jul 2024 07:03:53 UTC (1,142 KB)
[v4]
Tue, 15 Oct 2024 04:55:36 UTC (1,085 KB)

