Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations

By stp2y, December 18, 2024

[Submitted on 9 Oct 2024 (v1), last revised 17 Dec 2024 (this version, v2)]

By Tarun Raheja and 2 other authors

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their vulnerability to jailbreak attacks poses significant security risks. This survey paper presents a comprehensive analysis of recent advancements in attack strategies and defense mechanisms within the field of LLM red-teaming. We analyze various attack methods, including gradient-based optimization, reinforcement learning, and prompt engineering approaches. We discuss the implications of these attacks for LLM safety and the need for improved defense mechanisms. This work aims to provide a thorough understanding of the current landscape of red-teaming attacks and defenses for LLMs, enabling the development of more secure and reliable language models.

Submission history

From: Tarun Raheja
[v1] Wed, 9 Oct 2024 01:35:38 UTC (363 KB)
[v2] Tue, 17 Dec 2024 04:34:32 UTC (363 KB)