Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph

stp2y, December 25, 2024

[Submitted on 17 Dec 2024 (v1), last revised 24 Dec 2024 (this version, v2)]

Authors: Yibo Zhao and 3 other authors

Abstract: The rapid growth of social media platforms has raised significant concerns regarding online content toxicity. When Large Language Models (LLMs) are used for toxicity detection, two key challenges emerge: 1) the absence of domain-specific toxic knowledge leads to false negatives; 2) the excessive sensitivity of LLMs to toxic speech results in false positives, limiting freedom of speech. To address these issues, we propose a novel method called MetaTox, leveraging graph search on a meta-toxic knowledge graph to enhance hatred and toxicity detection. First, we construct a comprehensive meta-toxic knowledge graph by utilizing LLMs to extract toxic information through a three-step pipeline, with toxic benchmark datasets serving as corpora. Second, we query the graph via retrieval and ranking processes to supplement accurate, relevant toxic knowledge. Extensive experiments and in-depth case studies across multiple datasets demonstrate that our MetaTox significantly decreases the false positive rate while boosting overall toxicity detection performance. Our code will be available soon.
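
The abstract gives no implementation details, so the following is only a minimal sketch of what "query the graph via retrieval and ranking" could look like in general. Everything in it is an assumption made for illustration: the Triple class, the TOY_GRAPH contents (placeholder terms, not real data), the token-overlap retrieval, the Jaccard ranking, and the prompt wording. It is not the authors' MetaTox pipeline, whose code had not been released at the time of posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a toy knowledge graph: (head) -[relation]-> (tail)."""
    head: str
    relation: str
    tail: str


# Hypothetical placeholder graph; a real graph would be extracted from corpora.
TOY_GRAPH = [
    Triple("term_a", "is_coded_insult_for", "group_b"),
    Triple("term_c", "is_harmless_slang_for", "friend"),
    Triple("weather", "is_topic_of", "smalltalk"),
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(graph, post):
    """Retrieval: keep triples whose head or tail shares a token with the post."""
    post_tokens = tokenize(post)
    return [
        t for t in graph
        if tokenize(t.head) & post_tokens or tokenize(t.tail) & post_tokens
    ]


def rank(candidates, post, k=2):
    """Ranking: order candidates by Jaccard overlap with the post, keep top k."""
    post_tokens = tokenize(post)

    def score(t):
        triple_tokens = tokenize(f"{t.head} {t.relation} {t.tail}")
        return len(triple_tokens & post_tokens) / len(triple_tokens | post_tokens)

    return sorted(candidates, key=score, reverse=True)[:k]


def build_prompt(post, facts):
    """Prepend the selected facts to the question that would be sent to an LLM."""
    lines = [f"- {t.head} {t.relation} {t.tail}" for t in facts] or ["- (none)"]
    return "Background knowledge:\n" + "\n".join(lines) + f"\n\nIs this post toxic? {post}"


if __name__ == "__main__":
    post = "He called them term_a again."
    facts = rank(retrieve(TOY_GRAPH, post), post)
    print(build_prompt(post, facts))
```

The sketch separates retrieval (a cheap filter over the graph) from ranking (a finer score over the survivors), so that only a few relevant facts reach the prompt. A real system would likely replace the token overlap with embedding similarity or graph traversal.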

Submission history

From: YiBo Zhao
[v1] Tue, 17 Dec 2024 06:28:28 UTC (935 KB)
[v2] Tue, 24 Dec 2024 04:38:57 UTC (935 KB)
