View a PDF of the paper titled Impromptu Cybercrime Euphemism Detection, by Xiang Li and 4 other authors
Abstract:Detecting euphemisms is essential for content security on various social media platforms, but existing methods designed for detecting euphemisms are ineffective in impromptu euphemisms. In this work, we make a first attempt to an exploration of impromptu euphemism detection and introduce the Impromptu Cybercrime Euphemisms Detection (ICED) dataset. Moreover, we propose a detection framework tailored to this problem, which employs context augmentation modeling and multi-round iterative training. Our detection framework mainly consists of a coarse-grained and a fine-grained classification model. The coarse-grained classification model removes most of the harmless content in the corpus to be detected. The fine-grained model, impromptu euphemisms detector, integrates context augmentation and multi-round iterations training to better predicts the actual meaning of a masked token. In addition, we leverage ChatGPT to evaluate the mode’s capability. Experimental results demonstrate that our approach achieves a remarkable 76-fold improvement compared to the previous state-of-the-art euphemism detector.
Submission history
From: Xiang Li [view email]
[v1]
Mon, 2 Dec 2024 11:56:06 UTC (913 KB)
[v2]
Tue, 3 Dec 2024 07:12:27 UTC (913 KB)
Source link
lol