InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection

stp2y, December 18, 2024


[Submitted on 24 Jun 2024 (v1), last revised 16 Dec 2024 (this version, v5)]

Authors: Junjie Chen and 4 other authors

Abstract: Sarcasm in social media, often expressed through text-image combinations, poses challenges for sentiment analysis and intention mining. Current multi-modal sarcasm detection methods have been demonstrated to overly rely on spurious cues within the textual modality, revealing a limited ability to genuinely identify sarcasm through nuanced text-image interactions. To solve this problem, we propose InterCLIP-MEP, which introduces Interactive CLIP (InterCLIP) with an efficient training strategy to extract enriched text-image representations by embedding cross-modal information directly into each encoder. Additionally, we design a Memory-Enhanced Predictor (MEP) with a dynamic dual-channel memory that stores valuable test sample knowledge during inference, acting as a non-parametric classifier for robust sarcasm recognition. Experiments on two benchmarks demonstrate that InterCLIP-MEP achieves state-of-the-art performance, with significant accuracy and F1 score improvements on MMSD and MMSD2.0. Our code is available at this https URL.
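The abstract does not say how the dual-channel memory is filled or queried, so the following is only a rough sketch of the general idea of a memory acting as a non-parametric classifier, not the authors' implementation. The class name DualChannelMemory, the fixed capacity, the confidence-based eviction rule, and the mean cosine similarity scoring are all assumptions made for illustration.

```python
import math


class DualChannelMemory:
    """Hypothetical sketch: one memory channel per class (0 = non-sarcastic, 1 = sarcastic)."""

    def __init__(self, capacity=4):
        # Assumed design: each channel keeps at most `capacity` entries.
        self.capacity = capacity
        # channel label -> list of (confidence, unit-normalised feature)
        self.channels = {0: [], 1: []}

    @staticmethod
    def _normalise(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else list(vec)

    def update(self, feature, probs):
        """Store a test-time feature in the channel of its pseudo-label.

        Assumed rule: keep only the most confident entries per channel.
        """
        label = 0 if probs[0] >= probs[1] else 1
        confidence = max(probs)
        channel = self.channels[label]
        channel.append((confidence, self._normalise(feature)))
        channel.sort(key=lambda item: item[0], reverse=True)
        del channel[self.capacity:]

    def predict(self, feature):
        """Return the label whose channel is most similar on average, or None if empty."""
        query = self._normalise(feature)
        scores = {}
        for label, channel in self.channels.items():
            if channel:
                sims = [sum(q * m for q, m in zip(query, mem)) for _, mem in channel]
                scores[label] = sum(sims) / len(sims)
        if not scores:
            return None
        return max(scores, key=scores.get)


# Toy usage with made-up 2-d features and classifier probabilities.
memory = DualChannelMemory(capacity=2)
memory.update([1.0, 0.0], probs=[0.9, 0.1])
memory.update([0.0, 1.0], probs=[0.2, 0.8])
memory.update([0.8, 0.2], probs=[0.6, 0.4])
memory.update([0.9, 0.1], probs=[0.7, 0.3])  # evicts the 0.6-confidence entry
first = memory.predict([0.9, 0.1])
second = memory.predict([0.1, 0.9])
```

In this sketch no parameters are learned at prediction time: the label comes entirely from comparing the query against stored test-time features, which is what "non-parametric" is taken to mean here.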

Submission history

From: Junjie Chen
[v1] Mon, 24 Jun 2024 09:13:42 UTC (2,371 KB)
[v2] Wed, 26 Jun 2024 05:40:16 UTC (2,371 KB)
[v3] Sun, 4 Aug 2024 05:42:58 UTC (3,248 KB)
[v4] Tue, 13 Aug 2024 09:52:57 UTC (2,649 KB)
[v5] Mon, 16 Dec 2024 04:13:38 UTC (2,820 KB)

