MVTamperBench: Evaluating Robustness of Vision-Language Models

MVTamperBench: Evaluating Robustness of Vision-Language Models, by Amit Agarwal and 8 other authors.

Abstract: Recent advancements in Vision-Language Models (VLMs) have enabled significant progress in complex video understanding tasks. However, their robustness to real-world manipulations remains underexplored, limiting their reliability in critical applications. To address this gap, we introduce MVTamperBench, a comprehensive benchmark designed to evaluate VLMs' resilience to video tampering effects, including rotation, dropping, masking, substitution, and repetition. By systematically assessing state-of-the-art models, MVTamperBench reveals substantial variability in robustness: models such as InternVL2-8B achieve high performance, while others, such as Llama-VILA1.5-8B, exhibit severe vulnerabilities. To foster broader adoption and reproducibility, MVTamperBench is integrated into VLMEvalKit, a modular evaluation toolkit, enabling streamlined testing and facilitating advancements in model robustness. Our benchmark represents a critical step towards developing tamper-resilient VLMs, ensuring their dependability in real-world scenarios.
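The five tampering effects named in the abstract (rotation, dropping, masking, substitution, and repetition) can all be viewed as transformations of a clip's frame sequence. The following is a minimal, hypothetical sketch of such transformations using NumPy, treating a video as an array of frames of shape (num_frames, H, W, C); the function names and parameters are illustrative assumptions, not the benchmark's actual implementation.

```python
import numpy as np

def drop(frames, start, length):
    # Dropping: remove a contiguous segment of frames, shortening the clip.
    return np.concatenate([frames[:start], frames[start + length:]])

def mask(frames, start, length):
    # Masking: black out a segment of frames; clip length is unchanged.
    out = frames.copy()
    out[start:start + length] = 0
    return out

def repeat(frames, start, length):
    # Repetition: duplicate a segment in place, lengthening the clip.
    seg = frames[start:start + length]
    return np.concatenate([frames[:start + length], seg, frames[start + length:]])

def substitute(frames, other, start, length):
    # Substitution: replace a segment with frames taken from an unrelated clip.
    out = frames.copy()
    out[start:start + length] = other[:length]
    return out

def rotate(frames, start, length, k=1):
    # Rotation: rotate the spatial dimensions of a segment by k * 90 degrees.
    # Assumes square frames (H == W) so the rotated segment fits back in place.
    out = frames.copy()
    out[start:start + length] = np.rot90(out[start:start + length], k=k, axes=(1, 2))
    return out
```

A benchmark built on such primitives would apply each tampering operation to a fixed segment of every test clip and then compare each model's answers on tampered versus untampered videos.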

Project Page: this https URL

Submission history

From: Amit Agarwal [view email]
[v1]
Fri, 27 Dec 2024 18:47:05 UTC (16,343 KB)
[v2]
Mon, 30 Dec 2024 04:20:52 UTC (16,343 KB)
[v3]
Wed, 15 Jan 2025 19:04:48 UTC (18,913 KB)


