MeanSparse: Post-Training Robustness Enhancement Through Mean-Centered Feature Sparsification


Authors: Sajjad Amini and 3 other authors


Abstract: We present a simple yet effective method to improve the robustness of both convolutional and attention-based neural networks against adversarial examples by post-processing an adversarially trained model. Our technique, MeanSparse, cascades the activation functions of a trained model with novel operators that sparsify mean-centered feature vectors. This is equivalent to reducing feature variations around the mean. We show that such reduced variations have only a minor effect on the model's utility, yet they strongly attenuate adversarial perturbations and decrease the attacker's success rate. Our experiments show that, when applied to the top models in the RobustBench leaderboard, MeanSparse achieves a new robustness record, measured as AutoAttack accuracy, of 75.28% (from 73.71%) on CIFAR-10, 44.78% (from 42.67%) on CIFAR-100, and 62.12% (from 59.56%) on ImageNet. Code is available at this https URL
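The abstract describes the operator only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes a per-channel mean and standard deviation are estimated from training activations, and that any activation within `alpha * std` of the mean is snapped back to the mean while larger deviations pass through unchanged. The function name `mean_sparse`, the parameter `alpha`, and all sample values are hypothetical.

```python
from statistics import fmean, pstdev


def mean_sparse(features, mean, std, alpha):
    """Sparsify mean-centered features (illustrative sketch only).

    Values whose distance from `mean` is at most alpha * std are replaced
    by `mean`, so their mean-centered part becomes zero. Values further
    away are returned unchanged.
    """
    threshold = alpha * std
    return [mean if abs(x - mean) <= threshold else x for x in features]


# Hypothetical activations of one channel, collected on training data.
train_activations = [0.0, 1.0, 2.0, 3.0, 4.0]
mu = fmean(train_activations)      # channel mean
sigma = pstdev(train_activations)  # channel (population) standard deviation

# Hypothetical activations seen at inference time.
test_activations = [1.9, 2.3, 0.2, 4.5]
out = mean_sparse(test_activations, mu, sigma, alpha=0.5)
print(out)
```

Under these assumptions, small variations around the mean (where a bounded adversarial perturbation would mostly live) are discarded, while strongly informative activations are kept. Setting `alpha` to 0 leaves the features effectively untouched, so in this sketch `alpha` trades clean accuracy against attenuation of perturbations.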

Submission history

From: Mohammadreza Teymoorianfard
[v1] Sun, 9 Jun 2024 22:14:55 UTC (2,541 KB)
[v2] Wed, 2 Oct 2024 18:01:07 UTC (3,017 KB)




