arXiv:2407.02811v1 Announce Type: new
Abstract: Certifiable robustness gives the guarantee that small perturbations around an input to a classifier will not change the prediction. There are two approaches to provide certifiable robustness to adversarial examples: a) explicitly training classifiers with small Lipschitz constants, and b) Randomized smoothing, which adds random noise to the input to create a smooth classifier. We propose textit{SPLITZ}, a practical and novel approach which leverages the synergistic benefits of both the above ideas into a single framework. Our main idea is to textit{split} a classifier into two halves, constrain the Lipschitz constant of the first half, and smooth the second half via randomization. Motivation for textit{SPLITZ} comes from the observation that many standard deep networks exhibit heterogeneity in Lipschitz constants across layers. textit{SPLITZ} can exploit this heterogeneity while inheriting the scalability of randomized smoothing. We present a principled approach to train textit{SPLITZ} and provide theoretical analysis to derive certified robustness guarantees during inference. We present a comprehensive comparison of robustness-accuracy tradeoffs and show that textit{SPLITZ} consistently improves upon existing state-of-the-art approaches on MNIST and CIFAR-10 datasets. For instance, with $ell_2$ norm perturbation budget of textbf{$epsilon=1$}, textit{SPLITZ} achieves $textbf{43.2%}$ top-1 test accuracy on CIFAR-10 dataset compared to state-of-art top-1 test accuracy $textbf{39.8%}
Source link
lol