arXiv:2408.00023v1 Announce Type: new
Abstract: Reinforcement Learning (RL) agents that perform proficiently in a training environment remain vulnerable to adversarial perturbations of their input observations during deployment. This underscores the importance of building a robust agent before real-world deployment. To address this challenge, prior works focus on robust training-based procedures, either fortifying the robustness of the deep neural network components or subjecting the agent to adversarial training against potent attacks. In this work, we propose a novel method, Transformed Input-robust RL (TIRL), which explores another avenue for mitigating the impact of adversaries: input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses in learning robust RL agents: (1) autoencoder-styled denoising to reconstruct the original state, and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) to keep transformed inputs close. The transformations are applied to the state before it is fed into the policy network. Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, i.e., VQ, defend against several adversaries in the state observations.
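The abstract describes bounded transformations such as bit-depth reduction applied to the state before the policy network. Below is a minimal illustrative sketch (in NumPy) of how such a transform could be applied to an observation; the function name, bit depth, and value range are assumptions for illustration, not details from the paper.

import numpy as np

def bit_depth_reduction(state, bits=5, low=-10.0, high=10.0):
    """Quantize each state dimension to 2**bits levels within [low, high].

    Nearby (e.g., adversarially perturbed) states tend to map to the same
    quantized value, bounding the effect of small perturbations.
    Parameters here are illustrative assumptions, not the paper's settings.
    """
    levels = 2 ** bits
    clipped = np.clip(state, low, high)
    # Normalize to [0, 1], snap to the nearest of `levels` bins, then rescale.
    normalized = (clipped - low) / (high - low)
    quantized = np.round(normalized * (levels - 1)) / (levels - 1)
    return quantized * (high - low) + low

# Example usage: transform the observation before querying the policy.
state = np.array([0.123, -1.987, 3.456])
perturbed = state + np.random.uniform(-0.01, 0.01, size=state.shape)
print(bit_depth_reduction(state))
print(bit_depth_reduction(perturbed))  # often identical to the line above

A vector quantization defense would follow the same pattern, replacing the per-dimension rounding with a lookup of the nearest codebook vector.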