Backdoor Attack through Frequency Domain
Backdoor attacks have been shown to be a serious threat against deep learning
systems such as biometric authentication and autonomous driving. An effective
backdoor attack can force the model to misbehave under certain predefined
conditions, i.e., triggers, while behaving normally otherwise. However, the
triggers of existing attacks are injected directly in the pixel space; as a
result, they tend to be detectable by existing defenses and visually
identifiable at both training and inference stages. In this paper, we propose
FTROJAN, a new backdoor attack that trojans the frequency domain. The key
intuition is that a trigger perturbation in the frequency domain corresponds
to a small pixel-wise perturbation dispersed across the entire image, breaking
the underlying assumptions of existing defenses and making the poisoned images
visually indistinguishable from clean ones. We evaluate FTROJAN on several
datasets and tasks, showing that it achieves a high attack success rate without
significantly degrading the prediction accuracy on benign inputs. Moreover, the
trigger perturbations are nearly invisible, and the poisoned images retain high
perceptual quality. We also evaluate FTROJAN against state-of-the-art defenses
as well as several adaptive defenses designed specifically for the frequency
domain. The results show that FTROJAN robustly evades these defenses or
significantly degrades their performance.
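To make the key intuition concrete, below is a minimal sketch of a
frequency-domain trigger. It illustrates the general technique only, not
FTROJAN's actual configuration: the DCT-based transform, the perturbed
coefficient positions, and the perturbation magnitude are all assumptions
chosen for this example.

import numpy as np
from scipy.fft import dct, idct

def dct2(block):
    """2D type-II DCT with orthonormal scaling."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(block):
    """2D inverse DCT, the exact inverse of dct2."""
    return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def poison_image(img, freq_positions, magnitude=30.0):
    """Inject a trigger by perturbing selected DCT coefficients.

    img: float array of shape (H, W, C), pixel values in [0, 255].
    freq_positions: list of (u, v) coefficient indices to perturb
        (illustrative; mid/high frequencies keep the change subtle).
    magnitude: additive perturbation applied to each chosen coefficient.
    """
    poisoned = np.empty_like(img, dtype=np.float64)
    for c in range(img.shape[2]):
        coeffs = dct2(img[:, :, c].astype(np.float64))
        for (u, v) in freq_positions:
            # A localized change in frequency space...
            coeffs[u, v] += magnitude
        # ...is spread over every pixel by the inverse transform.
        poisoned[:, :, c] = idct2(coeffs)
    return np.clip(poisoned, 0, 255)

# Example: poison a random 32x32 RGB image at two mid-frequency positions.
rng = np.random.default_rng(0)
clean = rng.uniform(0, 255, size=(32, 32, 3))
poisoned = poison_image(clean, freq_positions=[(15, 15), (31, 31)])
print("max per-pixel change:", np.abs(poisoned - clean).max())

Because each orthonormal DCT basis function spans the whole image, adding even
a sizable value to a single coefficient changes any individual pixel only
slightly; the perturbation therefore stays visually inconspicuous while
remaining a consistent pattern the model can learn as a trigger.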