Adversarial Training for Free!

Posted Feb 15, 2025

NeurIPS 2019. Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein.

🍒 Key Takeaways

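A single backward pass computes the gradient of the loss with respect to the input image as well as the network parameters, so adversarial examples are generated at no additional cost.
To update the perturbation on the same input image several times, each minibatch is trained on $m$ times in a row; to keep the total number of training iterations unchanged, the total number of epochs is divided by $m$.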
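
The two takeaways can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a linear logistic classifier with labels in {-1, +1}, equal-size minibatches, an $\ell_\infty$ bound `eps`, and a perturbation that is carried over between replays and minibatches. The function name and all parameters are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)


def free_adversarial_train(batches, dim, epochs, m, eps, lr):
    """Toy 'free' adversarial training of a linear logistic classifier.

    batches: list of equal-size minibatches, each a list of (x, y), y in {-1, +1}.
    """
    w = [0.0] * dim
    # One perturbation per minibatch slot, kept across replays (assumption).
    delta = [[0.0] * dim for _ in batches[0]]
    for _ in range(epochs // m):      # epoch count divided by m
        for batch in batches:
            for _ in range(m):        # replay the same minibatch m times
                grad_w = [0.0] * dim
                grad_x = []
                for (x, y), d in zip(batch, delta):
                    x_adv = [xi + di for xi, di in zip(x, d)]
                    margin = y * sum(wi * xi for wi, xi in zip(w, x_adv))
                    # Shared backward signal for L = log(1 + exp(-margin)).
                    s = -y / (1.0 + math.exp(margin))
                    # The same signal yields both gradients.
                    for j in range(dim):
                        grad_w[j] += s * x_adv[j] / len(batch)
                    grad_x.append([s * wj for wj in w])
                # Descent step on the parameters.
                w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
                # Ascent step on the perturbation, clipped to [-eps, eps].
                for d, g in zip(delta, grad_x):
                    for j in range(dim):
                        d[j] = max(-eps, min(eps, d[j] + eps * sign(g[j])))
    return w
```

With `epochs=8` and `m=4`, the loop runs 2 passes over the data but still performs 8 parameter updates per minibatch, the same number as 8 ordinary epochs.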

1. Introduction

This paper deals with adversarial example generation. Training a neural network on adversarial examples is called adversarial training.
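
Adversarial training is commonly written as a min-max problem. The notation below is an assumption added for illustration and does not appear elsewhere in this post: $\theta$ are the network parameters, $\ell$ is the loss, $\delta$ is a perturbation bounded by $\epsilon$ in the $\ell_\infty$ norm, and $\mathcal{D}$ is the data distribution.

$$
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\|_\infty \le \epsilon} \ell(x + \delta, y, \theta) \right]
$$

The inner maximization generates the adversarial example, and the outer minimization trains the network on it.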

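Contributions

Eliminates the overhead cost of adversarial training (prior work: high cost).
Reuses the gradient information computed for updating the model parameters when perturbing the image.
Achieves performance comparable to, or slightly better than, existing methods.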

In prior work, generating adversarial examples is too costly. Gradient computation is needed to update the network parameters, but it is also run several times in each SGD iteration to generate the adversarial examples. The number of gradient steps spent on generation therefore determines the slowdown factor, and training takes 3-30 times longer than for a non-robust model. Adversarial training and related defenses are so time-consuming that they are hard to apply to large-scale problems.
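
The slowdown can be made concrete by counting forward/backward passes. This is a rough sketch under the assumption that every attack step and every parameter update costs one full pass; the function name and the numbers (40 epochs, 100 minibatches, $K = 7$, $m = 8$) are hypothetical and chosen only for illustration.

```python
def gradient_passes(epochs, batches_per_epoch, attack_steps=0, replays=1):
    """Count forward/backward passes in one training run.

    attack_steps: K gradient steps spent crafting adversarial examples
                  per SGD iteration (0 means natural training).
    replays:      m consecutive replays of each minibatch; the epoch
                  count is divided by m to keep total iterations fixed.
    """
    effective_epochs = epochs // replays
    return effective_epochs * batches_per_epoch * replays * (attack_steps + 1)


natural = gradient_passes(epochs=40, batches_per_epoch=100)
pgd7 = gradient_passes(epochs=40, batches_per_epoch=100, attack_steps=7)
free = gradient_passes(epochs=40, batches_per_epoch=100, replays=8)

print(pgd7 / natural)  # 8.0
print(free / natural)  # 1.0
```

Under these assumptions a 7-step attack makes training 8 times as expensive, while replaying each minibatch and dividing the epoch count by $m$ keeps the cost equal to natural training.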

Non-targeted adversarial examples

There are two kinds of adversarial examples. This paper uses non-targeted examples for both generation and evaluation.

non-targeted: pushes the image out of its natural class.
targeted: moves the image to a specific class.

Well-known non-targeted generation methods are as follows.

Fast Gradient Sign Method (FGSM)
- Uses the sign of the gradient in a single iteration.
- Non-iterative attack.
Basic Iterative Method (BIM): the iterative version of FGSM
Projected Gradient Descent (PGD) attack
- A variant of BIM with uniform random noise as initialization.
- The number of iterations $K$ matters.
- Each iteration requires a complete forward and backward pass to compute the gradient of the loss with respect to the image.
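
The three attacks can be sketched on a toy model. This is an illustrative sketch only: it assumes a linear logistic classifier with labels in {-1, +1} and an $\ell_\infty$ bound `eps`, and the function names, the `step` size, and the `rng` argument are hypothetical. For a real network, `input_gradient` would be a full forward and backward pass.

```python
import math
import random


def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)


def input_gradient(w, x, y):
    """Gradient of L = log(1 + exp(-y * w.x)) with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    s = -y / (1.0 + math.exp(margin))
    return [s * wi for wi in w]


def fgsm(w, x, y, eps):
    """Single-step attack: move eps along the sign of the input gradient."""
    g = input_gradient(w, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(w, x, y, eps, step, k, rng=None):
    """K-step attack. With rng: uniform random start (PGD). Without: BIM."""
    if rng is None:
        x_adv = list(x)
    else:
        x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(k):
        g = input_gradient(w, x_adv, y)  # one gradient computation per step
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back onto the l_inf ball of radius eps around x.
        x_adv = [min(xi + eps, max(xi - eps, xa)) for xi, xa in zip(x, x_adv)]
    return x_adv


w = [1.0, -2.0]
x, y = [0.5, -0.5], 1
x_fgsm = fgsm(w, x, y, eps=0.1)
x_bim = pgd(w, x, y, eps=0.1, step=0.025, k=10)
x_pgd = pgd(w, x, y, eps=0.1, step=0.025, k=10, rng=random.Random(0))
```

The loop in `pgd` calls `input_gradient` $K$ times per example, which is where the extra cost of PGD-based adversarial training comes from.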