Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Last update: Oct 24, 2022


Abstract

Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications in recent years. Recently, however, new model architectures have been proposed that challenge the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs are widely known to be vulnerable to adversarial attacks, which raises serious concerns for security-sensitive applications. Thus, it is critical for the community to know whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial attacks. To this end, we empirically evaluate their adversarial robustness under several adversarial attack setups and benchmark them against widely used CNNs. Overall, we find that the two architectures, especially ViT, are more robust than the CNN models. Using a toy example, we also provide empirical evidence that the lower adversarial robustness of CNNs can be partially attributed to their shift-invariant property. Our frequency analysis suggests that the most robust ViT architectures tend to rely more on low-frequency features than CNNs do. Additionally, we make the intriguing finding that MLP-Mixer is extremely vulnerable to universal adversarial perturbations.

Setup

Set Paths

Set the paths in ./config.py according to your system and environment.

Download ViT Checkpoints

Run bash ./download_checkpoints.sh

NeurIPS dataset

We provide the NeurIPS adversarial challenge dataset with this repository. The images are stored in ./images, and the accompanying data sheet is ./images.csv

Evaluate Models

As a sanity check, you can evaluate the models on the NeurIPS dataset with bash ./experiments/eval_models.sh and check that the numbers match Table 1 of the paper.

White-box attack

For the white-box attacks you can run the corresponding script.

PGD attack

bash ./experiments/attack_pgd.sh
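
For readers unfamiliar with PGD, the following is a minimal sketch of an L-infinity PGD attack, assuming a toy linear classifier with a logistic loss in place of the ViT, MLP-Mixer and CNN models. All names and values (pgd_attack, w, b, eps, alpha) are hypothetical and are not taken from attack_pgd.sh.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, w, b):
    return 1 if logit(x, w, b) > 0 else 0


def input_gradient(x, y, w, b):
    """Gradient of the logistic loss w.r.t. the input of a linear model."""
    p = sigmoid(logit(x, w, b))
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x, y, w, b, eps, alpha, steps, seed=0):
    """L-infinity PGD: random start, signed gradient steps, projection."""
    rng = random.Random(seed)
    # Random start inside the eps-ball, clipped to the pixel range [0, 1].
    x_adv = [min(1.0, max(0.0, xi + rng.uniform(-eps, eps))) for xi in x]
    for _ in range(steps):
        grad = input_gradient(x_adv, y, w, b)
        stepped = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project onto the eps-ball around the clean input, then onto [0, 1].
        x_adv = [
            min(1.0, max(0.0, min(xi + eps, max(xi - eps, s))))
            for xi, s in zip(x, stepped)
        ]
    return x_adv


# Toy model and input (hypothetical values, for illustration only).
w, b = [2.0, -1.0, 0.5], -0.2
x, y = [0.5, 0.5, 0.4], 1
x_adv = pgd_attack(x, y, w, b, eps=0.2, alpha=0.05, steps=20)
print("clean prediction:", predict(x, w, b))
print("adversarial prediction:", predict(x_adv, w, b))
```

With a deep network, input_gradient would be replaced by automatic differentiation; the projection step stays the same.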

FGSM attack

bash ./experiments/attack_fgsm.sh
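
As an illustration of what a single FGSM step does, here is a minimal sketch on a toy linear classifier with a logistic loss. It is an assumption-laden stand-in for the real models, and the names and values (fgsm, w, b, eps) are hypothetical rather than taken from attack_fgsm.sh.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, w, b):
    return 1 if logit(x, w, b) > 0 else 0


def input_gradient(x, y, w, b):
    """Gradient of the logistic loss w.r.t. the input of a linear model."""
    p = sigmoid(logit(x, w, b))
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """One signed-gradient step of size eps, clipped to the range [0, 1]."""
    grad = input_gradient(x, y, w, b)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


# Toy model and input (hypothetical values, for illustration only).
w, b = [2.0, -1.0, 0.5], -0.2
x, y = [0.5, 0.5, 0.4], 1
x_adv = fgsm(x, y, w, b, eps=0.2)
print("clean prediction:", predict(x, w, b))
print("adversarial prediction:", predict(x_adv, w, b))
```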

C&W

bash ./experiments/attack_cw.sh

DeepFool

bash ./experiments/attack_deepfool.sh
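
The core step of DeepFool can be sketched for the special case of an affine binary classifier f(x) = w·x + b, where the minimal L2 perturbation to the decision boundary has a closed form. For deep networks, DeepFool repeats this step on a local linearization, which is omitted here. The function name and the toy values are hypothetical, and the sketch does not clip to a pixel range.

```python
def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def deepfool_linear(x, w, b, overshoot=0.02):
    """Smallest L2 step onto the boundary w.x + b = 0, slightly overshot."""
    f = logit(x, w, b)
    norm_sq = sum(wi * wi for wi in w)
    # Orthogonal projection of x onto the decision hyperplane.
    r = [-(f / norm_sq) * wi for wi in w]
    # The overshoot pushes the point just across the boundary.
    return [xi + (1.0 + overshoot) * ri for xi, ri in zip(x, r)]


# Toy model and input (hypothetical values, for illustration only).
w, b = [2.0, -1.0, 0.5], -0.2
x = [0.5, 0.5, 0.4]
x_adv = deepfool_linear(x, w, b)
print("clean logit:", logit(x, w, b))
print("adversarial logit:", logit(x_adv, w, b))
```

For an affine model, the adversarial logit equals minus the overshoot times the clean logit, so the sign always flips.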

Black-box attack

Query-based
Transfer-based

For the black-box attacks you can run the corresponding script.

Transferability with I-FGSM

bash ./experiments/transferability.sh
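
The idea behind a transfer-based evaluation can be sketched as follows: adversarial examples are crafted with I-FGSM on a source model and then fed to a different target model. The two toy linear classifiers, the data, and every name below are hypothetical stand-ins, not the setup used in transferability.sh.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(x, model):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, model):
    return 1 if logit(x, model) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def ifgsm(x, y, model, eps, alpha, steps):
    """Iterative FGSM: repeated signed-gradient steps inside an eps-ball."""
    w, _ = model
    x_adv = list(x)
    for _ in range(steps):
        p = sigmoid(logit(x_adv, model))
        grad = [(p - y) * wi for wi in w]  # logistic-loss gradient w.r.t. x
        x_adv = [
            min(1.0, max(0.0, min(xi + eps, max(xi - eps, xa + alpha * sign(g)))))
            for xi, xa, g in zip(x, x_adv, grad)
        ]
    return x_adv


# Hypothetical source and target models: (weights, bias).
source = ([2.0, -1.0], -0.4)
target = ([1.5, -0.5], -0.5)
data = [([0.6, 0.3], 1), ([0.9, 0.2], 1), ([0.1, 0.6], 0), ([0.2, 0.3], 0)]

# Craft on the source model only, then count misclassifications on the target.
fooled = 0
for x, y in data:
    x_adv = ifgsm(x, y, source, eps=0.2, alpha=0.05, steps=10)
    fooled += predict(x_adv, target) != y
transfer_rate = fooled / len(data)
print("transfer rate:", transfer_rate)
```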

Universal Adversarial Attack

Run bash ./experiments/attack_uap.sh
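
A universal adversarial perturbation is a single perturbation that is added to every input. The sketch below uses a simple signed-gradient scheme on the loss summed over a toy dataset, with a toy linear classifier; this is a swapped-in simplification and not necessarily the algorithm used in attack_uap.sh. All names and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, w, b):
    return 1 if logit(x, w, b) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def apply_delta(x, delta):
    """Add the shared perturbation and clip to the pixel range [0, 1]."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]


def universal_perturbation(data, w, b, eps, alpha, steps):
    """One shared delta, updated with the sign of the summed loss gradient."""
    delta = [0.0] * len(w)
    for _ in range(steps):
        total = [0.0] * len(w)
        for x, y in data:
            p = sigmoid(logit(apply_delta(x, delta), w, b))
            # Gradient at the clipped input; the clip itself is not differentiated.
            total = [t + (p - y) * wi for t, wi in zip(total, w)]
        # Ascent step, then projection onto the L-infinity ball of radius eps.
        delta = [
            min(eps, max(-eps, d + alpha * sign(t))) for d, t in zip(delta, total)
        ]
    return delta


# Toy model and data (hypothetical values, for illustration only).
w, b = [2.0, -1.0], -0.4
data = [([0.6, 0.3], 1), ([0.5, 0.2], 1), ([0.7, 0.5], 1), ([0.1, 0.6], 0)]
delta = universal_perturbation(data, w, b, eps=0.2, alpha=0.05, steps=10)

# Fooling rate: fraction of inputs whose prediction changes under delta.
changed = sum(
    predict(apply_delta(x, delta), w, b) != predict(x, w, b) for x, _ in data
)
fooling_rate = changed / len(data)
print("delta:", delta, "fooling rate:", fooling_rate)
```

In this toy case the shared perturbation pushes all inputs toward one class, so only the samples of the other class are fooled.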

Docker

We provide a Dockerfile to make the results presented in the paper easier to reproduce; see the docker folder.

Credits

We would like to credit the following resources, which helped tremendously in our development process.

Citation

@article{benz2021adversarial,
  title={Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs},
  author={Benz, Philipp and Ham, Soomin and Zhang, Chaoning and Karjauv, Adil and Kweon, In So},
  journal={arXiv preprint arXiv:2110.02797},
  year={2021}
}


Owner

Philipp Benz
