[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (PyTorch implementation & checkpoints)

Last update: Dec 01, 2022

Are Transformers More Robust Than CNNs?

PyTorch implementation for the NeurIPS 2021 paper: Are Transformers More Robust Than CNNs?

Our implementation is based on DeiT.

Introduction

Transformers have emerged as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutional Neural Networks (CNNs). Surprisingly, however, we find that these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and trained with distinct frameworks. In this paper, we aim to provide the first fair and in-depth comparisons between Transformers and CNNs, focusing on robustness evaluations.

With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs in adversarial robustness. More surprisingly, we find that CNNs can easily be as robust as Transformers in defending against adversarial attacks if they properly adopt Transformers' training recipes. Regarding generalization to out-of-distribution samples, we show that pre-training on (external) large-scale datasets is not a fundamental requirement for enabling Transformers to outperform CNNs. Moreover, our ablations suggest that this stronger generalization largely stems from the Transformer's self-attention-like architecture per se, rather than from other training setups. We hope this work can help the community better understand and benchmark the robustness of Transformers and CNNs.

Pretrained models

We provide both pretrained vanilla models and adversarially trained models.

Vanilla Training

Main Results

Model	Pretrained Model	ImageNet	ImageNet-A	ImageNet-C	Stylized-ImageNet
Res50-Ori	download link	76.9	3.2	57.9	8.3
Res50-Align	download link	76.3	4.5	55.6	8.2
Res50-Best	download link	75.7	6.3	52.3	10.8
DeiT-Small	download link	76.8	12.2	48.0	13.0

Model Size

ResNets:

ResNets fully aligned with DeiT's training recipe, denoted as Res*:

Model	Model Size	Pretrained Model	ImageNet	ImageNet-A	ImageNet-C	Stylized-ImageNet
Res18*	11.69M	download link	67.83	1.92	64.14	7.92
Res50*	25.56M	download link	76.28	4.53	55.62	8.17
Res101*	44.55M	download link	77.97	8.84	49.19	11.60

Best ResNet models for out-of-distribution (OOD) generalization, denoted as Res-best:

Model	Model Size	Pretrained Model	ImageNet	ImageNet-A	ImageNet-C	Stylized-ImageNet
Res18-best	11.69M	download link	66.81	2.03	62.65	9.45
Res50-best	25.56M	download link	75.74	6.32	52.25	10.77
Res101-best	44.55M	download link	77.83	11.49	47.35	13.28

DeiTs:

Model	Model Size	Pretrained Model	ImageNet	ImageNet-A	ImageNet-C	Stylized-ImageNet
DeiT-Mini	9.98M	download link	72.89	8.19	54.68	9.88
DeiT-Small	22.05M	download link	76.82	12.21	47.99	12.98

Model Distillation

Role	Architecture	Pretrained Model	ImageNet	ImageNet-A	ImageNet-C	Stylized-ImageNet
Teacher	DeiT-Small	download link	76.8	12.2	48.0	13.0
Student	Res50*-Distill	download link	76.7	5.2	54.2	9.8
Teacher	Res50*	download link	76.3	4.5	55.6	8.2
Student	DeiT-S-Distill	download link	76.2	10.9	49.3	11.9

Adversarial Training

Model	Pretrained Model	Clean Acc	PGD-100	AutoAttack
Res50-ReLU	download link	66.77	32.26	26.41
Res50-GELU	download link	67.38	40.27	35.51
DeiT-Small	download link	66.50	40.32	35.50

Vanilla Training

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder: the training and validation data are expected to be in the train folder and val folder, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
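
Before launching a long training run, it can help to confirm that the folders match the layout above. The following is a minimal sketch using only the Python standard library. The helper names `summarize_split` and `check_layout` and the accepted file suffixes are assumptions made for illustration and are not part of this repository.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to match your copy of the dataset.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Return {class_name: image_count} for one split made of class-named subfolders."""
    split_dir = Path(split_dir)
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_layout(root):
    """Check that train/ and val/ exist and contain the same set of class folders."""
    root = Path(root)
    train = summarize_split(root / "train")
    val = summarize_split(root / "val")
    mismatched = sorted(set(train) ^ set(val))
    if mismatched:
        raise ValueError(f"class folders differ between train/ and val/: {mismatched}")
    return train, val
```

Calling `check_layout("/path/to/imagenet")` returns the per-class image counts for both splits, and raises if a class folder exists in only one of them.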

Environment

Install dependencies:

pip3 install -r requirements.txt

Training Scripts

To train a ResNet model on ImageNet run:

bash script/res.sh

To train a DeiT model on ImageNet run:

bash script/deit.sh

Generalization to Out-of-Distribution Samples

Data Preparation

Download and extract the ImageNet-A, ImageNet-C, and Stylized-ImageNet val images, arranging each dataset as follows:

/path/to/datasets/
  val/
    class1/
      img1.jpeg
    class2/
      img2.jpeg

Evaluation Scripts

To evaluate pre-trained models, run:

bash script/generation_to_ood.sh

Note that for ImageNet-C evaluation, the error rate is computed over the Noise, Blur, Weather, and Digital corruption categories.
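
As a rough illustration of averaging over those four categories, the sketch below groups the 15 standard ImageNet-C corruption types and averages their error rates. The function name `mean_corruption_error` is hypothetical, and the optional normalization by a reference model's errors is an assumption: the text above does not say whether the evaluation script normalizes, so both variants are shown.

```python
# The 15 standard ImageNet-C corruption types, grouped by category.
CORRUPTIONS = {
    "noise": ["gaussian_noise", "shot_noise", "impulse_noise"],
    "blur": ["defocus_blur", "glass_blur", "motion_blur", "zoom_blur"],
    "weather": ["snow", "frost", "fog", "brightness"],
    "digital": ["contrast", "elastic_transform", "pixelate", "jpeg_compression"],
}


def mean_corruption_error(errors, baseline=None):
    """Average error (in %) over the corruption types listed in CORRUPTIONS.

    errors: {corruption_name: error rate in %, already averaged over severities}.
    baseline: optional reference errors with the same keys (an assumption here);
    when given, each error is divided by the reference before averaging and the
    result is rescaled to a percentage.
    """
    names = [name for group in CORRUPTIONS.values() for name in group]
    total = 0.0
    for name in names:
        err = errors[name]
        if baseline is not None:
            err = err / baseline[name]
        total += err
    scale = 100.0 if baseline is not None else 1.0
    return scale * total / len(names)
```

Lower values are better in either variant, since the quantity is an error rate rather than an accuracy.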

Adversarial Training

To perform adversarial training on ResNet run:

bash script/advres.sh

To do adversarial training on DeiT run:

bash script/advdeit.sh

Robustness to Adversarial Examples

PGD Attack Evaluation

To evaluate the pre-trained models, run:

bash script/eval_advtraining.sh
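
For readers unfamiliar with the attack, here is a minimal sketch of an L-infinity PGD attack in PyTorch. It is not the repository's evaluation code. The function name `pgd_attack` and the default `eps`, `alpha`, and `steps` values are illustrative assumptions, and the sketch assumes the model takes inputs in the range [0, 1] (that is, any normalization happens inside the model).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_attack(model, images, labels, eps=4 / 255, alpha=1 / 255, steps=100):
    """L-infinity PGD sketch; eps, alpha and steps are illustrative defaults."""
    model.eval()
    images = images.clone().detach()
    # Random start inside the eps-ball, clipped to the valid pixel range [0, 1].
    adv = images + torch.empty_like(images).uniform_(-eps, eps)
    adv = adv.clamp(0.0, 1.0).detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        # Take a signed gradient step that increases the loss.
        adv = adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the clean input, then into [0, 1].
        adv = torch.min(torch.max(adv, images - eps), images + eps)
        adv = adv.clamp(0.0, 1.0).detach()
    return adv
```

Robust accuracy can then be estimated by comparing `model(pgd_attack(model, x, y)).argmax(dim=1)` against `y` over the validation set.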

AutoAttack Evaluation

./autoattack contains the public AutoAttack package, with minor modifications to better support ImageNet evaluation.

cd autoattack/
bash autoattack.sh

Patch Attack Evaluation

Please refer to PatchAttack

Citation

If you use our code or models, or wish to refer to our results, please use the following BibTeX entry:

@inproceedings{bai2021transformers,
  title     = {Are Transformers More Robust Than CNNs?},
  author    = {Bai, Yutong and Mei, Jieru and Yuille, Alan and Xie, Cihang},
  booktitle = {Thirty-Fifth Conference on Neural Information Processing Systems},
  year      = {2021},
}

Owner

Yutong Bai
