[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)

Overview

Are Transformers More Robust Than CNNs?

Pytorch implementation for NeurIPS 2021 Paper: Are Transformers More Robust Than CNNs?

Our implementation is based on DeiT.

Introduction

Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutions Neural Networks (CNNs). Nonetheless, surprisingly, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and are applied with distinct training frameworks. In this paper, we aim to provide the first fair & in-depth comparisons between Transformers and CNNs, focusing on robustness evaluations.

With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs when measuring adversarial robustness. More surprisingly, we find CNNs can easily be as robust as Transformers on defending against adversarial attacks, if they properly adopt Transformers' training recipes. While regarding generalization on out-of-distribution samples, we show pre-training on (external) large-scale datasets is not a fundamental request for enabling Transformers to achieve better performance than CNNs. Moreover, our ablations suggest such stronger generalization is largely benefited by the Transformer's self-attention-like architectures per se, rather than by other training setups. We hope this work can help the community better understand and benchmark the robustness of Transformers and CNNs.

Pretrained models

We provide both pretrained vanilla models and adversarially trained models.

Vanilla Training

Main Results

Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Res50-Ori download link 76.9 3.2 57.9 8.3
Res50-Align download link 76.3 4.5 55.6 8.2
Res50-Best download link 75.7 6.3 52.3 10.8
DeiT-Small download link 76.8 12.2 48.0 13.0

Model Size

ResNets:

  • ResNets fully aligned (with DeiT's training recipe) model, denoted as res*:
Model Size Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Res18* 11.69M download link 67.83 1.92 64.14 7.92
Res50* 25.56M download link 76.28 4.53 55.62 8.17
Res101* 44.55M download link 77.97 8.84 49.19 11.60
  • ResNets best model (for Out-of-Distribution (OOD) generalization), denoted as res-best:
Model Size Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Res18-best 11.69M download link 66.81 2.03 62.65 9.45
Res50-best 25.56M download link 75.74 6.32 52.25 10.77
Res101-best 44.55M download link 77.83 11.49 47.35 13.28

DeiTs:

Model Size Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
DeiT-Mini 9.98M download link 72.89 8.19 54.68 9.88
DeiT-Small 22.05M download link 76.82 12.21 47.99 12.98

Model Distillation

Architecture Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Teacher DeiT-Small download link 76.8 12.2 48.0 13.0
Student Res50*-Distill download link 76.7 5.2 54.2 9.8
Teacher Res50* download link 76.3 4.5 55.6 8.2
Student DeiT-S-Distill download link 76.2 10.9 49.3 11.9

Adversarial Training

Pretrained Model Clean Acc PGD-100 Auto Attack
Res50-ReLU download link 66.77 32.26 26.41
Res50-GELU download link 67.38 40.27 35.51
DeiT-Small download link 66.50 40.32 35.50

Vanilla Training

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision, and the training and validation data is expected to be in the train folder and val folder respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class/2
      img4.jpeg

Environment

Install dependencies:

pip3 install -r requirements.txt

Training Scripts

To train a ResNet model on ImageNet run:

bash script/res.sh

To train a DeiT model on ImageNet run:

bash script/deit.sh

Generalization to Out-of-Distribution Sample

Data Preparation

Download and extract ImageNet-A, ImageNet-C, Stylized-ImageNet val images:

/path/to/datasets/
  val/
    class1/
      img1.jpeg
    class/2
      img2.jpeg

Evaluation Scripts

To evaluate pre-trained models, run:

bash script/generation_to_ood.sh

It is worth noting that for ImageNet-C evaluation, the error rate is calculated based on the Noise, Blur, Weather and Digital categories.

Adversarial Training

To perform adversarial training on ResNet run:

bash script/advres.sh

To do adversarial training on DeiT run:

bash scripts/advdeit.sh

Robustness to Adversarial Example

PGD Attack Evaluation

To evaluate the pre-trained models, run:

bash script/eval_advtraining.sh

AutoAttack Evaluation

./autoattack contains the AutoAttack public package, with a little modification to best support ImageNet evaluation.

cd autoattack/
bash autoattack.sh

Patch Attack Evaluation

Please refer to PatchAttack

Citation

If you use our code, models or wish to refer to our results, please use the following BibTex entry:

@inproceedings{bai2021transformers,
  title     = {Are Transformers More Robust Than CNNs?},
  author    = {Bai, Yutong and Mei, Jieru and Yuille, Alan and Xie, Cihang},
  booktitle = {Thirty-Fifth Conference on Neural Information Processing Systems},
  year      = {2021},
}
Owner
Yutong Bai
CS Ph.D student @ JHU, CCVL
Yutong Bai
Privacy as Code for DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests.

Meet Fidesops: Privacy as Code for DSAR Orchestration A part of the greater Fides ecosystem. ⚡ Overview Fidesops (fee-dez-äps, combination of the Lati

Ethyca 44 Dec 06, 2022
TensorFlow 2 implementation of the Yahoo Open-NSFW model

TensorFlow 2 implementation of the Yahoo Open-NSFW model

Bosco Yung 101 Jan 01, 2023
PyTorch framework for Deep Learning research and development.

Accelerated DL & RL PyTorch framework for Deep Learning research and development. It was developed with a focus on reproducibility, fast experimentati

Catalyst-Team 29 Jul 13, 2022
CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022)

CMUA-Watermark The official code for CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022) arxiv. It is bas

50 Nov 26, 2022
PERIN is Permutation-Invariant Semantic Parser developed for MRP 2020

PERIN: Permutation-invariant Semantic Parsing David Samuel & Milan Straka Charles University Faculty of Mathematics and Physics Institute of Formal an

ÚFAL 40 Jan 04, 2023
Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition Official implementation of the Efficient Conforme

Maxime Burchi 145 Dec 30, 2022
Class-Attentive Diffusion Network for Semi-Supervised Classification [AAAI'21] (official implementation)

Class-Attentive Diffusion Network for Semi-Supervised Classification Official Implementation of AAAI 2021 paper Class-Attentive Diffusion Network for

Jongin Lim 7 Sep 20, 2022
[SIGGRAPH 2021 Asia] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

DeepVecFont This is the official Pytorch implementation of the paper: Yizhi Wang and Zhouhui Lian. DeepVecFont: Synthesizing High-quality Vector Fonts

Yizhi Wang 146 Dec 18, 2022
OpenDILab RL Kubernetes Custom Resource and Operator Lib

DI Orchestrator DI Orchestrator is designed to manage DI (Decision Intelligence) jobs using Kubernetes Custom Resource and Operator. Prerequisites A w

OpenDILab 205 Dec 29, 2022
Training neural models with structured signals.

Neural Structured Learning in TensorFlow Neural Structured Learning (NSL) is a new learning paradigm to train neural networks by leveraging structured

955 Jan 02, 2023
Python Library for Signal/Image Data Analysis with Transport Methods

PyTransKit Python Transport Based Signal Processing Toolkit Website and documentation: https://pytranskit.readthedocs.io/ Installation The library cou

24 Dec 23, 2022
The official implementation of Equalization Loss v1 & v2 (CVPR 2020, 2021) based on MMDetection.

The Equalization Losses for Long-tailed Object Detection and Instance Segmentation This repo is official implementation CVPR 2021 paper: Equalization

Jingru Tan 129 Dec 16, 2022
Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt)

Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt) Task Training huge unsupervised deep neural networks yields to strong progress in

Oliver Hahn 1 Jan 26, 2022
Official implementation of the NeurIPS 2021 paper Online Learning Of Neural Computations From Sparse Temporal Feedback

Online Learning Of Neural Computations From Sparse Temporal Feedback This repository is the official implementation of the NeurIPS 2021 paper Online L

Lukas Braun 3 Dec 15, 2021
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss This is official implement of "

程星 87 Dec 24, 2022
A Python library created to assist programmers with complex mathematical functions

libmaths libmaths was created not only as a learning experience for me, but as a way to make mathematical models in seconds for Python users using mat

Simple 73 Oct 02, 2022
MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets)

MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets) Using mixup data augmentation as reguliraztion and tuning the hyper par

Bhanu 2 Jan 16, 2022
Code for the paper "Next Generation Reservoir Computing"

Next Generation Reservoir Computing This is the code for the results and figures in our paper "Next Generation Reservoir Computing". They are written

OSU QuantInfo Lab 105 Dec 20, 2022
The modify PyTorch version of Siam-trackers which are speed-up by TensorRT.

SiamTracker-with-TensorRT The modify PyTorch version of Siam-trackers which are speed-up by TensorRT or ONNX. [Updating...] Examples demonstrating how

9 Dec 13, 2022
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022