[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)

Overview

Are Transformers More Robust Than CNNs?

Pytorch implementation for NeurIPS 2021 Paper: Are Transformers More Robust Than CNNs?

Our implementation is based on DeiT.

Introduction

Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutions Neural Networks (CNNs). Nonetheless, surprisingly, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and are applied with distinct training frameworks. In this paper, we aim to provide the first fair & in-depth comparisons between Transformers and CNNs, focusing on robustness evaluations.

With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs when measuring adversarial robustness. More surprisingly, we find CNNs can easily be as robust as Transformers on defending against adversarial attacks, if they properly adopt Transformers' training recipes. While regarding generalization on out-of-distribution samples, we show pre-training on (external) large-scale datasets is not a fundamental request for enabling Transformers to achieve better performance than CNNs. Moreover, our ablations suggest such stronger generalization is largely benefited by the Transformer's self-attention-like architectures per se, rather than by other training setups. We hope this work can help the community better understand and benchmark the robustness of Transformers and CNNs.

Pretrained models

We provide both pretrained vanilla models and adversarially trained models.

Vanilla Training

Main Results

Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Res50-Ori download link 76.9 3.2 57.9 8.3
Res50-Align download link 76.3 4.5 55.6 8.2
Res50-Best download link 75.7 6.3 52.3 10.8
DeiT-Small download link 76.8 12.2 48.0 13.0

Model Size

ResNets:

  • ResNets fully aligned (with DeiT's training recipe) model, denoted as res*:
Model Size Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Res18* 11.69M download link 67.83 1.92 64.14 7.92
Res50* 25.56M download link 76.28 4.53 55.62 8.17
Res101* 44.55M download link 77.97 8.84 49.19 11.60
  • ResNets best model (for Out-of-Distribution (OOD) generalization), denoted as res-best:
Model Size Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Res18-best 11.69M download link 66.81 2.03 62.65 9.45
Res50-best 25.56M download link 75.74 6.32 52.25 10.77
Res101-best 44.55M download link 77.83 11.49 47.35 13.28

DeiTs:

Model Size Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
DeiT-Mini 9.98M download link 72.89 8.19 54.68 9.88
DeiT-Small 22.05M download link 76.82 12.21 47.99 12.98

Model Distillation

Architecture Pretrained Model ImageNet ImageNet-A ImageNet-C Stylized-ImageNet
Teacher DeiT-Small download link 76.8 12.2 48.0 13.0
Student Res50*-Distill download link 76.7 5.2 54.2 9.8
Teacher Res50* download link 76.3 4.5 55.6 8.2
Student DeiT-S-Distill download link 76.2 10.9 49.3 11.9

Adversarial Training

Pretrained Model Clean Acc PGD-100 Auto Attack
Res50-ReLU download link 66.77 32.26 26.41
Res50-GELU download link 67.38 40.27 35.51
DeiT-Small download link 66.50 40.32 35.50

Vanilla Training

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision, and the training and validation data is expected to be in the train folder and val folder respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class/2
      img4.jpeg

Environment

Install dependencies:

pip3 install -r requirements.txt

Training Scripts

To train a ResNet model on ImageNet run:

bash script/res.sh

To train a DeiT model on ImageNet run:

bash script/deit.sh

Generalization to Out-of-Distribution Sample

Data Preparation

Download and extract ImageNet-A, ImageNet-C, Stylized-ImageNet val images:

/path/to/datasets/
  val/
    class1/
      img1.jpeg
    class/2
      img2.jpeg

Evaluation Scripts

To evaluate pre-trained models, run:

bash script/generation_to_ood.sh

It is worth noting that for ImageNet-C evaluation, the error rate is calculated based on the Noise, Blur, Weather and Digital categories.

Adversarial Training

To perform adversarial training on ResNet run:

bash script/advres.sh

To do adversarial training on DeiT run:

bash scripts/advdeit.sh

Robustness to Adversarial Example

PGD Attack Evaluation

To evaluate the pre-trained models, run:

bash script/eval_advtraining.sh

AutoAttack Evaluation

./autoattack contains the AutoAttack public package, with a little modification to best support ImageNet evaluation.

cd autoattack/
bash autoattack.sh

Patch Attack Evaluation

Please refer to PatchAttack

Citation

If you use our code, models or wish to refer to our results, please use the following BibTex entry:

@inproceedings{bai2021transformers,
  title     = {Are Transformers More Robust Than CNNs?},
  author    = {Bai, Yutong and Mei, Jieru and Yuille, Alan and Xie, Cihang},
  booktitle = {Thirty-Fifth Conference on Neural Information Processing Systems},
  year      = {2021},
}
Owner
Yutong Bai
CS Ph.D student @ JHU, CCVL
Yutong Bai
SingleVC performs any-to-one VC, which is an important component of MediumVC project.

SingleVC performs any-to-one VC, which is an important component of MediumVC project. Here is the official implementation of the paper, MediumVC.

谷下雨 26 Dec 28, 2022
Perception-aware multi-sensor fusion for 3D LiDAR semantic segmentation (ICCV 2021)

Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation (ICCV 2021) [中文|EN] 概述 本工作主要探索一种高效的多传感器(激光雷达和摄像头)融合点云语义分割方法。现有的多传感器融合方法主要将点云投影

ICE 126 Dec 30, 2022
The Official Implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose [NIPS 2021].

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose Release Notes The offical PyTorch implementation of Neural View Sy

Angtian Wang 20 Oct 09, 2022
Industrial knn-based anomaly detection for images. Visit streamlit link to check out the demo.

Industrial KNN-based Anomaly Detection ⭐ Now has streamlit support! ⭐ Run $ streamlit run streamlit_app.py This repo aims to reproduce the results of

aventau 102 Dec 26, 2022
pytorch implementation of dftd2 & dftd3

torch-dftd pytorch implementation of dftd2 [1] & dftd3 [2, 3] Install # Install from pypi pip install torch-dftd # Install from source (for developer

33 Nov 28, 2022
YOLOPのPythonでのONNX推論サンプル

YOLOP-ONNX-Video-Inference-Sample YOLOPのPythonでのONNX推論サンプルです。 ONNXモデルは、hustvl/YOLOP/weights を使用しています。 Requirement OpenCV 3.4.2 or later onnxruntime 1.

KazuhitoTakahashi 8 Sep 05, 2022
functorch is a prototype of JAX-like composable function transforms for PyTorch.

functorch is a prototype of JAX-like composable function transforms for PyTorch.

Facebook Research 1.2k Jan 09, 2023
Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [pdf] The official repository for Self-Supervised Pre-Training for Transfo

Hao Luo 116 Jan 04, 2023
Build a small, 3 domain internet using Github pages and Wikipedia and construct a crawler to crawl, render, and index.

TechSEO Crawler Build a small, 3 domain internet using Github pages and Wikipedia and construct a crawler to crawl, render, and index. Play with the r

JR Oakes 57 Nov 24, 2022
Official code for paper "Optimization for Oriented Object Detection via Representation Invariance Loss".

Optimization for Oriented Object Detection via Representation Invariance Loss By Qi Ming, Zhiqiang Zhou, Lingjuan Miao, Xue Yang, and Yunpeng Dong. Th

ming71 56 Nov 28, 2022
An implementation of an abstract algebra for music tones (pitches).

nbdev template Use this template to more easily create your nbdev project. If you are using an older version of this template, and want to upgrade to

Open Music Kit 0 Oct 10, 2022
From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

Code for CCA-SSG model proposed in the NeurIPS 2021 paper From Canonical Correlation Analysis to Self-supervised Graph Neural Networks.

Hengrui Zhang 44 Nov 27, 2022
Semantic Edge Detection with Diverse Deep Supervision

Semantic Edge Detection with Diverse Deep Supervision This repository contains the code for our IJCV paper: "Semantic Edge Detection with Diverse Deep

Yun Liu 12 Dec 31, 2022
PyTorch inference for "Progressive Growing of GANs" with CelebA snapshot

Progressive Growing of GANs inference in PyTorch with CelebA training snapshot Description This is an inference sample written in PyTorch of the origi

320 Nov 21, 2022
Building a real-time environment using webcam frame division in OpenCV and classify cropped images using a fine-tuned vision transformers on hybryd datasets samples for facial emotion recognition.

Visual Transformer for Facial Emotion Recognition (FER) This project has the aim to build an efficient Visual Transformer for the Facial Emotion Recog

Mario Sessa 8 Dec 12, 2022
[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight) Demo | Paper [NEW!] Time to play with our interac

Shengyu Zhao 373 Jan 02, 2023
DIVeR: Deterministic Integration for Volume Rendering

DIVeR: Deterministic Integration for Volume Rendering This repo contains the training and evaluation code for DIVeR. Setup python 3.8 pytorch 1.9.0 py

64 Dec 27, 2022
A C implementation for creating 2D voronoi diagrams

Branch OSX/Linux Windows master dev jc_voronoi A fast C/C++ header only implementation for creating 2D Voronoi diagrams from a point set Uses Fortune'

Mathias Westerdahl 481 Dec 29, 2022
Neuralnetwork - Basic Multilayer Perceptron Neural Network for deep learning

Neural Network Just a basic Neural Network module Usage Example Importing Module

andreecy 0 Nov 01, 2022
My implementation of transformers related papers for computer vision in pytorch

vision_transformers This is my personnal repo to implement new transofrmers based and other computer vision DL models I am currenlty working without a

samsja 1 Nov 10, 2021