[CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

Last update: Nov 26, 2022

Code for the paper: [CVPR 2022] The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy.

Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang.

Overview

Vision transformers (ViTs) have gained increasing popularity as they are commonly believed to have higher modeling capacity and representation flexibility than traditional convolutional networks. However, it is questionable whether this potential has been fully unleashed in practice, as learned ViTs often suffer from over-smoothing, which likely yields redundant models.

Recent works have made preliminary attempts to identify and alleviate such redundancy, e.g., by regularizing embedding similarity or re-injecting convolution-like structures. However, a “head-to-toe assessment” of the extent of redundancy in ViTs, and of how much could be gained by thoroughly mitigating it, has been absent from the field.

This paper, for the first time, systematically studies the ubiquitous existence of redundancy at all three levels: patch embedding, attention map, and weight space. In view of these, we advocate a principle of diversity for training ViTs, presenting corresponding regularizers that encourage representation diversity and coverage at each of those levels, enabling the model to capture more discriminative information.

Extensive experiments on ImageNet with a number of ViT backbones validate the effectiveness of our proposals, largely eliminating the observed ViT redundancy and significantly boosting model generalization. For example, our diversified DeiT obtains accuracy boosts of 0.70% to 1.76% on ImageNet with highly reduced similarity.
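To make the idea of redundancy at the patch-embedding level more concrete, here is a minimal sketch of one way to measure it: the average absolute cosine similarity between distinct tokens of a layer's output. This is an illustration under assumptions, not the regularizer implemented in this repository; the function name `mean_pairwise_cosine_similarity` and the `(batch, num_tokens, dim)` layout are hypothetical choices.

```python
import torch
import torch.nn.functional as F


def mean_pairwise_cosine_similarity(tokens: torch.Tensor) -> torch.Tensor:
    """Average absolute cosine similarity between distinct tokens.

    tokens: tensor of shape (batch, num_tokens, dim) -- an assumed layout.
    Returns a scalar tensor in [0, 1]; values near 1 indicate that the
    tokens are nearly parallel, i.e. highly redundant.
    """
    # Scale every token embedding to unit length.
    normed = F.normalize(tokens, dim=-1)
    # Pairwise cosine similarities, shape (batch, num_tokens, num_tokens).
    sim = normed @ normed.transpose(-2, -1)
    # Exclude each token's similarity with itself (the diagonal).
    n = tokens.shape[1]
    off_diag = ~torch.eye(n, dtype=torch.bool, device=tokens.device)
    return sim.abs()[:, off_diag].mean()


if __name__ == "__main__":
    # Random embeddings stand in for a transformer block's output.
    patch_tokens = torch.randn(2, 196, 384)
    print(float(mean_pairwise_cosine_similarity(patch_tokens)))
```

Because the value is differentiable, one could in principle add it to the classification loss with a small weight (a hypothetical hyperparameter) to penalize similar tokens. The regularizers and weights actually used are defined in the paper and in the training code of this repository.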

Prerequisites

Install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
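After installation, a quick check along the following lines can confirm that the environment matches the versions listed above. It is only a sketch: the helper names are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, and `timm`, which may not hold for every install method.

```python
import re
from importlib import metadata

# Versions taken from the prerequisites above; timm is pinned exactly.
REQUIRED = {
    "torch": ("1.7.0", False),
    "torchvision": ("0.8.1", False),
    "timm": ("0.3.2", True),
}


def version_tuple(version):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0)."""
    core = version.split("+")[0]  # drop local build tags like '+cu110'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def satisfies(installed, required, exact):
    """Check an exact pin (exact=True) or a minimum version (exact=False)."""
    have, want = version_tuple(installed), version_tuple(required)
    return have == want if exact else have >= want


def check_environment():
    """Return a status string for every required package."""
    report = {}
    for name, (required, exact) in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, required, exact)
        report[name] = f"{installed} ({'ok' if ok else 'mismatch'})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```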

Training on ImageNet

./script/run_deit_small_diverse.sh [data/imagenet]  # DeiT-Small, 12 layers
./script/run_deit_small_24layer_diverse.sh [data/imagenet]  # DeiT-Small, 24 layers

Citation

TBD

Acknowledgement

https://github.com/facebookresearch/deit

Owner

VITA
