Attention Probe: Vision Transformer Distillation in the Wild

Overview

Attention Probe: Vision Transformer Distillation in the Wild

License: MIT

Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang
In ICASSP 2022

This code is the Pytorch implementation of ICASSP 2022 paper Attention Probe: Vision Transformer Distillation in the Wild

Overview

  • We propose the concept of Attention Probe, a special section of the attention map to utilize a large amount of unlabeled data in the wild to complete the vision transformer data-free distillation task. Instead of generating images from the teacher network with a series of priori, images most relevant to the given pre-trained network and tasks will be identified from a large unlabeled dataset (e.g., Flickr) to conduct the knowledge distillation task.
  • We propose a simple yet efficient distillation algorithm, called probe distillation, to distill the student model using intermediate features of the teacher model, which is based on the Attention Probe.

Prerequisite

We use Pytorch 1.7.1, and CUDA 11.0. You can install them with

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

It should also be applicable to other Pytorch and CUDA versions.

Usage

Data Preparation

First, you need to modify the storage format of the cifar-10/100 and tinyimagenet dataset to the style of ImageNet, etc. CIFAR 10 run:

python process_cifar10.py

CIFAR 100 run:

python process_cifar100.py

Tiny-ImageNet run:

python process_tinyimagenet.py
python process_move_file.py

The dataset dir should have the following structure:

dir/
  train/
    ...
  val/
    n01440764/
      ILSVRC2012_val_00000293.JPEG
      ...
    ...

Train a normal teacher network

For this step you need to train normal teacher transformer models for selecting valuable data from the wild. We train the teacher model based on the timm PyTorch library:

timm

Our pretrained teacher models (CIFAR-10, CIFAR-100, ImageNet, Tiny-ImageNet, MNIST) can be downloaded from here:

Pretrained teacher models

Select valuable data from the wild

Then, you can use the Attention Probe method to select valuable data in the wild dataset.

To select valuable data on the CIFAR-10 dataset

bash training.sh
(CIFAR 10 run: CUDA_VISIBLE_DEVICES=0 python DFND_DeiT-train.py --dataset cifar10 --data_cifar $root_cifar10 --data_imagenet $root_wild --num_select 650000 --teacher_dir $teacher_cifar10 --selected_file $selected_cifar10 --output_dir $output_student_cifar10 --nb_classes 10 --lr_S 7.5e-4 --attnprobe_sel --attnprobe_dist )

(CIFAR 100 run: CUDA_VISIBLE_DEVICES=0 python DFND_DeiT-train.py --dataset cifar10 --data_cifar $root_cifar10 --data_imagenet $root_wild --num_select 650000 --teacher_dir $teacher_cifar10 --selected_file $selected_cifar10 --output_dir $output_student_cifar10 --nb_classes 10 --lr_S 7.5e-4 --attnprobe_sel --attnprobe_dist )

After you will get "class_weights.pth, pred_out.pth, value_blk3.pth, value_blk7.pth, value_out.pth" in '/selected/cifar10/' or '/selected/cifar100/' directory, you have already obtained the selected data.

Probe Knowledge Distillation for Student networks

Then you can distill the student model using intermediate features of the teacher model based on the selected data.

bash training.sh
(CIFAR 10 run: CUDA_VISIBLE_DEVICES=0 python DFND_DeiT-train.py --dataset cifar100 --data_cifar $root_cifar100 --data_imagenet $root_wild --num_select 650000 --teacher_dir $teacher_cifar100 --selected_file $selected_cifar100 --output_dir $output_student_cifar100 --nb_classes 100 --lr_S 8.5e-4 --attnprobe_sel --attnprobe_dist)

(CIFAR 100 run: CUDA_VISIBLE_DEVICES=0,1,2,3 python DFND_DeiT-train.py --dataset cifar100 --data_cifar $root_cifar100 --data_imagenet $root_wild --num_select 650000 --teacher_dir $teacher_cifar100 --selected_file $selected_cifar100 --output_dir $output_student_cifar100 --nb_classes 100 --lr_S 8.5e-4 --attnprobe_sel --attnprobe_dist)

you will get the student transformer model in '/output/cifar10/student/' or '/output/cifar100/student/' directory.

Our distilled student models (CIFAR-10, CIFAR-100, ImageNet, Tiny-ImageNet, MNIST) can be downloaded from here: Distilled student models

Results

Citation

@inproceedings{
wang2022attention,
title={Attention Probe: Vision Transformer Distillation in the Wild},
author={Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang},
booktitle={International Conference on Acoustics, Speech and Signal Processing},
year={2022},
url={https://2022.ieeeicassp.org/}
}

Acknowledgement

Owner
Wang jiahao
CVer,AutoML,NAS,Model Compression
Wang jiahao
wlad 2 Dec 19, 2022
Official Code for "Non-deep Networks"

Non-deep Networks arXiv:2110.07641 Ankit Goyal, Alexey Bochkovskiy, Jia Deng, Vladlen Koltun Overview: Depth is the hallmark of DNNs. But more depth m

Ankit Goyal 567 Dec 12, 2022
Harmonic Memory Networks for Graph Completion

HMemNetworks Code and documentation for Harmonic Memory Networks, a series of models for compositionally assembling representations of graph elements

mlalisse 0 Oct 27, 2021
Inference pipeline for our participation in the FeTA challenge 2021.

feta-inference Inference pipeline for our participation in the FeTA challenge 2021. Team name: TRABIT Installation Download the two folders in https:/

Lucas Fidon 2 Apr 13, 2022
a short visualisation script for pyvideo data

PyVideo Speakers A CLI that visualises repeat speakers from events listed in https://github.com/pyvideo/data Not terribly efficient, but you know. Ins

Katie McLaughlin 3 Nov 24, 2021
Super Resolution for images using deep learning.

Neural Enhance Example #1 — Old Station: view comparison in 24-bit HD, original photo CC-BY-SA @siv-athens. As seen on TV! What if you could increase

Alex J. Champandard 11.7k Dec 29, 2022
A self-supervised learning framework for audio-visual speech

AV-HuBERT (Audio-Visual Hidden Unit BERT) Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Robust Self-Supervised A

Meta Research 431 Jan 07, 2023
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation.

DuoRec Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation. Usage Download datasets fr

Qrh 46 Dec 19, 2022
Behavioral "black-box" testing for recommender systems

RecList RecList Free software: MIT license Documentation: https://reclist.readthedocs.io. Overview RecList is an open source library providing behavio

Jacopo Tagliabue 375 Dec 30, 2022
DI-smartcross - Decision Intelligence Platform for Traffic Crossing Signal Control

DI-smartcross DI-smartcross - Decision Intelligence Platform for Traffic Crossin

OpenDILab 213 Jan 02, 2023
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 828 Dec 28, 2022
Solution to the Weather4cast 2021 challenge

This code was used for the entry by the team "antfugue" for the Weather4cast 2021 Challenge. Below, you can find the instructions for generating predi

Jussi Leinonen 13 Jan 03, 2023
Code for the paper: Sketch Your Own GAN

Sketch Your Own GAN Project | Paper | Youtube Our method takes in one or a few hand-drawn sketches and customizes an off-the-shelf GAN to match the in

677 Dec 28, 2022
PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection.

Introduction This repo contains the official PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection. Up

133 Dec 29, 2022
A `Neural = Symbolic` framework for sound and complete weighted real-value logic

Logical Neural Networks LNNs are a novel Neuro = symbolic framework designed to seamlessly provide key properties of both neural nets (learning) and s

International Business Machines 138 Dec 19, 2022
Official repository for Few-shot Image Generation via Cross-domain Correspondence (CVPR '21)

Few-shot Image Generation via Cross-domain Correspondence Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zh

Utkarsh Ojha 251 Dec 11, 2022
Graph Self-Attention Network for Learning Spatial-Temporal Interaction Representation in Autonomous Driving

GSAN Introduction Code for paper GSAN: Graph Self-Attention Network for Learning Spatial-Temporal Interaction Representation in Autonomous Driving, wh

YE Luyao 6 Oct 27, 2022
[ACMMM 2021, Oral] Code release for "Elastic Tactile Simulation Towards Tactile-Visual Perception"

EIP: Elastic Interaction of Particles Code release for "Elastic Tactile Simulation Towards Tactile-Visual Perception", in ACMMM (Oral) 2021. By Yikai

Yikai Wang 37 Dec 20, 2022
OREO: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning (NeurIPS 2021)

OREO: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning (NeurIPS 2021) Video demo We here provide a video demo from co

20 Nov 25, 2022