Source code for the BMVC-2021 paper "SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation".

Overview

SimReg: A Simple Regression Based Framework for Self-supervised Knowledge Distillation

Source code for the paper "SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation".
Paper accepted at British Machine Vision Conference (BMVC), 2021

Overview

We present a simple framework to improve performance of regression based knowledge distillation from self-supervised teacher networks. The teacher is trained using a standard self-supervised learning (SSL) technique. The student network is then trained to directly regress the teacher features (using MSE loss on normalized features). Importantly, the student architecture contains an additional multi-layer perceptron (MLP) head atop the CNN backbone during the distillation (training) stage. A deeper architecture provides the student higher capacity to predict the teacher representations. This additional MLP head can be removed during inference without hurting downstream performance. This is especially surprising since only the output of the MLP is trained to mimic the teacher and the backbone CNN features have a high MSE loss with the teacher features. This observation allows us to obtain better student models by using deeper models during distillation without altering the inference architecture. The train and test stage architectures are shown in the figure below.

Requirements

All our experiments use the PyTorch library. We recommend installing the following package versions:

  • python=3.7.6
  • pytorch=1.4
  • torchvision=0.5.0
  • faiss-gpu=1.6.1 (required for k-NN evaluation alone)

Instructions for PyTorch installation can be found here. GPU version of the FAISS package is necessary for k-NN evaluation of trained models. It can be installed using the following command:

pip install faiss-gpu

Dataset

We use the ImageNet-1k dataset in our experiments. Download and prepare the dataset using the PyTorch ImageNet training example code. The dataset path needs to be set in the bash scripts used for training and evaluation.

Training

Distillation can be performed by running the following command:

bash run.sh

Training with ResNet-50 teacher and ResNet-18 student requires nearly 2.5 days on 4 2080ti GPUs (~26m/epoch). The defualt hyperparameters values are set to ones used in the paper. Modify the teacher and student architectures as necessary. Set the approapriate paths for the ImageNet dataset root and the experiment root. The current code will generate a directory named exp_dir containing checkpoints and logs sub-directories.

Evaluation

Set the experiment name and checkpoint epoch in the evaluation bash scripts. The trained checkpoints are assumed to be stored as exp_dir/checkpoints/ckpt_epoch_<num>.pth. Edit the weights argument to load model parameters from a custom checkpoint.

k-NN Evaluation

k-NN evaluation requires FAISS-GPU package installation. We evaluate the performance of the CNN backbone features. Run k-NN evaluation using:

bash knn_eval.sh

The image features and results for k-NN (k=1 and 20) evaluation are stored in exp_dir/features/ path.

Linear Evaluation

Here, we train a single linear layer atop the CNN backbone using an SGD optimizer for 40 epochs. The evaluation can be performed using the following code:

bash lin_eval.sh

The evaluation results are stored in exp_dir/linear/ path. Set the use_cache argument in the bash script to use cached features for evaluation. Using this argument will result in a single round of feature calculation for caching and 40 epochs of linear layer training using the cached features. While it usually results in slightly reduced performance, it can be used for faster evaluation of intermediate checkpoints.

Pretrained Models

To evaluate the pretrained models, create an experiment root directory exp_dir and place the checkpoint in exp_dir/checkpoints/. Set the exp argument in the evaluation bash scripts to perform k-NN and linear evaluation. We provide the pretrained teacher (obtained using the officially shared checkpoints for the corresponding SSL teacher) and our distilled student model weights. We use cached features of the teacher in some of our experiments for faster training.

Teacher Student 1-NN Linear
MoCo-v2 ResNet-50 MobileNet-v2 55.5 69.1
MoCo-v2 ResNet-50 ResNet-18 54.8 65.1
SimCLR ResNet-50x4 ResNet-50 (cached) 60.3 74.2
BYOL ResNet-50 ResNet-18 (cached) 56.7 66.8
SwAV ResNet-50 (cached) ResNet-18 54.0 65.8

TODO

  • Add code for transfer learning evaluation
  • Reformat evaluation codes
  • Add code to evaluate models at different stages of CNN backbone and MLP head

Citation

If you make use of the code, please cite the following work:

@inproceedings{navaneet2021simreg,
 author = {Navaneet, K L and Koohpayegani, Soroush Abbasi and Tejankar, Ajinkya and Pirsiavash, Hamed},
 booktitle = {British Machine Vision Conference (BMVC)},
 title = {SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation},
 year = {2021}
}

License

This project is under the MIT license.

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

THUHCSI 138 Oct 28, 2022
ATAC: Adversarially Trained Actor Critic

ATAC: Adversarially Trained Actor Critic Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan

Microsoft 41 Dec 08, 2022
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Microsoft 408 Dec 30, 2022
McGill Physics Hackathon 2021: Reaction-Diffusion Models for the Generation of Biological Patterns

DiffuseAnimals: Reaction-Diffusion Models for the Generation of Biological Patterns Introduction Reaction-diffusion equations can be utilized in order

Austin Szuminsky 2 Mar 07, 2022
Tools for computational pathology

A toolkit for computational pathology and machine learning. View documentation Please cite our paper Installation There are several ways to install Pa

254 Dec 12, 2022
Try out deep learning models online on Google Colab

Try out deep learning models online on Google Colab

Erdene-Ochir Tuguldur 1.5k Dec 27, 2022
The dataset of tweets pulling from Twitters with keyword: Hydroxychloroquine, location: US, Time: 2020

HCQ_Tweet_Dataset: FREE to Download. Keywords: HCQ, hydroxychloroquine, tweet, twitter, COVID-19 This dataset is associated with the paper "Understand

2 Mar 16, 2022
CPU inference engine that delivers unprecedented performance for sparse models

The DeepSparse Engine is a CPU runtime that delivers unprecedented performance by taking advantage of natural sparsity within neural networks to reduce compute required as well as accelerate memory b

Neural Magic 1.2k Jan 09, 2023
Official repository of the AAAI'2022 paper "Contrast and Generation Make BART a Good Dialogue Emotion Recognizer"

CoG-BART Contrast and Generation Make BART a Good Dialogue Emotion Recognizer Quick Start: To run the model on test sets of four datasets, Download th

39 Dec 24, 2022
From Perceptron model to Deep Neural Network from scratch in Python.

Neural-Network-Basics Aim of this Repository: From Perceptron model to Deep Neural Network (from scratch) in Python. ** Currently working on a basic N

Aditya Kahol 1 Jan 14, 2022
CNN Based Meta-Learning for Noisy Image Classification and Template Matching

CNN Based Meta-Learning for Noisy Image Classification and Template Matching Introduction This master thesis used a few-shot meta learning approach to

Kumar Manas 2 Dec 09, 2021
Official implementation of "Motif-based Graph Self-Supervised Learning forMolecular Property Prediction"

Motif-based Graph Self-Supervised Learning for Molecular Property Prediction Official Pytorch implementation of NeurIPS'21 paper "Motif-based Graph Se

zaixi 71 Dec 20, 2022
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Salesforce 1.3k Dec 31, 2022
Official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

MidiBERT-Piano Authors: Yi-Hui (Sophia) Chou, I-Chun (Bronwin) Chen Introduction This is the official repository for the paper, MidiBERT-Piano: Large-

137 Dec 15, 2022
This is a collection of all challenges in HKCERT CTF 2021

香港網絡保安新生代奪旗挑戰賽 2021 (HKCERT CTF 2021) This is a collection of all challenges (and writeups) in HKCERT CTF 2021 Challenges ID Chinese name Name Score S

10 Jan 27, 2022
Distance correlation and related E-statistics in Python

dcor dcor: distance correlation and related E-statistics in Python. E-statistics are functions of distances between statistical observations in metric

Carlos Ramos Carreño 108 Dec 27, 2022
A highly efficient and modular implementation of Gaussian Processes in PyTorch

GPyTorch GPyTorch is a Gaussian process library implemented using PyTorch. GPyTorch is designed for creating scalable, flexible, and modular Gaussian

3k Jan 02, 2023
Tensorflow implementation of the paper "HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences", CVPR 2021.

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences Tensorflow implementation of the paper "HumanGPS: Geodesic PreServing Feature fo

Google Interns 50 Dec 21, 2022
Code of the paper "Shaping Visual Representations with Attributes for Few-Shot Learning (ASL)".

Shaping Visual Representations with Attributes for Few-Shot Learning This code implements the Shaping Visual Representations with Attributes for Few-S

chx_nju 9 Sep 01, 2022
This a classic fintech problem that introduces real life difficulties such as data imbalance. Check out the notebook to find out more!

Credit Card Fraud Detection Introduction Online transactions have become a crucial part of any business over the years. Many of those transactions use

Jonathan Hasbani 0 Jan 20, 2022