PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Overview

Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Created by Prarthana Bhattacharyya.

Disclaimer: This is not an official product and is meant to be a proof-of-concept and for academic/educational use only.

This repository contains the PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime, to be presented at ICASSP-2022.

Self-supervision has shown outstanding results for natural language processing and, more recently, for image recognition. Simultaneously, vision transformers and their variants have emerged as a promising and scalable alternative to convolutions for various computer vision tasks. In this paper, we are the first to question whether self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. The motivation is to reduce the number of manual annotations required to train a visual embedder, and to produce generalizable, semantically meaningful and robust embeddings.


Results

  • SSL-ViT + few-shot image classification:
  • Qualitative analysis for base-classes chosen by supervised CNN and SSL-ViT for few-shot distribution calibration:
  • SSL-ViT + zero-shot image retrieval:

Pretraining Self-Supervised ViT

  • Run DINO with the ViT-small network on a single node with 4 GPUs for 100 epochs using the following command.
cd dino/
python -m torch.distributed.launch --nproc_per_node=4 main_dino.py --arch vit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
  • For mini-ImageNet pretraining, we use the classes listed in ssl-vit-fewshot/data/ImageNetSSLTrainingSplit_mini.txt.
  • For tiered-ImageNet pretraining, we use the classes listed in ssl-vit-fewshot/data/ImageNetSSLTrainingSplit_tiered.txt.
  • For CUB-200, Cars-196 and SOP, we use the pretrained model from:
import torch
vits16 = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
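
As a reference for how this checkpoint can be used, below is a minimal sketch (not part of the repository) that embeds a single image with the DINO ViT-S/16 backbone loaded above; the image path and the standard ImageNet normalization values are placeholder assumptions.

import torch
from PIL import Image
from torchvision import transforms

# Load the self-supervised DINO ViT-S/16 backbone from torch.hub
vits16 = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
vits16.eval()

# Standard ImageNet preprocessing (assumed; adjust to match your pipeline)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# 'example.jpg' is a placeholder image path
img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    feat = vits16(img)  # (1, 384) CLS-token embedding for ViT-S/16
print(feat.shape)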

Visual Representation Learning with Self-Supervised ViT for Low-Label High-Data Regime

Dataset Preparation

Please follow the instructions in FRN for few-shot image classification and RevisitDML for zero-shot image retrieval to download the datasets, and place them in the ssl-vit-fewshot/data and DIML/data folders respectively.

Training and Evaluation for few-shot image classification

  • The first step is to extract features for base and novel classes using the pretrained SSL-ViT.
  • get_dino_miniimagenet_feats.ipynb extracts SSL-ViT features for the base and novel classes.
  • Change the hyper-parameter data_path to use CUB or tiered-ImageNet.
  • The SSL-ViT checkpoints for the various datasets are provided below (note: these checkpoints are trained without any labels). We also provide the extracted features, which should be placed in ssl-vit-fewshot/dino_features_data/.
arch dataset download extracted-train extracted-test
ViT-S/16 mini-ImageNet mini_imagenet_checkpoint.pth train.p test.p
ViT-S/16 tiered-ImageNet tiered_imagenet_checkpoint.pth train.p test.p
ViT-S/16 CUB cub_checkpoint.pth train.p test.p
  • For n-way k-shot evaluation, we provide miniimagenet_evaluate_dinoDC.ipynb.
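
For intuition, the following simplified sketch samples n-way k-shot episodes from the extracted features and scores them with a plain logistic-regression classifier. It assumes the pickled features form a dict mapping each class label to an array of feature vectors, and it is not the distribution-calibration procedure implemented in miniimagenet_evaluate_dinoDC.ipynb.

import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed layout: {class_label: np.ndarray of shape [n_images, feat_dim]}
with open('dino_features_data/test.p', 'rb') as f:
    features = pickle.load(f)

def episode_accuracy(features, n_way=5, k_shot=1, n_query=15, rng=None):
    rng = rng or np.random.default_rng()
    classes = rng.choice(list(features.keys()), n_way, replace=False)
    x_sup, y_sup, x_qry, y_qry = [], [], [], []
    for label, cls in enumerate(classes):
        feats = np.asarray(features[cls])
        idx = rng.permutation(len(feats))
        x_sup.append(feats[idx[:k_shot]])
        y_sup += [label] * k_shot
        x_qry.append(feats[idx[k_shot:k_shot + n_query]])
        y_qry += [label] * n_query
    clf = LogisticRegression(max_iter=1000).fit(np.concatenate(x_sup), y_sup)
    return clf.score(np.concatenate(x_qry), y_qry)

accs = [episode_accuracy(features) for _ in range(600)]
print('5-way 1-shot accuracy: %.4f +/- %.4f'
      % (np.mean(accs), 1.96 * np.std(accs) / np.sqrt(len(accs))))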

Training and Evaluation for zero-shot image retrieval

  • To train the baseline CNN models, run the scripts in DIML/scripts/baselines. The checkpoints are saved in the Training_Results folder. For example:
cd DIML/
CUDA_VISIBLE_DEVICES=0 ./scripts/baselines/cub_runs.sh
  • To train the supervised ViT and self-supervised ViT:
cp -r ssl-vit-retrieval/architectures/* DIML/ssl-vit-retrieval/architectures/
CUDA_VISIBLE_DEVICES=0 ./scripts/baselines/cub_runs.sh --arch vits
CUDA_VISIBLE_DEVICES=0 ./scripts/baselines/cub_runs.sh --arch dino
  • To test the models, first edit the checkpoint paths in test_diml.py, then run:
CUDA_VISIBLE_DEVICES=0 ./scripts/diml/test_diml.sh cub200
  • The SSL-ViT checkpoints for zero-shot image retrieval are provided below.
dataset Loss SSL-ViT-download
CUB Margin cub_ssl-vit-margin.pth
CUB Proxy-NCA cub_ssl-vit-proxynca.pth
CUB Multi-Similarity cub_ssl-vit-ms.pth
Cars-196 Margin cars_ssl-vit-margin.pth
Cars-196 Proxy-NCA cars_ssl-vit-proxynca.pth
Cars-196 Multi-Similarity cars_ssl-vit-ms.pth
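
Zero-shot retrieval quality is typically reported as Recall@K over nearest neighbours in the embedding space. The sketch below is illustrative only (the repository's test_diml.sh performs the actual evaluation) and assumes emb is an (N, d) array of L2-normalized test embeddings with labels holding their class ids.

import numpy as np

def recall_at_1(emb, labels):
    # emb: (N, d) L2-normalized embeddings; labels: (N,) class ids (assumed inputs)
    sims = emb @ emb.T                # cosine similarity between all pairs
    np.fill_diagonal(sims, -np.inf)   # exclude each query from its own neighbours
    nearest = sims.argmax(axis=1)     # index of the closest other sample
    return float((labels[nearest] == labels).mean())

# Example usage: recall_at_1(test_embeddings, test_labels)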

Acknowledgement

The code is based on DINO, FRN and DIML.
