Official PyTorch implementation of the paper "Self-Supervised Relational Reasoning for Representation Learning", NeurIPS 2020 Spotlight.

Overview

Official PyTorch implementation of the paper:

"Self-Supervised Relational Reasoning for Representation Learning" (2020), Patacchiola, M., and Storkey, A., Advances in Neural Information Processing Systems (NeurIPS), Spotlight (Top 3%) [arXiv]

@inproceedings{patacchiola2020self,
  title={Self-Supervised Relational Reasoning for Representation Learning},
  author={Patacchiola, Massimiliano and Storkey, Amos},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Abstract: In self-supervised learning, a system is tasked with achieving a surrogate objective by defining alternative targets on a set of unlabeled data. The aim is to build useful representations that can be used in downstream tasks, without costly manual annotation. In this work, we propose a novel self-supervised formulation of relational reasoning that allows a learner to bootstrap a signal from information implicit in unlabeled data. Training a relation head to discriminate how entities relate to themselves (intra-reasoning) and other entities (inter-reasoning), results in rich and descriptive representations in the underlying neural network backbone, which can be used in downstream tasks such as classification and image retrieval. We evaluate the proposed method following a rigorous experimental procedure, using standard datasets, protocols, and backbones. Self-supervised relational reasoning outperforms the best competitor in all conditions by an average 14% in accuracy, and the most recent state-of-the-art model by 3%. We link the effectiveness of the method to the maximization of a Bernoulli log-likelihood, which can be considered as a proxy for maximizing the mutual information, resulting in a more efficient objective with respect to the commonly used contrastive losses.


Essential code

Here, you can find the essential code of the method with full training pipeline:

The essential code referenced above trains a self-supervised relation module on CIFAR-10 with a Conv4 backbone. The backbone is saved at the end of training and can be reused for downstream tasks (e.g. classification, image retrieval). A GPU is not required for these examples. The code has been tested on Ubuntu 18.04 LTS with Python 3.6 and PyTorch 1.4.
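
For orientation, the pairing idea described in the abstract can be sketched as follows. This is a simplified illustration, not the repository's essential code: the toy backbone, the relation head sizes, the `make_pairs` helper, and the choice of building negatives by shifting a block by one position are all assumptions made for this example. Positive pairs join two augmentations of the same image, negative pairs join augmentations of different images, and a relation head is trained with a binary cross-entropy (Bernoulli log-likelihood) objective.

```python
import torch
import torch.nn as nn

def make_pairs(features, K):
    """Build positive/negative feature pairs from K augmented views.

    features: tensor of shape (K * B, D), laid out as K consecutive blocks
    of B rows; row i of every block comes from the same source image.
    (Hypothetical helper, written for this sketch only.)
    """
    B = features.shape[0] // K
    pairs, targets = [], []
    for a in range(K):
        for b in range(a + 1, K):
            fa = features[a * B:(a + 1) * B]
            fb = features[b * B:(b + 1) * B]
            # Positive: same image, two different augmentations.
            pairs.append(torch.cat([fa, fb], dim=1))
            targets.append(torch.ones(B))
            # Negative: shift the second block by one so the images differ.
            pairs.append(torch.cat([fa, torch.roll(fb, shifts=1, dims=0)], dim=1))
            targets.append(torch.zeros(B))
    return torch.cat(pairs, dim=0), torch.cat(targets, dim=0)

torch.manual_seed(0)
K, B, D = 4, 8, 16  # augmentations, mini-batch size, feature size (toy values)

# Toy stand-ins for the backbone and the relation head.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, D), nn.ReLU())
relation_head = nn.Sequential(nn.Linear(2 * D, 32), nn.LeakyReLU(), nn.Linear(32, 1))

# Random tensors stand in for K augmented copies of a mini-batch of images.
views = torch.randn(K * B, 3, 32, 32)
features = backbone(views)

pairs, targets = make_pairs(features, K)
logits = relation_head(pairs).squeeze(1)
loss = nn.BCEWithLogitsLoss()(logits, targets)
loss.backward()  # gradients flow into both the relation head and the backbone
```

With these toy values the sketch builds 96 pairs (6 view combinations, each giving 8 positives and 8 negatives), which hints at why memory grows quickly with K.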

Pretrained models

  • [download][247 MB] Relational Reasoning, SlimageNet64 (160K images, 64x64 pixels), ResNet-34, trained for 300 epochs
  • [download][82 MB] Relational Reasoning, STL-10 (unlabeled split, 100K images, 96x96 pixels), ResNet-34, trained for 300 epochs
  • [download][10 MB] Relational Reasoning, CIFAR-10 (50K images, 32x32 pixels), ResNet-56, trained for 500 epochs
  • [download][10 MB] Relational Reasoning, CIFAR-100 (50K images, 32x32 pixels), ResNet-56, trained for 500 epochs

Note that ResNet-34 has 4 hyperblocks (21 M parameters) and is larger than ResNet-56, which has 3 hyperblocks (0.8 M parameters). The archives contain the backbone, relation head, and optimizer parameters, saved in the checkpoint dictionary under the keys backbone, relation, and optimizer. The backbone weights can be extracted with the standard PyTorch loader. For instance, the following script loads the ResNet-34 pretrained on STL-10:

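import torch
import torchvision

# Standard Torchvision ResNet-34 with randomly initialized weights
my_net = torchvision.models.resnet34()
# The archive is a dictionary with the keys "backbone", "relation", and "optimizer";
# map_location="cpu" lets the file load on machines without a GPU
checkpoint = torch.load("relationnet_stl10_resnet34_seed_1_epoch_300.tar", map_location="cpu")
# strict=False tolerates keys that do not match between the checkpoint and the model
my_net.load_state_dict(checkpoint["backbone"], strict=False)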

The ResNet-34 model can be loaded using either the standard Torchvision ResNet class or the resnet_large.py class in ./backbones. Likewise, the ResNet-56 models can be loaded using the resnet_small.py class in ./backbones, but they are not compatible with the standard Torchvision ResNet class, since they have only three hyperblocks while the Torchvision class has four. A handy class is also provided in standard.py under ./methods; it automatically loads the backbone and adds a linear layer on top. To load the full model (backbone + relation head), define a new object using the class in relationnet.py and load the checkpoint with its internal method load(file_path).
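
The checkpoint layout (a dictionary with the keys backbone, relation, and optimizer) and the effect of strict=False can be illustrated with a toy model. The sketch below does not use the repository's classes or the real archives: the tiny networks and the in-memory buffer are assumptions chosen so the example runs anywhere. It shows how to inspect which parameters were left uninitialized after a non-strict load.

```python
import io
import torch
import torch.nn as nn

# Toy stand-ins for the backbone, relation head, and optimizer.
backbone = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
relation = nn.Linear(8, 1)
optimizer = torch.optim.Adam(list(backbone.parameters()) + list(relation.parameters()))

# Save a checkpoint with the same three keys, here into an in-memory buffer.
buffer = io.BytesIO()
torch.save({"backbone": backbone.state_dict(),
            "relation": relation.state_dict(),
            "optimizer": optimizer.state_dict()}, buffer)
buffer.seek(0)

checkpoint = torch.load(buffer, map_location="cpu")
print(sorted(checkpoint.keys()))  # ['backbone', 'optimizer', 'relation']

# Target model: the same backbone layers plus a new linear layer on top.
target = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 10))
result = target.load_state_dict(checkpoint["backbone"], strict=False)
print(result.missing_keys)     # ['2.weight', '2.bias'] (the new layer)
print(result.unexpected_keys)  # []
```

Checking missing_keys and unexpected_keys after a non-strict load is a cheap way to confirm that the backbone weights were actually matched rather than silently skipped.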

Code to reproduce the experiments

The code in this repository allows replicating the core results of our experiments. All the methods are contained in the ./methods folder, and the feature extractors (backbones) are contained in the ./backbones folder. The code is modular, so new methods and datasets can be easily included. Checkpoints and logs are automatically saved in ./checkpoint/METHOD_NAME/DATASET_NAME. Most of the datasets are automatically downloaded and stored in ./data; SlimageNet64 and tiny-ImageNet need to be downloaded separately. The tiny-ImageNet dataset needs to be downloaded from here, then unpacked and pre-processed using this script. In the paper (and appendix) we have reported the parameters for all conditions. Here is a list of the parameters used in our experiments:

Methods: relationnet (ours), simclr, deepcluster, deepinfomax, rotationnet, randomweights (lower bound), and standard (upper bound).

Datasets: cifar10, cifar100, supercifar100, stl10, slim (SlimageNet64), and tiny (tiny-ImageNet).

Backbones: conv4, resnet8, resnet32, resnet56, and resnet34 (larger with 4 hyper-blocks).

Mini-batch size: 128 for all methods, 64 for our method with K=32 (if your GPU has enough memory you can increase K).

Epochs: 200 for unsupervised training (300 for STL-10), and 100 for linear evaluation.

Seeds: 1, 2, 3 (our results are the average over these three seeds).

Memory: self-supervised methods can be expensive in terms of memory. For our method, you may have to decrease the value of K to avoid saturating your CPU/GPU memory. In our experiments we managed to fit a model with backbone=resnet56, mini-batch data_size=64, and augmentations K=32 on an NVIDIA GeForce RTX 2080. Note that, depending on the number of augmentations, the mini-batch size, and your particular hardware, a single epoch may take from a few seconds up to several minutes.
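
To get a feeling for how K drives memory use, the arithmetic can be sketched as below. The number of augmented images per step (data_size times K) follows directly from the settings above. The pair counts assume a simple scheme in which every combination of two augmentations is compared and one negative is built per positive; that scheme is an assumption of this sketch and may differ from what the repository does. Both helper functions are hypothetical.

```python
def augmented_batch_size(data_size, K):
    """Images passed through the backbone in one step."""
    return data_size * K

def pair_counts(data_size, K):
    """Positive and negative pairs under the assumed all-combinations scheme."""
    view_combinations = K * (K - 1) // 2
    positives = view_combinations * data_size
    negatives = positives  # assumption: one negative per positive
    return positives, negatives

for K in (32, 16, 8):
    images = augmented_batch_size(64, K)
    positives, negatives = pair_counts(64, K)
    print(f"K={K}: {images} images, {positives} positives, {negatives} negatives")
```

Under these assumptions, halving K halves the number of images per step but cuts the number of pairs by roughly a factor of four, so reducing K is the most effective knob when memory is tight.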

For training and evaluation there are three stages: 1) unsupervised training, 2) training through linear evaluation, 3) test. Those are described below.

1) Unsupervised training

Each method should be trained on the unsupervised version of the base dataset. This is managed by the file train_unsupervised.py. In the following example we train our self-supervised relational reasoning method on CIFAR-10 using a Conv4 backbone, with a mini-batch of size 64 and 32 augmentations, for 200 epochs:

python train_unsupervised.py --dataset="cifar10" --method="relationnet" --backbone="conv4" --seed=1 --data_size=64 --K=32 --gpu=0 --epochs=200

Note that when using method=standard, labels are used, since this corresponds to the supervised upper bound. In all other cases labels are discarded and each method is trained following its own self-supervised routine.
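
The checkpoint produced by this stage is the input of the next one. Judging from the file names that appear in this README, checkpoints seem to follow the pattern METHOD_DATASET_BACKBONE_seed_SEED_epoch_EPOCHS.tar inside ./checkpoint/METHOD_NAME/DATASET_NAME. The helper below is hypothetical (it is not part of the repository) and encodes only that inferred pattern, which can be handy when scripting runs over several seeds.

```python
def checkpoint_path(method, dataset, backbone, seed, epochs, suffix=""):
    """Build the expected checkpoint path (pattern inferred from the README examples)."""
    name = f"{method}_{dataset}_{backbone}_seed_{seed}_epoch_{epochs}{suffix}.tar"
    return f"./checkpoint/{method}/{dataset}/{name}"

# Expected checkpoints of the unsupervised stage for the three seeds.
for seed in (1, 2, 3):
    print(checkpoint_path("relationnet", "cifar10", "conv4", seed, 200))
```

If the actual file name differs, check the contents of the checkpoint folder rather than relying on this pattern.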

2) Training through linear evaluation

This procedure consists of taking the checkpoint saved at the end of the previous phase, loading the backbone in memory, and replacing the last linear layer with a new one. Only this last layer is then trained for 100 epochs, while the backbone is kept frozen. This procedure allows checking whether useful representations have been learned in the previous stage. This phase is managed by the file train_linear_evaluation.py. An example is the following:

python train_linear_evaluation.py --dataset="cifar10" --method="relationnet" --backbone="conv4" --seed=1 --data_size=128 --gpu=0 --epochs=100 --checkpoint="./checkpoint/relationnet/cifar10/relationnet_cifar10_conv4_seed_1_epoch_200.tar"

The additional parameter --finetune=True can be added if you also want to train the backbone (fine-tuning with a smaller learning rate). For the cross-domain experiments, the linear evaluation must be done on another dataset. For instance, for the condition CIFAR-10 -> CIFAR-100, phase 1) must be done using cifar10 and phase 2) using cifar100. Similarly, for the coarse-grained experiments, training in phase 1) must be done on cifar100 and in phase 2) on supercifar100 (using the 20 super-classes of CIFAR-100).
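
The core of linear evaluation (a frozen backbone with a trainable linear layer on top) can be sketched as below. This is a generic illustration, not the code in train_linear_evaluation.py: the toy backbone, the random data, the learning rate, and the number of steps are all assumptions made so the example runs on a CPU in a moment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pretrained backbone; in practice its weights come from phase 1).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False  # freeze the backbone
backbone.eval()

# New linear layer trained on top of the frozen features.
classifier = nn.Linear(64, 10)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random tensors stand in for a labeled mini-batch.
images = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 10, (32,))

before = [p.clone() for p in backbone.parameters()]
for _ in range(5):
    with torch.no_grad():  # no gradients are needed through the backbone
        features = backbone(images)
    loss = loss_fn(classifier(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The backbone parameters are identical before and after training.
unchanged = all(torch.equal(a, b) for a, b in zip(before, backbone.parameters()))
print(unchanged)  # True
```

Fine-tuning differs only in that the backbone parameters are left trainable and passed to the optimizer, typically with a smaller learning rate.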

3) Test

The last stage simply consists of testing the model trained during the linear evaluation on the unseen test set. Note that a new checkpoint is saved at the end of the previous phase, and this is the one that should be loaded now. This phase is managed by the file test.py. An example command is the following:

python test.py --dataset="cifar10" --backbone="conv4" --seed=1 --data_size=128 --gpu=0 --checkpoint="./checkpoint/relationnet/cifar10/relationnet_cifar10_conv4_seed_1_epoch_100_linear_evaluation.tar"

Testing on cross-domain conditions should be done by selecting the appropriate dataset at test time. The rule is to use the same dataset used in phase 2). For instance, in the cross-domain condition CIFAR-10 -> CIFAR-100 the test set should be cifar100.
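
The quantity reported at test time is classification accuracy. A minimal sketch of a top-1 accuracy computation is given below; the `top1_accuracy` helper and the hand-written logits are illustrative assumptions and do not reproduce the internals of test.py.

```python
import torch

def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()

# Four samples, three classes: the third sample is misclassified.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.2, 0.4],
                       [0.1, 0.3, 2.2]])
labels = torch.tensor([0, 1, 2, 2])

accuracy = top1_accuracy(logits, labels)
print(accuracy)  # 0.75
```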

License

MIT License

Copyright (c) 2020 Massimiliano Patacchiola

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Owner
Massimiliano Patacchiola
Postdoc at the University of Cambridge. Likes Machine/Deep/Reinforcement Learning.