A Domain-Agnostic Benchmark for Self-Supervised Learning

Related tags

Deep Learningdabs
Overview

DABS: A Domain Agnostic Benchmark for Self-Supervised Learning

This repository contains the code for DABS, a benchmark for domain-agnostic self-supervised learning algorithms. The basic components of the benchmark can be found in datasets, encoders, and algorithms. Training is implemented with the PyTorch Lightning framework, logging with Weights and Biases, and configuration management with Hydra.

Usage

We provide support for Python >= 3.7. Install requirements with

python -m pip install -r requirements.txt

For instructions on how to install PyTorch versions compatible with your CUDA versions, see pytorch.org.

Datasets

We provide a set of dataset implementations (in src/datasets) from image, text, speech, sensor, medical imaging, and image-text domains. Preprocessing operations on these datasets are minimal and hard-coded as simple resizing (i.e. of images) and truncations (i.e. of text, audio). These should not be changed so as to maintain fair comparisons across other users of the benchmark.

See conf/datasets/*.yaml for all dataset configs, including the loss, metrics, and batch size used for each dataset.

Almost all datasets will download automatically when the dataset class is instantiated. The exceptions are the CheXpert, ImageNet, and CU Birds datasets, where manual registration or download is required. See the respective dataset files for specific instructions.

Pretraining Dataset (unlabeled) Transfer Dataset (labeled)
CIFAR10 Aircraft, CIFAR10, CU Birds, DTD, Traffic Sign, VGG Flower
PAMAP2 PAMAP2
MSCOCO MSCOCO (mismatched detection), VQA (Binary classification)
Wikitext-103 GLUE (10 Tasks)
mC4 PAWS-X (7 Tasks)
CheXpert CheXpert (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), ChestX-ray8 (atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax)
LibriSpeech Audio MNIST, Fluent Speech (Action, Object, Location), Google Speech Commands, LibriSpeech, VoxCeleb1

Pretraining

During the pretraining phase, self-supervised encoders are trained to learn good representations from unlabeled data. We currently support seven datasets for pretraining, one for each domain: MS COCO, ImageNet, CheXpert, PAMAP2, mC4, WikiText-103, and LibriSpeech. If the pretraining dataset has associated labels, an online linear evaluator is jointly trained with the encoder to provide a heuristic of transfer performance.

Run pretraining with commands like

python pretrain.py exp.name=<experiment-name> dataset=<dataset> algorithm=<algorithm>

Each dataset and encoder has its own config file, so to train a Transformer on the CheXpert dataset with the e-Mix algorithm, run

python pretrain.py exp.name=emix-chexpert encoder=transformer dataset=chexpert algorithm=emix

See conf/pretrain.yaml for all pretraining configuration fields.

For more information on the datasets, encoders, and algorithms, see the following section.

Pretraining Dataset Modality Label type (unused) Input Type
CIFAR10 Natural images Single label 2d
PAMAP2 Sensor Single label 2d
MSCOCO Captioned images Single label 2d +
tokens
WikiText-103 English Text No label tokens
mC4 Multilingual Text No label tokens
CheXpert Medical images Multi label 2d
LibriSpeech Speech No label 2d

Transfer Learning

After pretraining, a small linear classifier is trained on top of the frozen encoder. Run transfer learning from a randomly initialized encoder with

python transfer.py exp.name=<experiment-name> dataset=<dataset> ckpt=null 

See conf/transfer.yaml for all transfer learning configuration fields and optionally replace null with the path to your pretrained encoder checkpoint.

Dataset Modality Label type Evaluation metric Input Type
Aircraft Natural images Single label Accuracy 2d
CU Birds Natural images Single label Accuracy 2d
DTD Natural images Single label Accuracy 2d
Traffic Sign Natural images Single label Accuracy 2d
VGG Flower Natural images Single label Accuracy 2d
Pamap2 Sensor Single label Accuracy 2d
MS COCO Captioned images Binary label Accuracy 2d +
tokens
VQA Captioned images Binary label Accuracy 2d +
tokens
CheXpert Medical images Multi label AUROC 2d
ChestX-ray8 Medical images Multi label AUROC 2d
PAWS-X Multilingual Text Binary label Accuracy tokens
COLA English Text Binary label Pearson correlation tokens
MNLI Matched English Text Single label Accuracy tokens
MNLI Mismatched English Text Single label Accuracy tokens
MRPC English Text Binary label Accuracy tokens
QNLI English Text Binary label Accuracy tokens
QQP English Text Binary label Accuracy tokens
RTE English Text Binary label Accuracy tokens
SST2 English Text Binary label Accuracy tokens
STSB English Text Regression Spearman correlation tokens
WNLI English Text Binary label Accuracy tokens
Audio MNIST Speech Single label Accuracy 2d
Fluent Speech Speech Single label Accuracy 2d
Google Speech Commands Speech Single label Accuracy 2d
LibriSpeech Speech Single label Accuracy 2d
VoxCeleb1 Speech Single label Accuracy 2d

Encoders

A domain-agnostic SSL method should have an encoder which remains as constant as possible across domains. We provide a general transformer encoder baseline (in src/encoders). The transformer operates on a sequence of vectors that are produced by a small set of embedding modules (e.g. patch or token embeddings).

Pretraining algorithms

The pretraining algorithm is the framework and objective that the encoder is trained with. Examples of domain-specific algorithms include SimCLR, BYOL, and MoCo, but these are not domain-agnostic methods as they depend on vision-specific augmentations. We provide our own domain-agnostic implementations of recent algorithms, including e-mix (a generalization of i-mix) and Shuffled Embedding Detection (ShED; a generalization of ELECTRA), which randomly permutes a subset of the input embeddings and trains the model to identify the permuted embeddings.

Results

Below are results for algorithms trained on each dataset in DABS. The baseline performance is obtained via a randomly initialized encoder.

Pretrain Dataset Transfer Dataset Encoder Baseline Performance e-mix Performance ShED Performance
ImageNet CIFAR10 Transformer 24.20% 39.43% 39.63%
ImageNet CU Birds Transformer 1.62% 3.86% 2.95%
ImageNet VGG Flowers Transformer 9.03% 25.96% 13.03%
ImageNet DTD Transformer 7.39% 8.83% 18.35%
ImageNet Traffic Sign Transformer 14.33% 65.07% 27.51%
ImageNet Aircraft Transformer 2.70% 10.15% 5.60%
PAMAP2 PAMAP2 Transformer 69.81% 79.48% 88.69%
MSCOCO VQA Transformer 57.50% 48.90% 54.30%
CheXpert CheXpert Transformer 68.14% 72.40% 72.40%
CheXpert ChestX-ray8 Transformer 57.00% 63.00% 63.70%
Wikitext-103 GLUE (average) Transformer 42.29% 44.08% 48.37%
mC4 PAWS-X (average) Transformer 58.11% 56.16% 59.91%
LibriSpeech Audio MNIST Transformer 33.13% 80.35% 67.33%
LibriSpeech Fluent Locations Transformer 62.09% 60.93% 60.24%
LibriSpeech Fluent Actions Transformer 26.15% 29.87% 30.53%
LibriSpeech Fluent Objects Transformer 30.13% 39.89% 39.36%
LibriSpeech Google Speech Commands Transformer 4.87% 19.22% 20.73%
LibriSpeech LibriSpeech Transformer 17.12% 60.18% 34.77%
LibriSpeech VoxCeleb1 Transformer 0.59% 2.43% 2.81%
Owner
Alex Tamkin
PhD at @stanfordnlp
Alex Tamkin
This repository lets you interact with Lean through a REPL.

lean-gym This repository lets you interact with Lean through a REPL. See Formal Mathematics Statement Curriculum Learning for a presentation of lean-g

OpenAI 87 Dec 28, 2022
NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.

#NeuralTalk Warning: Deprecated. Hi there, this code is now quite old and inefficient, and now deprecated. I am leaving it on Github for educational p

Andrej 5.3k Jan 07, 2023
Pytorch implementation of AREL

Status: Archive (code is provided as-is, no updates expected) Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement

8 Nov 25, 2022
[ECE NTUA] 👁 Computer Vision - Lab Projects & Theoretical Problem Sets (2020-2021)

Computer Vision - NTUA (2020-2021) This repository hosts the lab projects and theoretical problem sets of the Computer Vision course held by ECE NTUA

Dimitris Dimos 6 Jul 21, 2022
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021)

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021) An efficient PyTorch library for Point Cloud Completion.

Microsoft 119 Jan 02, 2023
First-Order Probabilistic Programming Language

FOPPL: A First-Order Probabilistic Programming Language This is an implementation of FOPPL, an S-expression based probabilistic programming language d

Renato Costa 23 Dec 20, 2022
The Power of Scale for Parameter-Efficient Prompt Tuning

The Power of Scale for Parameter-Efficient Prompt Tuning Implementation of soft embeddings from https://arxiv.org/abs/2104.08691v1 using Pytorch and H

Kip Parker 208 Dec 30, 2022
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

AugMix Introduction We propose AugMix, a data processing technique that mixes augmented images and enforces consistent embeddings of the augmented ima

Google Research 876 Dec 17, 2022
The official pytorch implemention of the CVPR paper "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution".

This is the official PyTorch implementation of TMNet in the CVPR 2021 paper "Temporal Modulation Network for Controllable Space-Time VideoSuper-Resolu

Gang Xu 95 Oct 24, 2022
Torchreid: Deep learning person re-identification in PyTorch.

Torchreid Torchreid is a library for deep-learning person re-identification, written in PyTorch. It features: multi-GPU training support both image- a

Kaiyang 3.7k Jan 05, 2023
Code for Multiple Instance Active Learning for Object Detection, CVPR 2021

Language: 简体中文 | English Introduction This is the code for Multiple Instance Active Learning for Object Detection, CVPR 2021. Installation A Linux pla

Tianning Yuan 269 Dec 21, 2022
Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation Exploring Cross-Image Pixel Contrast for Semantic Segmentation, Wenguan Wang, Tianfei Z

Tianfei Zhou 510 Jan 02, 2023
This repository is a series of notebooks that show solutions for the projects at Dataquest.io.

Dataquest Project Solutions This repository is a series of notebooks that show solutions for the projects at Dataquest.io. Of course, there are always

Dataquest 1.1k Dec 30, 2022
Code for approximate graph reduction techniques for cardinality-based DSFM, from paper

SparseCard Code for approximate graph reduction techniques for cardinality-based DSFM, from paper "Approximate Decomposable Submodular Function Minimi

Nate Veldt 1 Nov 25, 2022
[CVPR 2021] NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning

NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning Project Page | Paper | Supplemental material #1 | Supplement

KAIST VCLAB 49 Nov 24, 2022
A script that trains a model to recognize handwritten digits using the MNIST data set.

handwritten-digits-recognition A script that trains a model to recognize handwritten digits using the MNIST data set. Then it loads external files and

Hamza Sayih 1 Oct 30, 2021
Command-line tool for downloading and extending the RedCaps dataset.

RedCaps Downloader This repository provides the official command-line tool for downloading and extending the RedCaps dataset. Users can seamlessly dow

RedCaps dataset 33 Dec 14, 2022
A simple code to perform canny edge contrast detection on images.

CECED-Canny-Edge-Contrast-Enhanced-Detection A simple code to perform canny edge contrast detection on images. A simple code to process images using c

Happy N. Monday 3 Feb 15, 2022
GDSC-ML Team Interview Task

GDSC-ML-Team---Interview-Task Task 1 : Clean or Messy room In this task we have to classify the given test images as clean or messy. - Link for datase

Aayush. 1 Jan 19, 2022
Paper: Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification

Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification T M Feroz Ali, Subhasis Chaudhuri, ICVGIP-20-21

T M Feroz Ali 3 Jun 17, 2022