[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning

Last update: Dec 14, 2022

Overview

SoCo

By Fangyun Wei*, Yue Gao*, Zhirong Wu, Han Hu, Stephen Lin.

* Equal contribution.

Introduction

Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning. Such generality for transfer learning, however, sacrifices specificity if we are interested in a certain downstream task. We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task. In this paper, we follow this principle with a pretraining method specifically designed for the task of object detection. We attain alignment in the following three aspects:

1. object-level representations are introduced via selective search bounding boxes as object proposals;
2. the pretraining network architecture incorporates the same dedicated modules used in the detection pipeline (e.g. FPN);
3. the pretraining is equipped with object detection properties such as object-level translation invariance and scale invariance.

Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection using a Mask R-CNN framework.
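To make the translation and scale properties concrete, the following is a minimal sketch of how one proposal box could be mapped into two differently cropped and resized views, so that features for the same object can be compared across views. The function name `map_box_to_view`, the crop windows, and the output sizes are hypothetical and chosen for illustration; the augmentation pipeline actually used by SoCo is not described in this README.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def map_box_to_view(box: Box, crop: Box, out_size: Tuple[int, int]) -> Optional[Box]:
    """Map a box from image coordinates into a cropped-and-resized view.

    `crop` is the (x1, y1, x2, y2) window cut from the image and `out_size`
    is the (width, height) the crop is resized to. Returns None when the box
    does not overlap the crop. Hypothetical helper, not the SoCo code.
    """
    cx1, cy1, cx2, cy2 = crop
    # Clip the box to the crop window.
    x1, y1 = max(box[0], cx1), max(box[1], cy1)
    x2, y2 = min(box[2], cx2), min(box[3], cy2)
    if x2 <= x1 or y2 <= y1:
        return None
    # Shift to the crop origin, then rescale to the output resolution.
    sx = out_size[0] / (cx2 - cx1)
    sy = out_size[1] / (cy2 - cy1)
    return ((x1 - cx1) * sx, (y1 - cy1) * sy, (x2 - cx1) * sx, (y2 - cy1) * sy)


# The same proposal lands at a different position and scale in each view.
proposal = (100.0, 100.0, 200.0, 200.0)
view_a = map_box_to_view(proposal, crop=(0.0, 0.0, 400.0, 400.0), out_size=(224, 224))
view_b = map_box_to_view(proposal, crop=(80.0, 60.0, 280.0, 260.0), out_size=(224, 224))
print(view_a, view_b)
```

Pooling features inside `view_a` and `view_b` and pulling them together is one way an object-level objective can encourage invariance to where and how large the object appears.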

Architecture

Main results

The pretrained models will be available soon.

SoCo pre-trained models

Model	Arch	Epochs	Scripts
SoCo	ResNet50-C4	100	SoCo_C4_100ep
SoCo	ResNet50-C4	400	SoCo_C4_400ep
SoCo	ResNet50-FPN	100	SoCo_FPN_100ep
SoCo	ResNet50-FPN	400	SoCo_FPN_400ep
SoCo*	ResNet50-FPN	400	SoCo_FPN_Star_400ep

Results on COCO with MaskRCNN R50-FPN

Methods	Epoch	AP^bb	AP^bb₅₀	AP^bb₇₅	AP^mk	AP^mk₅₀	AP^mk₇₅	Detectron2 trained
Scratch	-	31.0	49.5	33.2	28.5	46.8	30.4	--
Supervised	90	38.9	59.6	42.7	35.4	56.5	38.1	--
SoCo	100	42.3	62.5	46.5	37.6	59.1	40.5
SoCo	400	43.0	63.3	47.1	38.2	60.2	41.0
SoCo*	400	43.2	63.5	47.4	38.4	60.2	41.4
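As a quick reading aid, the gains over the supervised baseline can be computed directly from the table above. The snippet below only copies the AP^bb and AP^mk columns; the dictionary keys are labels made up for this example.

```python
# (AP^bb, AP^mk) values copied from the R50-FPN table above.
results = {
    "Supervised": (38.9, 35.4),
    "SoCo 100ep": (42.3, 37.6),
    "SoCo 400ep": (43.0, 38.2),
    "SoCo* 400ep": (43.2, 38.4),
}


def gain_over(baseline: str, method: str) -> tuple:
    """Per-metric difference between a method and a baseline, rounded to 0.1 AP."""
    return tuple(round(m - b, 1) for m, b in zip(results[method], results[baseline]))


gains = {name: gain_over("Supervised", name) for name in results if name != "Supervised"}
for name, (bb, mk) in gains.items():
    print(f"{name}: +{bb} AP^bb, +{mk} AP^mk")
```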

Results on COCO with MaskRCNN R50-C4

Methods	Epoch	AP^bb	AP^bb₅₀	AP^bb₇₅	AP^mk	AP^mk₅₀	AP^mk₇₅	Detectron2 trained
Scratch	-	26.4	44.0	27.8	29.3	46.9	30.8	--
Supervised	90	38.2	58.2	41.2	33.3	54.7	35.2	--
SoCo	100	40.4	60.4	43.7	34.9	56.8	37.0
SoCo	400	40.9	60.9	44.3	35.3	57.5	37.3

Get started

Requirements

A Dockerfile is included; refer to it for the required environment.

Prepare data with Selective Search

Generate Selective Search proposals

python selective_search/generate_imagenet_ss_proposals.py
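For readers unfamiliar with Selective Search, here is a minimal sketch of generating proposals for a single image. It assumes OpenCV's contrib module (the `opencv-contrib-python` package) is installed; the repository's script may use a different implementation or different settings, and the names `selective_search_boxes` and `max_boxes` are hypothetical.

```python
from typing import List, Tuple


def xywh_to_xyxy(rect: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    """Convert an (x, y, w, h) rectangle to (x1, y1, x2, y2) corners."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)


def selective_search_boxes(image_path: str, max_boxes: int = 100) -> List[Tuple[int, int, int, int]]:
    """Return up to `max_boxes` Selective Search proposals for one image."""
    # Imported here so the helper above works without OpenCV installed.
    import cv2  # assumption: opencv-contrib-python provides cv2.ximgproc

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()  # faster, lower-recall mode
    rects = ss.process()  # rows of (x, y, w, h)
    return [xywh_to_xyxy(tuple(int(v) for v in r)) for r in rects[:max_boxes]]
```

Calling `selective_search_boxes("some_image.jpg")` on an image of your choice returns corner-format boxes that can then be filtered as described in the next step.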

Filter out invalid proposals using the filter strategy

python selective_search/filter_ss_proposals_json.py
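The README does not spell out the filter strategy. A minimal sketch of one plausible form follows: keep a proposal only if its aspect ratio and its size relative to the image fall within fixed bounds. The function name and all four threshold defaults are illustrative assumptions, not values taken from this README or guaranteed to match the script.

```python
from math import sqrt
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def filter_proposals(
    boxes: List[Box],
    img_w: float,
    img_h: float,
    min_aspect: float = 1 / 3,  # assumed lower bound on width / height
    max_aspect: float = 3.0,    # assumed upper bound on width / height
    min_scale: float = 0.3,     # assumed lower bound on relative size
    max_scale: float = 0.8,     # assumed upper bound on relative size
) -> List[Box]:
    """Keep boxes whose aspect ratio and relative scale are within bounds."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue  # degenerate box
        aspect = w / h
        scale = sqrt(w * h) / sqrt(img_w * img_h)
        if min_aspect <= aspect <= max_aspect and min_scale <= scale <= max_scale:
            kept.append((x1, y1, x2, y2))
    return kept


candidates = [(0, 0, 50, 50), (0, 0, 10, 10), (0, 0, 90, 20), (0, 0, 100, 100)]
print(filter_proposals(candidates, img_w=100, img_h=100))
```

In this toy example only the first box survives: the second is too small, the third too elongated, and the fourth covers the whole image.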

Post-process images that have no proposals

python selective_search/filter_ss_proposals_json_post_no_prop.py
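After filtering, some images may be left with no proposals at all. What the script above does in that case is not documented here; the sketch below shows one hypothetical fallback, which assigns a single full-image box to such images. The function name, the dictionary layout, and the fallback choice are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def fill_missing_proposals(
    proposals: Dict[str, List[Box]],
    sizes: Dict[str, Tuple[int, int]],
) -> Dict[str, List[Box]]:
    """Give every image at least one box.

    `proposals` maps an image id to its filtered boxes and `sizes` maps an
    image id to (width, height). Images with an empty list receive one box
    covering the whole image (an assumed fallback, not the SoCo behavior).
    """
    filled = {}
    for image_id, boxes in proposals.items():
        if boxes:
            filled[image_id] = list(boxes)
        else:
            w, h = sizes[image_id]
            filled[image_id] = [(0, 0, w, h)]
    return filled


example = fill_missing_proposals(
    {"img_a": [(10, 10, 60, 60)], "img_b": []},
    {"img_a": (100, 100), "img_b": (64, 48)},
)
print(example)
```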

Pretrain with SoCo

The SoCo FPN 100-epoch configuration is used as an example.

bash ./tools/SoCo_FPN_100ep.sh

Finetune detector

Copy the folder detectron2_configs to the root folder of Detectron2
Train the detectors with Detectron2
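The copy step can be scripted. The following is a minimal sketch using only the Python standard library; the function name and the two root paths passed to it are hypothetical and depend on where SoCo and Detectron2 are checked out on your machine.

```python
import shutil
from pathlib import Path


def copy_detectron2_configs(soco_root: str, detectron2_root: str) -> Path:
    """Copy SoCo's detectron2_configs folder into the Detectron2 root."""
    src = Path(soco_root) / "detectron2_configs"
    dst = Path(detectron2_root) / "detectron2_configs"
    if not src.is_dir():
        raise FileNotFoundError(f"missing config folder: {src}")
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

For example, `copy_detectron2_configs("path/to/SoCo", "path/to/detectron2")` returns the destination folder, after which the detectors can be trained with Detectron2 using the copied configs.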

Citation

@article{wei2021aligning,
  title={Aligning Pretraining for Detection via Object-Level Contrastive Learning},
  author={Wei, Fangyun and Gao, Yue and Wu, Zhirong and Hu, Han and Lin, Stephen},
  journal={arXiv preprint arXiv:2106.02637},
  year={2021}
}
