Code and models for ICCV2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Overview

Robust Object Detection via Instance-Level Temporal Cycle Confusion

This repo contains the implementation of the ICCV 2021 paper, Robust Object Detection via Instance-Level Temporal Cycle Confusion.

Screenshot

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real world applications. In this work, we study the effectiveness of auxiliary self-supervised tasks to improve out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level cycle confusion (CycConf), which operates on the region features of the object detectors. For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision. CycConf encourages the object detector to explore invariant structures across instances under various motion, which leads to improved model robustness in unseen domains at test time. We observe consistent out-of-domain performance improvements when training object detectors in tandem with self-supervised tasks on various domain adaptation benchmarks with static images (Cityscapes, Foggy Cityscapes, Sim10K) and large-scale video datasets (BDD100K and Waymo open data).

Installation

Environment

  • CUDA 10.2
  • Python >= 3.7
  • Pytorch >= 1.6
  • THe Detectron2 version matches Pytorch and CUDA versions.

Dependencies

  1. Create a virtual env.
  • python3 -m pip install --user virtualenv
  • python3 -m venv cyc-conf
  • source cyc-conf/bin/activate
  1. Install dependencies.
  • pip install -r requirements.txt

  • Install Pytorch 1.9

pip3 install torch torchvision

Check out the previous Pytorch versions here.

  • Install Detectron2 Build Detectron2 from Source (gcc & g++ >= 5.4) python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Or, you can install Pre-built detectron2 (example for CUDA 10.2, Pytorch 1.9)

python -m pip install detectron2 -f \ https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html

More details can be found here.

Data Preparation

BDD100K

  1. Download the BDD100K MOT 2020 dataset (MOT 2020 Images and MOT 2020 Labels) and the detection labels (Detection 2020 Labels) here and the detailed description is available here. Put the BDD100K data under datasets/ in this repo. After downloading the data, the folder structure should be like below:
├── datasets
│   ├── bdd100k
│   │   ├── images
│   │   │    └── track
│   │   │        ├── train
│   │   │        ├── val
│   │   │        └── test
│   │   └── labels
│   │        ├── box_track_20
│   │        │   ├── train
│   │        │   └── val
│   │        └── det_20
│   │            ├── det_train.json
│   │            └── det_val.json
│   ├── waymo

Convert the labels of the MOT 2020 data (train & val sets) into COCO format by running:

python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/val/ -o datasets/bdd100k/labels/track/bdd100k_mot_val_coco.json -m track
python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/train/ -o datasets/bdd100k/labels/track/bdd100k_mot_train_coco.json -m track
  1. Split the original videos into different domains (time of day). Run the following command:
python3 -m datasets.domain_splits_bdd100k

This script will first extract the domain attributes from the BDD100K detection set and then map them to the tracking set sequences. After the processing steps, you would see two additional folders domain_splits and per_seq under the datasets/bdd100k/labels/box_track_20. The domain splits of all attributes in BDD100K detection set can be found at datasets/bdd100k/labels/domain_splits.

Waymo

  1. Download the Waymo dataset here. Put the Waymo raw data under datasets/ in this repo. After downloading the data, the folder structure should be like below:
├── datasets
│   ├── bdd100k
│   ├── waymo
│   │   └── raw

Convert the raw TFRecord data files into COCO format by running:

python3 -m datasets.waymo2coco

Note that this script takes a long time to run, be prepared to keep it running for over a day.

  1. Convert the BDD100K dataset labels into 3 classes (originally 8). This needs to be done in order to match the 3 classes of the Waymo dataset. Run the following command:
python3 -m datasets.convert_bdd_3cls

Get Started

For joint training,

python3 -m tools.train_net --config-file [config_file] --num-gpus 8

For evaluation,

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only

This command will load the latest checkpoint in the folder. If you want to specify a different checkpoint or evaluate the pretrained checkpoints, you can run

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only MODEL.WEIGHTS [PATH_TO_CHECKPOINT]

Benchmark Results

Dataset Statistics

Dataset Split Seq frames/seq. boxes classes
BDD100K Daytime train 757 204 1.82M 8
val 108 204 287K 8
BDD100K Night train 564 204 895K 8
val 71 204 137K 8
Waymo Open Data train 798 199 3.64M 3
val 202 199 886K 3

Out of Domain Evaluation

BDD100K Daytime to Night. The base detector is Faster R-CNN with ResNet-50.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 17.84 31.35 17.68 4.92 16.15 35.56 link link
+ Rotation 18.58 32.95 18.15 5.16 16.93 36.00 link link
+ Jigsaw 17.47 31.22 16.81 5.08 15.80 33.84 link link
+ Cycle Consistency 18.35 32.44 18.07 5.04 17.07 34.85 link link
+ Cycle Confusion 19.09 33.58 19.14 5.70 17.68 35.86 link link

BDD100K Night to Daytime.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 19.14 33.04 19.16 5.38 21.42 40.34 link link
+ Rotation 19.07 33.25 18.83 5.53 21.32 40.06 link link
+ Jigsaw 19.22 33.87 18.71 5.67 22.35 38.57 link link
+ Cycle Consistency 18.89 33.50 18.31 5.82 21.01 39.13 link link
+ Cycle Confusion 19.57 34.34 19.26 6.06 22.55 38.95 link link

Waymo Front Left to BDD100K Night.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 10.07 19.62 9.05 2.67 10.81 18.62 link link
+ Rotation 11.34 23.12 9.65 3.53 11.73 21.60 link link
+ Jigsaw 9.86 19.93 8.40 2.77 10.53 18.82 link link
+ Cycle Consistency 11.55 23.44 10.00 2.96 12.19 21.99 link link
+ Cycle Confusion 12.27 26.01 10.24 3.44 12.22 23.56 link link

Waymo Front Right to BDD100K Night.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 8.65 17.26 7.49 1.76 8.29 19.99 link link
+ Rotation 9.25 18.48 8.08 1.85 8.71 21.08 link link
+ Jigsaw 8.34 16.58 7.26 1.61 8.01 18.09 link link
+ Cycle Consistency 9.11 17.92 7.98 1.78 9.36 19.18 link link
+ Cycle Confusion 9.99 20.58 8.30 2.18 10.25 20.54 link link

Citation

If you find this repository useful for your publications, please consider citing our paper.

@article{wang2021robust,
  title={Robust Object Detection via Instance-Level Temporal Cycle Confusion},
  author={Wang, Xin and Huang, Thomas E and Liu, Benlin and Yu, Fisher and Wang, Xiaolong and Gonzalez, Joseph E and Darrell, Trevor},
  journal={International Conference on Computer Vision (ICCV)},
  year={2021}
}
Owner
Xin Wang
Researcher from Microsoft Research. Prev. Ph.D. student at UC Berkeley.
Xin Wang
Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

LUNAR Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks" Adam Goodge, Bryan Hooi, Ng See Kiong and

Adam Goodge 25 Dec 28, 2022
Pytorch ImageNet1k Loader with Bounding Boxes.

ImageNet 1K Bounding Boxes For some experiments, you might wanna pass only the background of imagenet images vs passing only the foreground. Here, I'v

Amin Ghiasi 11 Oct 15, 2022
Voice of Pajlada with model and weights.

Pajlada TTS Stripped down version of ForwardTacotron (https://github.com/as-ideas/ForwardTacotron) with pretrained weights for Pajlada's (https://gith

6 Sep 03, 2021
This repository is based on Ultralytics/yolov5, with adjustments to enable polygon prediction boxes.

Polygon-Yolov5 This repository is based on Ultralytics/yolov5, with adjustments to enable polygon prediction boxes. Section I. Description The codes a

xinzelee 226 Jan 05, 2023
Pytorch Implementation of Various Point Transformers

Pytorch Implementation of Various Point Transformers Recently, various methods applied transformers to point clouds: PCT: Point Cloud Transformer (Men

Neil You 434 Dec 30, 2022
Facebook AI Image Similarity Challenge: Descriptor Track

Facebook AI Image Similarity Challenge: Descriptor Track This repository contains the code for our solution to the Facebook AI Image Similarity Challe

Sergio MP 17 Dec 14, 2022
Bare bones use-case for deploying a containerized web app (built in streamlit) on AWS.

Containerized Streamlit web app This repository is featured in a 3-part series on Deploying web apps with Streamlit, Docker, and AWS. Checkout the blo

Collin Prather 62 Jan 02, 2023
[CVPR2022] Representation Compensation Networks for Continual Semantic Segmentation

RCIL [CVPR2022] Representation Compensation Networks for Continual Semantic Segmentation Chang-Bin Zhang1, Jia-Wen Xiao1, Xialei Liu1, Ying-Cong Chen2

Chang-Bin Zhang 71 Dec 28, 2022
Deep motion generator collections

GenMotion GenMotion (/gen’motion/) is a Python library for making skeletal animations. It enables easy dataset loading and experiment sharing for synt

23 May 24, 2022
Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

MLCR This is the source code for paper Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition. Xuesong Niu, Hu Han, Shiguang

Edson-Niu 60 Nov 29, 2022
LAMDA: Label Matching Deep Domain Adaptation

LAMDA: Label Matching Deep Domain Adaptation This is the implementation of the paper LAMDA: Label Matching Deep Domain Adaptation which has been accep

Tuan Nguyen 9 Sep 06, 2022
Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Jiaxi Jiang 282 Jan 02, 2023
[ICCV'21] NEAT: Neural Attention Fields for End-to-End Autonomous Driving

NEAT: Neural Attention Fields for End-to-End Autonomous Driving Paper | Supplementary | Video | Poster | Blog This repository is for the ICCV 2021 pap

254 Jan 02, 2023
Music Generation using Neural Networks Streamlit App

Music_Gen_Streamlit "Music Generation using Neural Networks" Streamlit App TO DO: Make a run_app.sh Introduction [~5 min] (Sohaib) Team Member names/i

Muhammad Sohaib Arshid 6 Aug 09, 2022
Code & Data for the Paper "Time Masking for Temporal Language Models", WSDM 2022

Time Masking for Temporal Language Models This repository provides a reference implementation of the paper: Time Masking for Temporal Language Models

Guy Rosin 12 Jan 06, 2023
Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv MMCoref_cleaned Code for the MMCoref task of the SIMMC 2.0 dataset. Pre

Yichen (William) Huang 2 Dec 05, 2022
A small library for doing fluid simulation with neural networks.

Neural Fluid Fields This is a small library for doing fluid simulation with neural fields. Check out our review paper, Neural Fields in Visual Computi

Towaki 23 Jun 23, 2022
PyTorch implementation of "Simple and Deep Graph Convolutional Networks"

Simple and Deep Graph Convolutional Networks This repository contains a PyTorch implementation of "Simple and Deep Graph Convolutional Networks".(http

chenm 253 Dec 08, 2022
Source code for the BMVC-2021 paper "SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation".

SimReg: A Simple Regression Based Framework for Self-supervised Knowledge Distillation Source code for the paper "SimReg: Regression as a Simple Yet E

9 Oct 15, 2022
Meta-learning for NLP

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks Code for training the meta-learning models and fine-tuning on downstr

IESL 43 Nov 08, 2022