Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation (ACM MM 2020)

Last update: Jan 06, 2023

Official implementation of:

Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation
Jialian Wu, Liangchen Song, Tiancai Wang, Qian Zhang and Junsong Yuan
In ACM International Conference on Multimedia, Seattle WA, October 12-16, 2020.

Many thanks to mmdetection authors for their great framework!

News

Mar 2, 2021 Update: We tested Forest R-CNN on the LVIS v1.0 val set. Thanks for considering a comparison with our method :)

Jan 1, 2021 Update: We propose Forest DetSeg, an extension of the original Forest R-CNN that applies the proposed method to RetinaNet. The new work is currently under review, but the code is already available. More details will follow with the new paper.

Installation

Please refer to INSTALL.md for installation and dataset preparation.

Forest R-CNN

Inference

# Examples
# single-gpu testing
python tools/test.py configs/lvis/forest_rcnn_r50_fpn.py forest_rcnn_res50.pth --out out.pkl --eval bbox segm

# multi-gpu testing
./tools/dist_test.sh configs/lvis/forest_rcnn_r50_fpn.py forest_rcnn_res50.pth ${GPU_NUM} --out out.pkl --eval bbox segm
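The commands above write the raw predictions to out.pkl. The following is a minimal sketch for inspecting that file afterwards. It assumes the file is an ordinary pickle holding one entry per test image, which is how mmdetection releases of that period stored results; the helper names load_results and count_images are hypothetical and not part of this repository.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load a results file written via --out (assumed to be a plain pickle)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def count_images(results):
    """Number of per-image entries, assuming a list with one entry per image."""
    return len(results)


if __name__ == "__main__":
    path = Path("out.pkl")  # same file name as in the commands above
    if path.exists():
        results = load_results(path)
        print(f"{count_images(results)} images in {path}")
```

Unpickling typically needs the same packages (for example NumPy) that were installed when the file was written.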

Training

# Examples
# single-gpu training
python tools/train.py configs/lvis/forest_rcnn_r50_fpn.py --validate

# multi-gpu training
./tools/dist_train.sh configs/lvis/forest_rcnn_r50_fpn.py ${GPU_NUM} --validate

(Note: in our experiments, the best result appeared around the 20th epoch rather than at the end of training.)
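Since the best checkpoint may not be the last one, it can help to compare the per-epoch validation scores produced by --validate. The sketch below picks the epoch with the highest AP from a mapping you fill in yourself. The function best_epoch is a hypothetical helper, and the numbers in the usage example are made-up placeholders, not results from the paper.

```python
def best_epoch(ap_by_epoch):
    """Return (epoch, ap) for the highest AP; ties go to the earliest epoch."""
    if not ap_by_epoch:
        raise ValueError("no evaluation results given")
    # Iterate epochs in ascending order so max() keeps the earliest on ties.
    epoch = max(sorted(ap_by_epoch), key=lambda e: ap_by_epoch[e])
    return epoch, ap_by_epoch[epoch]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    scores = {1: 0.10, 2: 0.30, 3: 0.20}
    epoch, ap = best_epoch(scores)
    print(f"best epoch: {epoch} (AP={ap})")
```

The selected epoch's checkpoint can then be passed to tools/test.py in place of the final one.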

Forest RetinaNet

Inference

# Examples  
# multi-gpu testing
./tools/dist_test.sh configs/lvis/forest_retinanet_r50_fpn_1x.py forest_retinanet_res50.pth ${GPU_NUM} --out out.pkl --eval bbox segm

Training

# Examples    
# multi-gpu training
./tools/dist_train.sh configs/lvis/forest_retinanet_r50_fpn_1x.py ${GPU_NUM} --validate

Main Results

Instance Segmentation on LVIS v0.5 val set

AP and AP.b denote the mask AP and box AP, respectively. The suffixes r, c, and f denote the rare, common, and frequent categories.

Method	Backbone	AP	AP.r	AP.c	AP.f	AP.b	AP.b.r	AP.b.c	AP.b.f	download
MaskRCNN	R50-FPN	21.7	6.8	22.6	26.4	21.8	6.5	21.6	28.0	model
Forest R-CNN	R50-FPN	25.6	18.3	26.4	27.6	25.9	16.9	26.1	29.2	model
MaskRCNN	R101-FPN	23.6	10.0	24.8	27.6	23.5	8.7	23.1	29.8	model
Forest R-CNN	R101-FPN	26.9	20.1	27.9	28.3	27.5	20.0	27.5	30.4	model
MaskRCNN	X-101-32x4d-FPN	24.8	10.0	26.4	28.6	24.8	8.6	25.0	30.9	model
Forest R-CNN	X-101-32x4d-FPN	28.5	21.6	29.7	29.7	28.8	20.6	29.2	31.7	model
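To make the comparison in the table easier to read, the sketch below computes the mask AP and rare-category mask AP (AP.r) gains of Forest R-CNN over MaskRCNN for each backbone. The numbers are copied from the table above; the names RESULTS and gains are introduced here for illustration only.

```python
# Mask AP values from the LVIS v0.5 val table: backbone -> {method: (AP, AP.r)}
RESULTS = {
    "R50-FPN": {"MaskRCNN": (21.7, 6.8), "Forest R-CNN": (25.6, 18.3)},
    "R101-FPN": {"MaskRCNN": (23.6, 10.0), "Forest R-CNN": (26.9, 20.1)},
    "X-101-32x4d-FPN": {"MaskRCNN": (24.8, 10.0), "Forest R-CNN": (28.5, 21.6)},
}


def gains(results):
    """Return backbone -> (AP gain, AP.r gain), rounded to one decimal."""
    out = {}
    for backbone, methods in results.items():
        base_ap, base_apr = methods["MaskRCNN"]
        ours_ap, ours_apr = methods["Forest R-CNN"]
        out[backbone] = (
            round(ours_ap - base_ap, 1),
            round(ours_apr - base_apr, 1),
        )
    return out


if __name__ == "__main__":
    for backbone, (d_ap, d_apr) in gains(RESULTS).items():
        print(f"{backbone}: AP +{d_ap}, AP.r +{d_apr}")
```

For all three backbones, the AP.r gain is several times larger than the overall AP gain.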

Instance Segmentation on LVIS v1.0 val set

Method	Backbone	AP	AP.r	AP.c	AP.f	AP.b
MaskRCNN	R50-FPN	19.2	0.0	17.2	29.5	20.0
Forest R-CNN	R50-FPN	23.2	14.2	22.7	27.7	24.6

Visualized Examples

Citation

If you find it useful in your research, please consider citing our paper as follows:

@inproceedings{wu2020forest,
title={Forest R-CNN: Large-vocabulary long-tailed object detection and instance segmentation},
author={Wu, Jialian and Song, Liangchen and Wang, Tiancai and Zhang, Qian and Yuan, Junsong},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={1570--1578},
year={2020}}
