Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Overview

Propose-Reduce VIS

This repo contains the official implementation for the paper:

Video Instance Segmentation with a Propose-Reduce Paradigm

Huaijia Lin*, Ruizheng Wu*, Shu Liu, Jiangbo Lu, Jiaya Jia

ICCV 2021 | Paper

Installation

Please refer to INSTALL.md.

Demo

You can compute the VIS results for your own videos.

  1. Download the pretrained weights.
  2. Put your example videos in 'demo/inputs'. Two input types are supported: directories of frames and .mp4 files (see the example for details).
  3. Run the following script; the results are written to 'demo/outputs'.

sh demo.sh
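
Before running the demo, it can help to confirm that everything under 'demo/inputs' is one of the two supported input types. The sketch below only inspects the folder; the helper name `list_demo_inputs` is hypothetical and not part of this repo, and how demo.sh itself discovers its inputs is not described here.

```python
from pathlib import Path


def list_demo_inputs(input_dir="demo/inputs"):
    """Split the entries of input_dir into frame directories and .mp4 files.

    Hypothetical helper for illustration; anything that is neither a
    directory nor an .mp4 file is ignored.
    """
    frame_dirs, videos = [], []
    for entry in sorted(Path(input_dir).iterdir()):
        if entry.is_dir():
            # Assumed to be a directory of extracted frames.
            frame_dirs.append(entry.name)
        elif entry.suffix.lower() == ".mp4":
            videos.append(entry.name)
    return frame_dirs, videos
```

Calling `list_demo_inputs()` from the repo root returns two lists of names, one for frame directories and one for .mp4 files.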

Data Preparation

(1) Download the videos and JSON files of the validation set from YouTube-VIS 2019.

(2) Download the videos and JSON files of the validation set from YouTube-VIS 2021.

(3) Symlink the corresponding datasets and JSON files into the data folder:

mkdir data
data
├── valset_ytv19 --> /path/to/ytv2019/vos/valid/JPEGImages/ 
├── valid_ytv19.json --> /path/to/ytv2019/vis/valid.json
├── valset_ytv21 --> /path/to/ytv2021/vis/valid/JPEGImages/ 
├── valid_ytv21.json --> /path/to/ytv2021/vis/valid/instances.json
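
The layout above can also be created programmatically. The following is a minimal sketch, assuming a POSIX-style filesystem; the helper name `link_datasets` is hypothetical, and the `/path/to/...` targets are the placeholders from the listing above, to be replaced with your actual download locations.

```python
from pathlib import Path


def link_datasets(mapping, data_dir="data"):
    """Create data_dir and symlink each name in mapping to its target.

    Hypothetical helper for illustration. Existing symlinks are replaced;
    a real file or directory with the same name raises an error instead
    of being overwritten.
    """
    root = Path(data_dir)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name, target in mapping.items():
        link = root / name
        if link.is_symlink():
            link.unlink()  # refresh a link left over from an earlier run
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        link.symlink_to(Path(target).expanduser().resolve())
        created.append(link)
    return created


# Example call (placeholder paths, replace before running):
# link_datasets({
#     "valset_ytv19": "/path/to/ytv2019/vos/valid/JPEGImages/",
#     "valid_ytv19.json": "/path/to/ytv2019/vis/valid.json",
#     "valset_ytv21": "/path/to/ytv2021/vis/valid/JPEGImages/",
#     "valid_ytv21.json": "/path/to/ytv2021/vis/valid/instances.json",
# })
```

Refusing to overwrite non-symlink entries is a deliberate choice here, so that a real dataset copied into the data folder is never deleted by accident.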

Results

We provide the results of several pretrained models, with the corresponding scripts, for different backbones. The results differ slightly from those in the paper because we made minor modifications to the inference code.

Download the pretrained models and put them in the pretrained folder.

mkdir pretrained

| Dataset | Method | Backbone | CA Reduce | AP | AR@10 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-50 | | 40.8 | 49.9 | model / scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-50 | ✓ | 42.5 | 56.8 | scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-101 | | 43.8 | 52.7 | model / scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-101 | ✓ | 45.2 | 59.0 | scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNeXt-101 | | 47.6 | 56.7 | model / scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNeXt-101 | ✓ | 48.8 | 62.2 | scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNet-50 | | 39.6 | 47.5 | model / scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNet-50 | ✓ | 41.7 | 54.9 | scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNeXt-101 | | 45.6 | 52.9 | model / scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNeXt-101 | ✓ | 47.2 | 57.6 | scripts |

Evaluation

YouTube-VIS 2019: A JSON file will be saved in the '../Results_ytv19' folder. Please zip it and upload it to the CodaLab server.

YouTube-VIS 2021: A JSON file will be saved in the '../Results_ytv21' folder. Please zip it and upload it to the CodaLab server.
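
The zipping step can be scripted. This is a minimal sketch using only the Python standard library; the helper name `zip_results` and the default archive name are assumptions, and the file name the evaluation server expects inside the archive is not stated here, so check the challenge page before uploading.

```python
import zipfile
from pathlib import Path


def zip_results(results_dir, zip_path="results.zip"):
    """Zip every .json file in results_dir and return the archive path.

    Hypothetical helper for illustration. Files are stored at the archive
    root under their original names, without the folder prefix.
    """
    json_files = sorted(Path(results_dir).glob("*.json"))
    if not json_files:
        raise FileNotFoundError(f"no .json files found in {results_dir}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in json_files:
            zf.write(path, arcname=path.name)
    return Path(zip_path)
```

For example, `zip_results("../Results_ytv19", "ytv19_submission.zip")` would package the YouTube-VIS 2019 output, and the same call with '../Results_ytv21' handles the 2021 output.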

TODOs

Citation

If you find this work useful in your research, please cite:

@inproceedings{lin2021video,
  title={Video Instance Segmentation with a Propose-Reduce Paradigm},
  author={Lin, Huaijia and Wu, Ruizheng and Liu, Shu and Lu, Jiangbo and Jia, Jiaya},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Contact

If you have any questions regarding the repo, please feel free to contact me ([email protected]) or create an issue.

Acknowledgments

This repo is based on MMDetection, MaskTrackRCNN, STM, MMCV and COCOAPI.

Owner: DV Lab (Deep Vision Lab)