Code for CMaskTrack R-CNN (proposed in Occluded Video Instance Segmentation)

Last update: Nov 25, 2022

Related tags

Overview

CMaskTrack R-CNN for OVIS

This repo serves as the official code release of the CMaskTrack R-CNN model on the Occluded Video Instance Segmentation dataset described in the tech report:

Occluded Video Instance Segmentation

Jiyang Qi^1,2*, Yan Gao²*, Yao Hu², Xinggang Wang¹, Xiaoyu Liu²,
Xiang Bai¹, Serge Belongie³, Alan Yuille⁴, Philip Torr⁵, Song Bai^2,5

📧
¹Huazhong University of Science and Technology ²Alibaba Group ³University of Copenhagen
⁴Johns Hopkins University ⁵University of Oxford

In this work, we collect a large-scale dataset called OVIS for Occluded Video Instance Segmentation. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

We also present a simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion.

Some annotation examples can be seen below:

For more details about the dataset, please refer to our paper or website.

Model training and evaluation

Installation

This repo is built based on MaskTrackRCNN. A customized COCO API for the OVIS dataset is also provided.

You can use following commands to create conda env with all dependencies.

conda create -n cmtrcnn python=3.6 -y
conda activate cmtrcnn

conda install -c pytorch pytorch=1.3.1 torchvision=0.2.2 cudatoolkit=10.1 -y
pip install -r requirements.txt
pip install git+https://github.com/qjy981010/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

bash compile.sh

Data preparation

Download OVIS from our website.
Symlink the train/validation dataset to data/OVIS/ folder. Put COCO-style annotations under data/annotations.

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── OVIS
│   │   ├── train_images
│   │   ├── valid_images
│   │   ├── annotations
│   │   │   ├── annotations_train.json
│   │   │   ├── annotations_valid.json

Training

Our model is based on MaskRCNN-resnet50-FPN. The model is trained end-to-end on OVIS based on a MSCOCO pretrained checkpoint (mmlab link or google drive).

Run the command below to train the model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py --work_dir ./workdir/cmasktrack_rcnn_r50_fpn_1x_ovis --gpus 4

For reference to arguments such as learning rate and model parameters, please refer to configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py.

Evaluation

Our pretrained model is available for download at Google Drive (comming soon). Run the following command to evaluate the model on OVIS.

CUDA_VISIBLE_DEVICES=0 python test_video.py configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py [MODEL_PATH] --out [OUTPUT_PATH.pkl] --eval segm

A json file containing the predicted result will be generated as OUTPUT_PATH.pkl.json. OVIS currently only allows evaluation on the codalab server. Please upload the generated result to codalab server to see actual performances.

License

This project is released under the Apache 2.0 license, while the correlation ops is under MIT license.

Acknowledgement

This project is based on mmdetection (commit hash f3a939f), mmcv, MaskTrackRCNN and Pytorch-Correlation-extension. Thanks for their wonderful works.

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :

@article{qi2021occluded,
    title={Occluded Video Instance Segmentation},
    author={Jiyang Qi and Yan Gao and Yao Hu and Xinggang Wang and Xiaoyu Liu and Xiang Bai and Serge Belongie and Alan Yuille and Philip Torr and Song Bai},
    journal={arXiv preprint arXiv:2102.01558},
    year={2021},
}

Code for CMaskTrack R-CNN (proposed in Occluded Video Instance Segmentation)

Related tags

Overview

CMaskTrack R-CNN for OVIS

Occluded Video Instance Segmentation

Model training and evaluation

Installation

Data preparation

Training

Evaluation

License

Acknowledgement

Citation

Owner

Q . J . Y

Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

A pytorch reprelication of the model-based reinforcement learning algorithm MBPO

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Fine-tuning StyleGAN2 for Cartoon Face Generation

Code for "On Memorization in Probabilistic Deep Generative Models"

A full-fledged version of Pix2Seq

PyTorch implementation of Deep HDR Imaging via A Non-Local Network (TIP 2020).

CryptoFrog - My First Strategy for freqtrade

Train Yolov4 using NBX-Jobs

Reproduction of Vision Transformer in Tensorflow2. Train from scratch and Finetune.

The Python3 import playground

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

[NeurIPS 2021] Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples | ⛰️⚠️

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Self-Learning - Books Papers, Courses & more I have to learn soon

Tensorflow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)

Siamese TabNet

Trainable PyTorch reproduction of AlphaFold 2