[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code. Semi-supervised video object segmentation evaluation.


MiVOS (CVPR 2021) - Mask Propagation

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

[arXiv] [Paper PDF] [Project Page] [Papers with Code]

[Images: Parkour, Bike]

This repo implements an improved version of the Space-Time Memory Network (STM) and is part of the accompanying code of Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (MiVOS). It can be used as:

  1. A tool for propagating masks across video frames. Results
  2. An integral component for reproducing and/or improving the performance in MiVOS.
  3. A tool that can compute dense correspondences between two frames. Tutorial

Overall structure and capabilities

MiVOS Mask-Propagation Scribble-to-Mask
DAVIS/YouTube semi-supervised evaluation ✔️
DAVIS interactive evaluation ✔️
User interaction GUI tool ✔️
Dense Correspondences ✔️
Train propagation module ✔️
Train S2M (interaction) module ✔️
Train fusion module ✔️
Generate more synthetic data ✔️

Framework

[Figure: framework overview]

Requirements

We used these packages/versions in the development of this project. It is likely that higher versions of the same package will also work. This is not an exhaustive list -- other common python packages (e.g. pillow) are expected and not listed.

  • PyTorch 1.7.1
  • torchvision 0.8.2
  • OpenCV 4.2.0
  • progressbar
  • thinspline for training (pip install git+https://github.com/cheind/py-thin-plate-spline)
  • gitpython for training
  • gdown for downloading pretrained models

Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest (except thinspline) can be installed with:

pip install progressbar2 opencv-python gitpython gdown

Main Results

Semi-supervised VOS

FPS is amortized, computed as total number of frames / total processing time irrespective of the number of objects, aka multi-object FPS. All times are measured on an RTX 2080 Ti with IO time excluded. Pre-computed results and evaluation outputs (either from local evaluation or CodaLab output log) are also provided. All evaluations are done in 480p resolution.

(Note: This implementation is not optimal in speed. There are ways to speed it up but we wanted to keep it in its simplest PyTorch form.)
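
As a rough illustration of the amortized (multi-object) FPS used in the tables below, here is a minimal sketch. It is not the timing code behind the reported numbers; the function name `amortized_fps` and the example values are hypothetical.

```python
def amortized_fps(total_seconds, frames_per_video):
    """Multi-object FPS: total frames divided by total processing time.

    The number of objects in each video is deliberately ignored, so a
    video with five objects counts the same as a video with one.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    total_frames = sum(frames_per_video)
    return total_frames / total_seconds


# Hypothetical numbers: three videos processed in 20 seconds overall
# (IO time excluded from the measurement).
fps = amortized_fps(20.0, [50, 70, 80])
print(f"{fps:.1f} FPS")
```

Because the object count never enters the formula, videos with many objects pull the figure down, which is one reason the DAVIS 2017 numbers are lower than the DAVIS 2016 ones.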

Find all the precomputed results here.

DAVIS 2016 val:

Produced using eval_davis_2016.py

| Model | Top-k? | J | F | J&F | FPS | Pre-computed results |
| Without BL pretraining | | 87.0 | 89.0 | 88.0 | 15.5 | D16_s02_notop |
| Without BL pretraining | ✔️ | 89.7 | 92.1 | 90.9 | 16.9 | D16_s02 |
| With BL pretraining | | 87.8 | 90.0 | 88.9 | 15.5 | D16_s012_notop |
| With BL pretraining | ✔️ | 89.7 | 92.4 | 91.0 | 16.9 | D16_s012 |

DAVIS 2017 val:

Produced using eval_davis.py

| Model | Top-k? | J | F | J&F | FPS | Pre-computed results |
| Without BL pretraining | | 78.8 | 84.2 | 81.5 | 9.75 | D17_s02_notop |
| Without BL pretraining | ✔️ | 80.5 | 85.8 | 83.1 | 11.2 | D17_s02 |
| With BL pretraining | | 81.1 | 86.5 | 83.8 | 9.75 | D17_s012_notop |
| With BL pretraining | ✔️ | 81.7 | 87.4 | 84.5 | 11.2 | D17_s012 |

For YouTubeVOS val and DAVIS test-dev we also tried the kernelized memory (called KM in our code) technique described in Kernelized Memory Network for Video Object Segmentation. It works nicely with our top-k filtering.
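
To give a feel for what top-k filtering does, here is a minimal pure-Python sketch. It assumes that top-k filtering means keeping, for each query position, only the k largest affinity scores over the memory positions before the softmax. The memory-by-query layout and the name `topk_softmax` are assumptions for illustration; see model/network.py for the actual code.

```python
import math


def topk_softmax(affinity, k):
    """Softmax over memory positions, restricted to the k largest scores.

    affinity[i][j] is the raw similarity between memory position i and
    query position j (assumed layout). For every query position (column)
    only the k highest-scoring memory positions keep a non-zero weight.
    """
    num_memory = len(affinity)
    num_query = len(affinity[0])
    weights = [[0.0] * num_query for _ in range(num_memory)]
    for j in range(num_query):
        column = [affinity[i][j] for i in range(num_memory)]
        # Indices of the k largest scores in this column.
        keep = sorted(range(num_memory), key=lambda i: column[i], reverse=True)[:k]
        peak = max(column[i] for i in keep)  # subtracted for numerical stability
        exps = {i: math.exp(column[i] - peak) for i in keep}
        total = sum(exps.values())
        for i, value in exps.items():
            weights[i][j] = value / total
    return weights


# Toy example: 4 memory positions, 2 query positions, keep the top 2.
scores = [
    [2.0, 0.1],
    [1.0, 0.3],
    [0.5, 3.0],
    [-1.0, 2.0],
]
W = topk_softmax(scores, k=2)
for row in W:
    print([round(w, 3) for w in row])
```

Each column still sums to one, but low-scoring memory positions contribute nothing to the readout.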

YouTubeVOS val:

Produced using eval_youtube.py

| Model | Kernel Memory (KM)? | J-Seen | J-Unseen | F-Seen | F-Unseen | Overall Score | Pre-computed results |
| Full model with top-k | | 80.6 | 77.3 | 84.7 | 85.5 | 82.0 | YV_val_s012 |
| Full model with top-k | ✔️ | 81.6 | 77.7 | 85.8 | 85.9 | 82.8 | YV_val_s012_km |

DAVIS 2017 test-dev:

Produced using eval_davis.py

| Model | Kernel Memory (KM)? | J | F | J&F | Pre-computed results |
| Full model with top-k | | 72.7 | 80.2 | 76.5 | D17_testdev_s012 |
| Full model with top-k | ✔️ | 74.9 | 82.2 | 78.6 | D17_testdev_s012_km |

Running them yourselves

You can look at the corresponding scripts (eval_davis.py, eval_youtube.py, etc.). The arguments tooltip should give you a rough idea of how to use them. For example, if you have downloaded the datasets and pretrained models using our scripts, you only need to specify the output path: python eval_davis.py --output [somewhere] for DAVIS 2017 validation set evaluation.

Correspondences

The W matrix can be considered as a dense correspondence (affinity) matrix. This is in fact how we used it in the fusion module. See try_correspondence.py for details. We have included a small GUI there to show the correspondences (a point source is used, but a mask/tensor can be used in general).

Try it yourself: python try_correspondence.py.
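
For intuition only, the sketch below shows how an affinity matrix can carry a point source from one frame to another. It is a toy example in pure Python, not the code in try_correspondence.py; the source-by-target layout of W and the helper name `transfer` are assumptions.

```python
def transfer(source, W):
    """Map one value per source position to the target frame via W.

    W[i][j] is the affinity between source position i and target
    position j (an assumed layout). The response at target position j is
    the affinity-weighted sum of the source values.
    """
    num_target = len(W[0])
    return [
        sum(source[i] * W[i][j] for i in range(len(source)))
        for j in range(num_target)
    ]


# Toy affinity between 3 source positions and 4 target positions.
W = [
    [0.7, 0.1, 0.1, 0.0],
    [0.2, 0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8, 0.9],
]

# Point source: all the mass sits on source position 1.
point = [0.0, 1.0, 0.0]
response = transfer(point, W)
best = max(range(len(response)), key=lambda j: response[j])
print(response, "-> strongest match at target position", best)
```

Replacing the one-hot `point` with a soft mask gives the general mask/tensor case mentioned above.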

[Figure: three source/target image pairs showing the correspondences (Source 1-3, Target 1-3)]

Pretrained models

Here we provide two pretrained models. One is pretrained on static images and transferred to main training (we call it s02: stage 0 -> stage 2); the other is pretrained on both static images and BL30K and then transferred to main training (we call it s012). For the s02 model, we train for 300K (instead of 150K) iterations in the main training stage to offset the extra training that s012 receives. Training for more iterations helps very little, if at all. The script download_model.py automatically downloads the s012 model. Put all pretrained models in Mask-Propagation/saves/.

Model Google Drive OneDrive
s02 link link
s012 link link

Training

Data preparation

I recommend either softlinking (ln -s) existing data or using the provided download_datasets.py to structure the datasets in our format. download_datasets.py might download more than what you need -- just comment out the parts that you don't want. The script does not download BL30K because it is huge (>600GB) and we don't want to fill up your hard disks. See below.

├── BL30K
├── DAVIS
│   ├── 2016
│   │   ├── Annotations
│   │   └── ...
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── Mask-Propagation
├── static
│   ├── BIG_small
│   └── ...
└── YouTube
    ├── all_frames
    │   └── valid_all_frames
    ├── train
    ├── train_480p
    └── valid
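
A quick way to check that your data matches the layout above is a small script like the following. This is a sketch, not part of the repo: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and only the directories spelled out in the tree are checked (the "..." entries are not).

```python
from pathlib import Path

# Sub-directories taken from the layout above, relative to the data root.
EXPECTED_DIRS = [
    "BL30K",
    "DAVIS/2016/Annotations",
    "DAVIS/2017/test-dev/Annotations",
    "DAVIS/2017/trainval/Annotations",
    "Mask-Propagation",
    "static/BIG_small",
    "YouTube/all_frames/valid_all_frames",
    "YouTube/train",
    "YouTube/train_480p",
    "YouTube/valid",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from inside Mask-Propagation, so the data root is the parent folder.
    for rel in missing_dirs(".."):
        print("missing:", rel)
```

BL30K will be reported as missing until you download it separately, which is expected if you skip stage 1.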

BL30K

BL30K is a synthetic dataset rendered using ShapeNet data and Blender. For details, see MiVOS.

You can either use the automatic script download_bl30k.py or download it manually below. Note that each segment is about 115GB in size -- 700GB in total. You are going to need ~1TB of free disk space to run the script (including extraction buffer).
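
Before starting the download, it may be worth checking the free space on the target drive. The snippet below is a minimal sketch using only the standard library; the 1000GB threshold mirrors the ~1TB estimate above, and the helper names are hypothetical.

```python
import shutil

REQUIRED_FREE_GB = 1000  # ~1TB including the extraction buffer (estimate from the text)


def free_gb(path="."):
    """Free space on the filesystem containing path, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def enough_space(path=".", required_gb=REQUIRED_FREE_GB):
    """True if the filesystem containing path has at least required_gb free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"free: {free_gb():.0f} GB, enough for BL30K: {enough_space()}")
```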

Google Drive is much faster in my experience. Your mileage might vary.

Manual download: [Google Drive] [OneDrive]

Training commands

CUDA_VISIBLE_DEVICES=[a,b] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=2 train.py --id [defg] --stage [h]

We implemented training with Distributed Data Parallel (DDP) with two 11GB GPUs. Replace a, b with the GPU ids, cccc with an unused port number, defg with a unique experiment identifier, and h with the training stage (0/1/2).

The model is trained progressively with different stages (0: static images; 1: BL30K; 2: YouTubeVOS+DAVIS). After each stage finishes, we start the next stage by loading the trained weight.

One concrete example is:

Pre-training on static images:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id retrain_s0 --stage 0

Pre-training on the BL30K dataset:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id retrain_s01 --load_network [path_to_trained_s0.pth] --stage 1

Main training:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id retrain_s012 --load_network [path_to_trained_s01.pth] --stage 2
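
If you prefer to generate the commands for the three stages rather than typing them, something along these lines should work. This is a sketch: `launch_command` is a hypothetical helper that only assembles the same strings as the example above, and the checkpoint paths are left as placeholders.

```python
def launch_command(stage, exp_id, gpus="0,1", port=9842, load_network=None):
    """Build the DDP launch command for one training stage as a string."""
    nproc = len(gpus.split(","))  # one process per visible GPU
    parts = [
        f"CUDA_VISIBLE_DEVICES={gpus}",
        "OMP_NUM_THREADS=4",
        "python -m torch.distributed.launch",
        f"--master_port {port}",
        f"--nproc_per_node={nproc}",
        "train.py",
        f"--id {exp_id}",
    ]
    if load_network is not None:
        # Later stages start from the weights of the previous stage.
        parts.append(f"--load_network {load_network}")
    parts.append(f"--stage {stage}")
    return " ".join(parts)


# The three stages from the example above; checkpoint paths stay placeholders.
schedule = [
    (0, "retrain_s0", None),
    (1, "retrain_s01", "[path_to_trained_s0.pth]"),
    (2, "retrain_s012", "[path_to_trained_s01.pth]"),
]
for stage, exp_id, weights in schedule:
    print(launch_command(stage, exp_id, load_network=weights))
```

Use a different port for each job if you run more than one training at the same time on the same machine.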

Details

Files to look at

  • model/network.py - Defines the core network.
  • model/model.py - Training procedure.
  • util/hyper_para.py - Hyperparameters that you can provide by specifying command line arguments.

What are the differences?

While I did start building this from STM's official evaluation code, the official training code is not available and therefore a lot of details are missing. My own judgments are used in the engineering of this work.

  • We both use the ResNet-50 backbone up to layer3 but there are a few minor architecture differences elsewhere (e.g. decoder, mask generation in the last layer)
  • This repo does not use the COCO dataset and uses some other static image datasets instead.
  • This repo picks two, instead of three objects for each training sample.
  • Top-k filtering (proposed by us) is included here
  • Our raw performance (without BL30K or top-k) is slightly worse than the original STM model but I believe we train with fewer resources.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

Contact: [email protected]

Comments
  • About BL30K

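    Hello, I downloaded all six BL30K archives and extracted all of them, but the second pre-training stage fails with an error saying that data/dangjisheng/BL30K/a/BL30K/Annotations/kea03423/00020.png cannot be found, and I don't know why. I downloaded all six archives and extracted them into the same directory, so why is a file reported as missing? I look forward to your reply.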

    opened by longmalongma 31
  • The server remained unresponsive for a long time when I try to train your model.

    When I ran this command on our server, the server did not respond for a long time. Do you know why?

    CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=1 train.py --id retrain_s0 --stage 0 --batch_size 4

    opened by longmalongma 20
  • subprocess.CalledProcessError

    Hi, thanks for your great work! When I try to run CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id retrain_s0 --stage 0, I run into this problem. Can you help me?

    File "/home/longma/anaconda2/envs/p3torchstm/lib/python3.6/site-packages/torch/distributed/launch.py", line 242, in main cmd=cmd) subprocess.CalledProcessError: Command '['/home/longma/anaconda2/envs/p3torchstm/bin/python', '-u', 'train.py', '--local_rank=1', '--id', 'retrain_s0', '--stage', '0']' returned non-zero exit status 1.

    opened by longmalongma 19
  • How to save the feature maps of many memory frames?

    There is a part of your code that I don't understand. Should each memory frame be stored separately, or should the key-value feature map and the content feature map of the memory frame be joined together and saved? Which line saves the memory frame?

    opened by longmalongma 10
  • Pre-training on the BL30K dataset after pre-training on static images

    As I see it, in the pre-training on static images stage, "single_object" in PropagationNetwork is True, so MaskRGBEncoderSO is used. When I try to load the weights from that stage for pre-training on the BL30K dataset or for main training, "single_object" is now False and the model uses MaskRGBEncoder instead. As a result, the weights cannot be loaded. Here is the error:

    Traceback (most recent call last):
      File "train.py", line 68, in <module>
        total_iter = model.load_model(para['load_model'])
      File "/content/Mask-Propagation/model/model.py", line 180, in load_model
        self.PNet.module.load_state_dict(network)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for PropagationNetwork:
        size mismatch for mask_rgb_encoder.conv1.weight: copying a param with shape torch.Size([64, 4, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 5, 7, 7]).

    Can you explain how we can fix it? Thank you so much.

    opened by nero1342 9
  • RuntimeError: Error(s) in loading state_dict for PropagationNetwork

    Hello! I want to train the PropagationNetwork on my own image dataset, so I used the training command CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id retrain_s01 --load_network ./saves/propagation_model.pth --stage 0 (based on the pretrained s012 model). It threw a runtime error.

    The training command works fine without the --load_network parameter. Could you give me some suggestions?

    opened by xwhkkk 5
  • metrics results of test dataset

    After running eval_davis_2016.py, I only get the mask files in the output folder. How can I get the values of metrics such as J and J&F? And how can we test the model on personal datasets to get those metrics after using interactive_gui.py? Thanks for your suggestions.

    opened by xwhkkk 4
  • J&F performance on BL30K

    Hi, I am doing BL30K training for DAVIS 2017 val (including stage 0 and stage 1). What J&F should I expect on DAVIS 2017 val after finishing BL30K training? With that number I could check whether my training is correct. I don't think it is included in the README.

    opened by vateye 4
  • How to run two copies of your code at the same time?

    I have made two copies of your code and made small changes to each copy. While one is being trained, the other cannot be trained. If both copies are to be trained at the same time, which parameters need to be changed? One of my computers has four 2080 Ti GPUs, so there is enough memory.

    opened by longmalongma 4
  • Why don't you use top_k and km during the training phase?

    Looking at your code, I was a little confused about why you didn't use top_k and km during the training phase. They are used in the evaluation phase, right? Is it bad to use top_k and km in training?

    opened by longmalongma 4
  • RuntimeError: CUDA error: out of memory

    How many GPUs do you need to test on DAVIS and YouTube? I keep getting out-of-memory errors during my tests. I directly used the model trained on static images for VOS training, skipping the BL30K pre-training. Is that OK?

    opened by longmalongma 4