[CVPR 2021] MiVOS - Scribble to Mask module

Last update: Dec 22, 2022

Overview

MiVOS (CVPR 2021) - Scribble To Mask

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

A simplistic network that turns scribbles to mask. It supports multi-object segmentation using soft-aggregation. Don't expect SOTA results from this model!

Overall structure and capabilities

	MiVOS	Mask-Propagation	Scribble-to-Mask
DAVIS/YouTube semi-supervised evaluation	❌	✔️	❌
DAVIS interactive evaluation	✔️	❌	❌
User interaction GUI tool	✔️	❌	❌
Dense Correspondences	❌	✔️	❌
Train propagation module	❌	✔️	❌
Train S2M (interaction) module	❌	❌	✔️
Train fusion module	✔️	❌	❌
Generate more synthetic data	✔️	❌	❌

Requirements

The package versions shown here are the ones that I used. You might not need the exact versions.

PyTorch 1.6.0
torchvision 0.7.0
opencv-contrib 4.2.0
davis-interactive (https://github.com/albertomontesg/davis-interactive)
gitpython for training
gdown for downloading pretrained models

Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest can be installed by:

pip install opencv-contrib-python gitpython gdown

Pretrained model

Download and put the model in ./saves/. Alternatively use the provided download_model.py.

[OneDrive Mirror]

Interactive GUI

python interactive.py --image <image>

Controls:

Mouse Left - Draw scribbles
Mouse middle key - Switch positive/negative
Key f - Commit changes, clear scribbles
Key r - Clear everything
Key d - Switch between overlay/mask view
Key s - Save masks into a temporary output folder (./output/)

Known issues

The model almost always needs to focus on at least one object. It is very difficult to erase all existing masks from an image using scribbles.

Training

Datasets

Download and extract LVIS training set.
Download and extract a set of static image segmentation datasets. These are already downloaded for you if you used the download_datasets.py in Mask-Propagation.

├── lvis
│   ├── lvis_v1_train.json
│   └── train2017
├── Scribble-to-Mask
└── static
    ├── BIG_small
    └── ...

Commands

Use the deeplabv3plus_resnet50 pretrained model provided here.

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id s2m --load_deeplab <path_to_deeplab.pth>

Credit

Deeplab implementation and pretrained model: https://github.com/VainF/DeepLabV3Plus-Pytorch.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

Contact: [email protected]

Comments

AttributeError: Caught AttributeError in DataLoader worker process 0

Hello! I followed the instructions of the training command, it has thrown an error about AttributeError. I put the static folder outside this repository as you mentioned. It is confusing that I can use the same datasets for the pretraining propagation module, the train.py in Mask-Propagation works fine.

opened by xwhkkk 2
git.exc.InvalidGitRepositoryError when running train.py

Hello! I followed the instruction of the training command, but it has thrown an error about GitRepositoryError. I used command : CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 1842 --nproc_per_node=2 train.py --id s2m --load_deeplab ./deeplab_resnet50/best_deeplabv3plus_resnet50_voc_os16.pth, and I have 2 GPUs. Could you give me some suggestions?

opened by xwhkkk 2
About evaluation of the model

Hi,

thank you for the nice work.

I have a concern about the evaluation of the model. Because there is no validation set to pick the best model. It may has a potential overfitting problem. (Or what should the validation set for interactive segmentation look like? If there is a unified standard, it will be more helpful for everyone to compare their methods.)

In interactive object segmentation setting, is this setting popular? I am new here for the interactive segmentation. Wish to solve my concern, thank you.

opened by Limingxing00 2
Question about Local Control Strategy

A simple but practical segmentation tool! I've read your paper, and it says that local control strategy is used in S2M. However, I don't find the local control step in this code. Why don't you provide it in this tool? Will local control make significant difference to the performance?

opened by distillation-dcf 1
DeepLabv3 pre-trained models

Hello,

I wanted to mention that in order to train S2M from scratch, using the deeplabv3_resnet50 pre-trained model provided in this repo, returns the following error: KeyError: 'classifier.classifier.0.convs.0.0.weight. Meaning that the weights from this layer are not present in deeplabv3_resnet50. But using the deeplabv3plus_resnet50 from the same repo executes without errors.

Best!

opened by UndecidedBoy 1
saving error

Hello! Thanks for sharing your code. When I run python interactive.py and want to save the masks, appeared following error.

Could you give me some suggestions?

opened by xwhkkk 3
Fix simple issues and allow for cpu only use

I had to make some changes to be able to use the code on cpu only system and had troubles saving the mask from the interactive GUI and fixed it. Thanks for the great work.

opened by rami-alloush 3

Releases(1.0)

1.0(Mar 14, 2021)

Pretrained model
Source code(tar.gz)
Source code(zip)
s2m.pth(152.04 MB)

Owner

Rex Cheng

@ HKUST.

GitHub Repository https://hkchengrex.github.io/MiVOS/

Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

3 Jan 07, 2022

Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

flownet2-pytorch Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. Multiple GPU training is supported, a

2.8k Dec 27, 2022

Multiview 3D object detection on MultiviewC dataset through moft3d.

Voxelized 3D Feature Aggregation for Multiview Detection [arXiv] Multiview 3D object detection on MultiviewC dataset through VFA. Introduction We prop

20 Dec 21, 2022

PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

FastPitchFormant - PyTorch Implementation PyTorch Implementation of FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis. Qu

63 Jan 02, 2023

Airborne magnetic data of the Osborne Mine and Lightning Creek sill complex, Australia

Osborne Mine, Australia - Airborne total-field magnetic anomaly This is a section of a survey acquired in 1990 by the Queensland Government, Australia

1 Jan 21, 2022

fklearn: Functional Machine Learning

fklearn: Functional Machine Learning fklearn uses functional programming principles to make it easier to solve real problems with Machine Learning. Th

1.4k Dec 07, 2022

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

134 Jan 01, 2023

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

DiffWave DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via itera

498 Jan 03, 2023

Playing around with FastAPI and streamlit to create a YoloV5 object detector

FastAPI-Streamlit-based-YoloV5-detector Playing around with FastAPI and streamlit to create a YoloV5 object detector It turns out that a User Interfac

2 Jan 20, 2022

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (LMRL Workshop, NeurIPS 2021)

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology Self-Supervised Vision Transformers Learn Visual Concepts in Histopatholog

95 Dec 24, 2022

MoveNet Single Pose on DepthAI

MoveNet Single Pose tracking on DepthAI Running Google MoveNet Single Pose models on DepthAI hardware (OAK-1, OAK-D,...). A convolutional neural netwo

64 Dec 29, 2022

[TPDS'21] COSCO: Container Orchestration using Co-Simulation and Gradient Based Optimization for Fog Computing Environments

COSCO Framework COSCO is an AI based coupled-simulation and container orchestration framework for integrated Edge, Fog and Cloud Computing Environment

39 Dec 25, 2022

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Website | ArXiv | Get Start | Video PIRenderer The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic

261 Jan 09, 2023

A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

ParallelFold Author: Bozitao Zhong This is a modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (p

77 Dec 22, 2022

[CVPR 2021] MiVOS - Scribble to Mask module

Related tags

Overview

MiVOS (CVPR 2021) - Scribble To Mask

Overall structure and capabilities

Requirements

Pretrained model

Interactive GUI

Known issues

Training

Datasets

Commands

Credit

Citation

Comments

AttributeError: Caught AttributeError in DataLoader worker process 0

git.exc.InvalidGitRepositoryError when running train.py

About evaluation of the model

Question about Local Control Strategy

DeepLabv3 pre-trained models

saving error

Fix simple issues and allow for cpu only use