Referring Video Object Segmentation

Last update: Dec 11, 2022

Overview

Awesome-Referring-Video-Object-Segmentation

Welcome to starts ⭐ & comments 💹 & sharing 😀 !!

- 2021.12.12: Recent papers (from 2021) 
- welcome to add if any information misses. 😎

Introduction

Referring video object segmentation aims at segmenting an object in video with language expressions.

Unlike the previous video object segmentation, the task exploits a different type of supervision, language expressions, to identify and segment an object referred by the given language expressions in a video. A detailed explanation of the new task can be found in the following paper.

Seonguk Seo, Joon-Young Lee, Bohyung Han, “URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark”, European Conference on Computer Vision (ECCV), 2020:https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf

Impressive Works Related to Referring Video Object Segmentation (RVOS)

ReferFormer:https://arxiv.org/pdf/2201.00487.pdf
MTTR:https://github.com/mttr2021/MTTR
PMINet:https://youtube-vos.org/assets/challenge/2021/reports/RVOS_2_Ding.pdf
CMPC-V [PAMI 2021]:https://github.com/spyflying/CMPC-Refseg

Cross-modal progressive comprehension for referring segmentation:https://arxiv.org/abs/2105.07175

URVOS [ECCV 2020]:https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf

Benchmark

The 3rd Large-scale Video Object Segmentation - Track 3: Referring Video Object Segmentation

Datasets

Refer-YouTube-VOS-datasets

YouTube-VOS:

wget https://github.com/JerryX1110/awesome-rvos/blob/main/down_YTVOS_w_refer.py
python down_YTVOS_w_refer.py

Folder structure:

${current_path}/
└── refer_youtube_vos/ 
    ├── train/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.jpg (frame image files) 
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files) 
    ├── valid/
    │   └── JPEGImages/
    │       └── */ (video folders)
    │           └── *.jpg (frame image files) 
    └── meta_expressions/
        ├── train/
        │   └── meta_expressions.json  (text annotations)
        └── valid/
            └── meta_expressions.json  (text annotations)

A2D-Sentences:

REPO:https://web.eecs.umich.edu/~jjcorso/r/a2d/

paper:https://arxiv.org/abs/1803.07485

Citation:

@misc{gavrilyuk2018actor,
      title={Actor and Action Video Segmentation from a Sentence}, 
      author={Kirill Gavrilyuk and Amir Ghodrati and Zhenyang Li and Cees G. M. Snoek},
      year={2018},
      eprint={1803.07485},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: The dataset may not be republished in any form without the written consent of the authors.

README Dataset and Annotation (version 1.0, 1.9GB, tar.bz) Evaluation Toolkit (version 1.0, tar.bz)

mkdir a2d_sentences
cd a2d_sentences
wget https://web.eecs.umich.edu/~jjcorso/bigshare/A2D_main_1_0.tar.bz
tar jxvf A2D_main_1_0.tar.bz
mkdir text_annotations

cd text_annotations
wget https://kgavrilyuk.github.io/actor_action/a2d_annotation.txt
wget https://kgavrilyuk.github.io/actor_action/a2d_missed_videos.txt
wget https://github.com/JerryX1110/awesome-rvos/blob/main/down_a2d_annotation_with_instances.py
python down_a2d_annotation_with_instances.py
unzip a2d_annotation_with_instances.zip
#rm a2d_annotation_with_instances.zip
cd ..

cd ..

Folder structure:

${current_path}/
└── a2d_sentences/ 
    ├── Release/
    │   ├── videoset.csv  (videos metadata file)
    │   └── CLIPS320/
    │       └── *.mp4     (video files)
    └── text_annotations/
        ├── a2d_annotation.txt  (actual text annotations)
        ├── a2d_missed_videos.txt
        └── a2d_annotation_with_instances/ 
            └── */ (video folders)
                └── *.h5 (annotations files)

Citation:

@inproceedings{YaXuCaCVPR2017,
  author = {Yan, Y. and Xu, C. and Cai, D. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking},
  year = {2017}
}
@inproceedings{XuCoCVPR2016,
  author = {Xu, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Actor-Action Semantic Segmentation with Grouping-Process Models},
  year = {2016}
}
@inproceedings{XuHsXiCVPR2015,
  author = {Xu, C. and Hsieh, S.-H. and Xiong, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  poster = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D_poster.pdf},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Can Humans Fly? {Action} Understanding with Multiple Classes of Actors},
  url = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D.pdf},
  year = {2015}
}

J-HMDB:http://jhmdb.is.tue.mpg.de/

downloading_script

mkdir jhmdb_sentences
cd jhmdb_sentences
wget http://files.is.tue.mpg.de/jhmdb/Rename_Images.tar.gz
wget https://kgavrilyuk.github.io/actor_action/jhmdb_annotation.txt
wget http://files.is.tue.mpg.de/jhmdb/puppet_mask.zip
tar -xzvf  Rename_Images.tar.gz
unzip puppet_mask.zip
cd ..

Folder structure:

${current_path}/
└── jhmdb_sentences/ 
    ├── Rename_Images/  (frame images)
    │   └── */ (action dirs)
    ├── puppet_mask/  (mask annotations)
    │   └── */ (action dirs)
    └── jhmdb_annotation.txt  (text annotations)

Citation:

@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}

refer-DAVIS16/17:[https://arxiv.org/pdf/1803.08006.pdf]

Referring Video Object Segmentation

Related tags

Overview

Awesome-Referring-Video-Object-Segmentation

Introduction

Impressive Works Related to Referring Video Object Segmentation (RVOS)

Benchmark

Datasets

Owner

Explorer

Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing (CVPR 2018).

Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe

TensorFlow CNN for fast style transfer

A PyTorch version of You Only Look at One-level Feature object detector

Predict the latency time of the deep learning models

This demo showcase the use of onnxruntime-rs with a GPU on CUDA 11 to run Bert in a data pipeline with Rust.

SynNet - synthetic tree generation using neural networks

Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)

PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

A privacy-focused, intelligent security camera system.

Wanli Li and Tieyun Qian: Exploit a Multi-head Reference Graph for Semi-supervised Relation Extraction, IJCNN 2021

Neural Dynamic Policies for End-to-End Sensorimotor Learning

The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

OoD Minimum Anomaly Score GAN - Code for the Paper 'OMASGAN: Out-of-Distribution Minimum Anomaly Score GAN for Sample Generation on the Boundary'

Logistic Bandit experiments. Official code for the paper "Jointly Efficient and Optimal Algorithms for Logistic Bandits".

Fake videos detection by tracing the source using video hashing retrieval.

Sequential Model-based Algorithm Configuration

This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)