[CVPR2021] The source code for our paper 《Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning》.

Related tags

Deep LearningBE
Overview

TBE

The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning" [arxiv] [code][Project Website]

image

Citation

@inproceedings{wang2021removing,
  title={Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning},
  author={Wang, Jinpeng and Gao, Yuting and Li, Ke and Lin, Yiqi and Ma, Andy J and Cheng, Hao and Peng, Pai and Ji, Rongrong and Sun, Xing},
  booktitle={CVPR},
  year={2021}
}

News

[2020.3.7] The first version of TBE are released!

0. Motivation

  • In camera-fixed situation, the static background in most frames remain similar in pixel-distribution.

  • We ask the model to be temporal sensitive rather than static sensitive.

  • We ask model to filter the additive Background Noise, which means to erasing background in each frame of the video.

Activation Map Visualization of BE

GIF

More hard example

2. Plug BE into any self-supervised learning method in two steps

The impementaion of BE is very simple, you can implement it in two lines by python:

rand_index = random.randint(t)
mixed_x[j] = (1-prob) * x + prob * x[rand_index]

Then, just need define a loss function like MSE:

loss = MSE(F(mixed_x),F(x))

2. Installation

Dataset Prepare

Please refer to [dataset.md] for details.

Requirements

  • Python3
  • pytorch1.1+
  • PIL
  • Intel (on the fly decode)
  • Skvideo.io
  • Matplotlib (gradient_check)

As Kinetics dataset is time-consuming for IO, we decode the avi/mpeg on the fly. Please refer to data/video_dataset.py for details.

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51/Actor-HMDB51
      • hmdb51_sta: the train/val lists of HMDB51_STA
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of kinetics-400
      • diving48: the train/val lists of diving48
  • experiments
    • logs: experiments record in detials, include logs and trained models
    • gradientes:
    • visualization:
    • pretrained_model:
  • src
    • Contrastive
      • data: load data
      • loss: the loss evaluate in this paper
      • model: network architectures
      • scripts: train/eval scripts
      • augmentation: detail implementation of BE augmentation
      • utils
      • feature_extract.py: feature extractor given pretrained model
      • main.py: the main function of pretrain / finetune
      • trainer.py
      • option.py
      • pt.py: BE pretrain
      • ft.py: BE finetune
    • Pretext
      • main.py the main function of pretrain / finetune
      • loss: the loss include classification loss

4. Run

(1). Download dataset lists and pretrained model

A copy of both dataset lists is provided in anonymous. The Kinetics-pretrained models are provided in anonymous.

cd .. && mkdir datasets
mv [path_to_lists] to datasets
mkdir experiments && cd experiments
mkdir pretrained_models && logs
mv [path_to_pretrained_model] to ../experiments/pretrained_model

Download and extract frames of Actor-HMDB51.

wget -c  anonymous
unzip
python utils/data_process/gen_hmdb51_dir.py
python utils/data_process/gen_hmdb51_frames.py

(2). Network Architecture

The network is in the folder src/model/[].py

Method #logits_channel
C3D 512
R2P1D 2048
I3D 1024
R3D 2048

All the logits_channel are feed into a fc layer with 128-D output.

For simply, we divide the source into Contrastive and Pretext, "--method pt_and_ft" means pretrain and finetune in once.

Action Recognition

Random Initialization

For random initialization baseline. Just comment --weights in line 11 of ft.sh. Like below:

#!/usr/bin/env bash
python main.py \
--method ft --arch i3d \
--ft_train_list ../datasets/lists/diving48/diving48_v2_train_no_front.txt \
--ft_val_list ../datasets/lists/diving48/diving48_v2_test_no_front.txt \
--ft_root /data1/DataSet/Diving48/rgb_frames/ \
--ft_dataset diving48 --ft_mode rgb \
--ft_lr 0.001 --ft_lr_steps 10 20 25 30 35 40 --ft_epochs 45 --ft_batch_size 4 \
--ft_data_length 64 --ft_spatial_size 224 --ft_workers 4 --ft_stride 1 --ft_dropout 0.5 \
--ft_print-freq 100 --ft_fixed 0 # \
# --ft_weights ../experiments/kinetics_contrastive.pth

BE(Contrastive)

Kinetics
bash scripts/kinetics/pt_and_ft.sh
UCF101
bash scripts/ucf101/ucf101.sh
Diving48
bash scripts/Diving48/diving48.sh

For Triplet loss optimization and moco baseline, just modify --pt_method

BE (Triplet)

--pt_method be_triplet

BE(Pretext)

bash scripts/hmdb51/i3d_pt_and_ft_flip_cls.sh

or

bash scripts/hmdb51/c3d_pt_and_ft_flip.sh

Notice: More Training Options and ablation study can be find in scripts

Video Retrieve and other visualization

(1). Feature Extractor

As STCR can be easily extend to other video representation task, we offer the scripts to perform feature extract.

python feature_extractor.py

The feature will be saved as a single numpy file in the format [video_nums,features_dim] for further visualization.

(2). Reterival Evaluation

modify line60-line62 in reterival.py.

python reterival.py

Results

Action Recognition

Kinetics Pretrained (I3D)

Method UCF101 HMDB51 Diving48
Random Initialization 57.9 29.6 17.4
MoCo Baseline 70.4 36.3 47.9
BE 86.5 56.2 62.6

Video Retrieve (HMDB51-C3D)

Method @1 @5 @10 @20 @50
BE 10.2 27.6 40.5 56.2 76.6

More Visualization

T-SNE

please refer to utils/visualization/t_SNE_Visualization.py for details.

Confusion_Matrix

please refer to utils/visualization/confusion_matrix.py for details.

Acknowledgement

This work is partly based on UEL and MoCo.

License

The code are released under the CC-BY-NC 4.0 LICENSE.

Owner
Jinpeng Wang
Focus on Biometrics and Video Understanding, Self/Semi Supervised Learning.
Jinpeng Wang
Vignette is a face tracking software for characters using osu!framework.

Vignette is a face tracking software for characters using osu!framework. Unlike most solutions, Vignette is: Made with osu!framework, the game framewo

Vignette 412 Dec 28, 2022
Autonomous Robots Kalman Filters

Autonomous Robots Kalman Filters The Kalman Filter is an easy topic. However, ma

20 Jul 18, 2022
Official implementation of the NRNS paper: No RL, No Simulation: Learning to Navigate without Navigating

No RL No Simulation (NRNS) Official implementation of the NRNS paper: No RL, No Simulation: Learning to Navigate without Navigating NRNS is a heriarch

Meera Hahn 20 Nov 29, 2022
Using deep learning to predict gene structures of the coding genes in DNA sequences of Arabidopsis thaliana

DeepGeneAnnotator: A tool to annotate the gene in the genome The master thesis of the "Using deep learning to predict gene structures of the coding ge

Ching-Tien Wang 3 Sep 09, 2022
Codebase for testing whether hidden states of neural networks encode discrete structures.

structural-probes Codebase for testing whether hidden states of neural networks encode discrete structures. Based on the paper A Structural Probe for

John Hewitt 349 Dec 17, 2022
1st place solution in CCF BDCI 2021 ULSEG challenge

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022
Pytorch and Keras Implementations of Hyperspectral Image Classification -- Traditional to Deep Models: A Survey for Future Prospects.

The repository contains the implementations for Hyperspectral Image Classification -- Traditional to Deep Models: A Survey for Future Prospects. Model

Ankur Deria 115 Jan 06, 2023
LERP : Label-dependent and event-guided interpretable disease risk prediction using EHRs

LERP : Label-dependent and event-guided interpretable disease risk prediction using EHRs This is the code for the LERP. Dataset The dataset used is MI

5 Jun 18, 2022
Dense Passage Retriever - is a set of tools and models for open domain Q&A task.

Dense Passage Retrieval Dense Passage Retrieval (DPR) - is a set of tools and models for state-of-the-art open-domain Q&A research. It is based on the

Meta Research 1.1k Jan 03, 2023
Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"

TR-BERT Source code and dataset for "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference". The code is based on huggaface's transformers.

THUNLP 37 Oct 30, 2022
Locationinfo - A script helps the user to show network information such as ip address

Description This script helps the user to show network information such as ip ad

Roxcoder 1 Dec 30, 2021
Graduation Project

Gesture-Detection-and-Depth-Estimation This is my graduation project. (1) In this project, I use the YOLOv3 object detection model to detect gesture i

ChaosAT 1 Nov 23, 2021
Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data"

Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data" You can download the pretrained

Bountos Nikos 3 May 07, 2022
A Deep Learning Framework for Neural Derivative Hedging

NNHedge NNHedge is a PyTorch based framework for Neural Derivative Hedging. The following repository was implemented to ease the experiments of our pa

GUIJIN SON 17 Nov 14, 2022
Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

NonCuboidRoom Paper Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image Cheng Yang*, Jia Zheng*, Xili Dai, Rui Tang, Yi Ma, Xiao

67 Dec 15, 2022
Identifying a Training-Set Attack’s Target Using Renormalized Influence Estimation

Identifying a Training-Set Attack’s Target Using Renormalized Influence Estimation By: Zayd Hammoudeh and Daniel Lowd Paper: Arxiv Preprint Coming soo

Zayd Hammoudeh 2 Oct 08, 2022
Code for the CVPR2022 paper "Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity"

Introduction This is an official release of the paper "Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity" (arxiv link). Abstrac

Leo 21 Nov 23, 2022
BBScan py3 - BBScan py3 With Python

BBScan_py3 This repository is forked from lijiejie/BBScan 1.5. I migrated the fo

baiyunfei 12 Dec 30, 2022
Official implementation of the paper "AAVAE: Augmentation-AugmentedVariational Autoencoders"

AAVAE Official implementation of the paper "AAVAE: Augmentation-AugmentedVariational Autoencoders" Abstract Recent methods for self-supervised learnin

Grid AI Labs 48 Dec 12, 2022
SAT Project - The first project I had done at General Assembly, performed EDA, data cleaning and created data visualizations

Project 1: Standardized Test Analysis by Adam Klesc Overview This project covers: Basic statistics and probability Many Python programming concepts Pr

Adam Muhammad Klesc 1 Jan 03, 2022