Video Contrastive Learning with Global Context

Overview

Video Contrastive Learning with Global Context (VCLR)

This is the official PyTorch implementation of our VCLR paper.

Install dependencies

  • environments
    conda create --name vclr python=3.7
    conda activate vclr
    conda install numpy scipy scikit-learn matplotlib scikit-image
    pip install torch==1.7.1 torchvision==0.8.2
    pip install opencv-python tqdm termcolor gcc7 ffmpeg tensorflow==1.15.2
    pip install mmcv-full==1.2.7

Prepare datasets

Please refer to PREPARE_DATA to prepare the datasets.

Prepare pretrained MoCo weights

In this work, we follow SeCo and use the pretrained weights of MoCov2 as initialization.

cd ~
git clone https://github.com/amazon-research/video-contrastive-learning.git
cd video-contrastive-learning
mkdir pretrain && cd pretrain
wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_200ep/moco_v2_200ep_pretrain.pth.tar
cd ..

Self-supervised pretraining

bash shell/main_train.sh

Checkpoints will be saved to ./results

Downstream tasks

Linear evaluation

In order to evaluate the effectiveness of self-supervised learning, we conduct a linear evaluation (probing) on Kinetics400 dataset. Basically, we first extract features from the pretrained weight and then train a SVM classifier to see how the learned features perform.

bash shell/eval_svm.sh
  • Results

    Arch Pretrained dataset Epoch Pretrained model Acc. on K400
    ResNet50 Kinetics400 400 Download link 64.1

Video retrieval

bash shell/eval_retrieval.sh

Action recognition & action localization

Here, we use mmaction2 for both tasks. If you are not familiar with mmaction2, you can read the official documentation.

Installation

  • Step1: Install mmaction2

    To make sure the results can be reproduced, please use our forked version of mmaction2 (version: 0.11.0):

    conda activate vclr
    cd ~
    git clone https://github.com/KuangHaofei/mmaction2
    
    cd mmaction2
    pip install -v -e .
  • Step2: Prepare the pretrained weights

    Our pretrained backbone have different format with the backbone of mmaction2, it should be transferred to mmaction2 format. We provide the transferred version of our K400 pretrained weights, TSN and TSM. We also provide the script for transferring weights, you can find it here.

    Moving the pretrained weights to checkpoints directory:

    cd ~/mmaction2
    mkdir checkpoints
    wget https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm.pth
    wget https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm_tsm.pth

Action recognition

Make sure you have prepared the dataset and environments following the previous step. Now suppose you are in the root directory of mmaction2, follow the subsequent steps to fine tune the TSN or TSM models for action recognition.

For each dataset, the train and test setting can be found in the configuration files.

  • UCF101

    • config file: tsn_ucf101.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_ucf101.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_ucf101.py \
        work_dirs/vclr/ucf101/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • HMDB51

    • config file: tsn_hmdb51.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_hmdb51.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_hmdb51.py \
        work_dirs/vclr/hmdb51/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • SomethingSomethingV2: TSN

    • config file: tsn_sthv2.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_sthv2.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_sthv2.py \
        work_dirs/vclr/tsn_sthv2/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • SomethingSomethingV2: TSM

    • config file: tsm_sthv2.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsm/vclr/tsm_sthv2.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsm/vclr/tsm_sthv2.py \
        work_dirs/vclr/tsm_sthv2/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • ActivityNet

    • config file: tsn_activitynet.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_activitynet.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_activitynet.py \
        work_dirs/vclr/tsn_activitynet/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • Results

    Arch Dataset Finetuned model Acc.
    TSN UCF101 Download link 85.6
    TSN HMDB51 Download link 54.1
    TSN SomethingSomethingV2 Download link 33.3
    TSM SomethingSomethingV2 Download link 52.0
    TSN ActivityNet Download link 71.9

Action localization

  • Step 1: Follow the previous section, suppose the finetuned model is saved at work_dirs/vclr/tsn_activitynet/latest.pth

  • Step 2: Extract ActivityNet features

    cd ~/mmaction2/tools/data/activitynet/
    
    python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
      --data-list /home/ubuntu/data/ActivityNet/anet_train_video.txt \
      --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
      --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
    
    python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
      --data-list /home/ubuntu/data/ActivityNet/anet_val_video.txt \
      --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
      --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
    
    python activitynet_feature_postprocessing.py \
      --rgb /home/ubuntu/data/ActivityNet/rgb_feat \
      --dest /home/ubuntu/data/ActivityNet/mmaction_feat

    Note, the root directory of ActivityNey is /home/ubuntu/data/ActivityNet/ in our case. Please replace it according to your real directory.

  • Step 3: Train and test the BMN model

    • train
      cd ~/mmaction2
      ./tools/dist_train.sh configs/localization/bmn/bmn_acitivitynet_feature_vclr.py 2 \
        --work-dir work_dirs/vclr/bmn_activitynet --validate --seed 0 --deterministic --bmn
    • test
      python tools/test.py configs/localization/bmn/bmn_acitivitynet_feature_vclr.py \
        work_dirs/vclr/bmn_activitynet/latest.pth \
        --bmn --eval [email protected] --out result.json
  • Results

    Arch Dataset Finetuned model AUC [email protected]
    BMN ActivityNet Download link 65.5 73.8

Feature visualization

We provide our feature visualization code at here.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation

AttentionGAN-v2 for Unpaired Image-to-Image Translation AttentionGAN-v2 Framework The proposed generator learns both foreground and background attenti

Hao Tang 530 Dec 27, 2022
Deep Learning and Logical Reasoning from Data and Knowledge

Logic Tensor Networks (LTN) Logic Tensor Network (LTN) is a neurosymbolic framework that supports querying, learning and reasoning with both rich data

171 Dec 29, 2022
This is the implementation of the paper "Self-supervised Outdoor Scene Relighting"

Self-supervised Outdoor Scene Relighting This is the implementation of the paper "Self-supervised Outdoor Scene Relighting". The model is implemented

Ye Yu 24 Dec 17, 2022
Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GanFormer and TransGan paper

TransGanFormer (wip) Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GansFormer and TransGan paper. I

Phil Wang 146 Dec 06, 2022
Dense Prediction Transformers

Vision Transformers for Dense Prediction This repository contains code and models for our paper: Vision Transformers for Dense Prediction René Ranftl,

Intelligent Systems Lab Org 1.3k Jan 02, 2023
A Novel Plug-in Module for Fine-grained Visual Classification

Pytorch implementation for A Novel Plug-in Module for Fine-Grained Visual Classification. fine-grained visual classification task.

ChouPoYung 109 Dec 20, 2022
GNN-based Recommendation Benchmark

GRecX A Fair Benchmark for GNN-based Recommendation Homepage and Documentation Homepage: Documentation: Paper: GRecX: An Efficient and Unified Benchma

73 Oct 17, 2022
MMGeneration is a powerful toolkit for generative models, based on PyTorch and MMCV.

Documentation: https://mmgeneration.readthedocs.io/ Introduction English | 简体中文 MMGeneration is a powerful toolkit for generative models, especially f

OpenMMLab 1.3k Dec 29, 2022
Code samples for my book "Neural Networks and Deep Learning"

Code samples for "Neural Networks and Deep Learning" This repository contains code samples for my book on "Neural Networks and Deep Learning". The cod

Michael Nielsen 13.9k Dec 26, 2022
Attention-guided gan for synthesizing IR images

SI-AGAN Attention-guided gan for synthesizing IR images This repository contains the Tensorflow code for "Pedestrian Gender Recognition by Style Trans

1 Oct 25, 2021
Official Implementation of CVPR 2022 paper: "Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning"

(CVPR 2022) Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning ArXiv This repo contains Official Implementat

Yujun Shi 24 Nov 01, 2022
A-ESRGAN aims to provide better super-resolution images by using multi-scale attention U-net discriminators.

A-ESRGAN: Training Real-World Blind Super-Resolution with Attention-based U-net Discriminators The authors are hidden for the purpose of double blind

77 Dec 16, 2022
JAXDL: JAX (Flax) Deep Learning Library

JAXDL: JAX (Flax) Deep Learning Library Simple and clean JAX/Flax deep learning algorithm implementations: Soft-Actor-Critic (arXiv:1812.05905) Transf

Patrick Hart 4 Nov 27, 2022
In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results from as little as 16 seconds of target data.

Neural Instrument Cloning In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results fro

Erland 127 Dec 23, 2022
Single/multi view image(s) to voxel reconstruction using a recurrent neural network

3D-R2N2: 3D Recurrent Reconstruction Neural Network This repository contains the source codes for the paper Choy et al., 3D-R2N2: A Unified Approach f

Chris Choy 1.2k Dec 27, 2022
Benchmarks for Model-Based Optimization

Design-Bench Design-Bench is a benchmarking framework for solving automatic design problems that involve choosing an input that maximizes a black-box

Brandon Trabucco 43 Dec 20, 2022
abess: Fast Best-Subset Selection in Python and R

abess: Fast Best-Subset Selection in Python and R Overview abess (Adaptive BEst Subset Selection) library aims to solve general best subset selection,

297 Dec 21, 2022
Pytorch implementation of MixNMatch

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation [Paper] Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Le

910 Dec 30, 2022
COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models

COVID-ViT COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models This code is to response to te MIA-COV19 compe

17 Dec 30, 2022
LBK 35 Dec 26, 2022