Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Last update: Dec 07, 2022

Related tags

Overview

Relational Self-Attention: What's Missing in Attention for Video Understanding

This repository is the official implementation of "Relational Self-Attention: What's Missing in Attention for Video Understanding" by Manjin Kim*, Heeseung Kwon*, Chunyu Wang, Suha Kwak, and Minsu Cho (*equal contribution).

Requirements

Python: 3.7.9
Pytorch: 1.6.0
TorchVision: 0.2.1
Cuda: 10.1
Conda environment environment.yml

To install requirements:

    conda env create -f environment.yml
    conda activate rsa

Dataset Preparation

Download Something-Something v1 & v2 (SSv1 & SSv2) datasets and extract RGB frames. Download URLs: SSv1, SSv2
Make txt files that define training & validation splits. Each line in txt files is formatted as [video_path] [#frames] [class_label]. Please refer to any txt files in ./data directory.

Training

To train RSANet-R50 on SSv1 or SSv2 datasets in the paper, run this command:

    # For SSv1
    ./scripts/train_Something_v1.sh 
    
    
     
    # example: ./scripts/train_Something_v1.sh RSA_R50_SSV1_16frames 16
    
    # For SSv2
    ./scripts/train_Something_v2.sh 
      
      
       
    # example: ./scripts/train_Something_v2.sh RSA_R50_SSV2_16frames 16

Evaluation

To evaluate RSANet-R50 on SSv2 dataset in the paper, run:

    # For SSv1
    ./scripts/test_Something_v1.sh 
    
     
     
      
    # example: ./scripts/test_Something_v1.sh RSA_R50_SSV1_16frames resnet_rgb_model_best.pth.tar 16
    
    # For SSv2
    ./scripts/test_Something_v2.sh 
       
        
        
          # example: ./scripts/test_Something_v2.sh RSA_R50_SSV2_16frames resnet_rgb_model_best.pth.tar 16

Results

Our model achieves the following performance on Something-Something-V1 and Something-Something-V2:

model	dataset	frames	top-1 / top-5	logs	checkpoints
RSANet-R50	SSV1	16	54.0 % / 81.1 %	[log]	[checkpoint]
RSANet-R50	SSV2	16	66.0 % / 89.9 %	[log]	[checkpoint]

Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Related tags

Overview

Relational Self-Attention: What's Missing in Attention for Video Understanding

Requirements

Dataset Preparation

Training

Evaluation

Results

Qualitative Results

Owner

mandos

Affine / perspective transformation in Pose Estimation with Tensorflow 2

A PyTorch implementation of Mugs proposed by our paper "Mugs: A Multi-Granular Self-Supervised Learning Framework".

🕺Full body detection and tracking

Implementation of Sequence Generative Adversarial Nets with Policy Gradient

The official repo of the CVPR 2021 paper Group Collaborative Learning for Co-Salient Object Detection .

Deep Unsupervised 3D SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment.

Acute ischemic stroke dataset

PyTorch implementation of NeurIPS 2021 paper: "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration"

Transformer based SAR image despeckling

OpenMMLab Image and Video Editing Toolbox

SBINN: Systems-biology informed neural network

TF Image Segmentation: Image Segmentation framework

Release of the ConditionalQA dataset

Earthquake detection via fiber optic cables using deep learning

Heart Arrhythmia Classification

FocusFace: Multi-task Contrastive Learning for Masked Face Recognition

A curated list of neural rendering resources.

Implementation of "JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting"

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Face2webtoon - Despite its importance, there are few previous works applying I2I translation to webtoon.