TDN: Temporal Difference Networks for Efficient Action Recognition

Overview

TDN: Temporal Difference Networks for Efficient Action Recognition

1

Overview

We release the PyTorch code of the TDN(Temporal Difference Networks). This code is based on the TSN and TSM codebase. The core code to implement the Temporal Difference Module are ops/base_module.py and ops/tdn_net.py.

🔥 [NEW!] We have released the PyTorch code of TDN.

Prerequisites

The code is built with following libraries:

Data Preparation

We have successfully trained TDN on Kinetics400, UCF101, HMDB51, Something-Something-V1 and V2 with this codebase.

  • The processing of Something-Something-V1 & V2 can be summarized into 3 steps:

    1. Extract frames from videos(you can use ffmpeg to get frames from video)
    2. Generate annotations needed for dataloader (" " in annotations) The annotation usually includes train.txt and val.txt. The format of *.txt file is like:
      frames/video_1 num_frames label_1
      frames/video_2 num_frames label_2
      frames/video_3 num_frames label_3
      ...
      frames/video_N num_frames label_N
      
    3. Add the information to ops/dataset_configs.py
  • The processing of Kinetics400 can be summarized into 2 steps:

    1. Generate annotations needed for dataloader (" " in annotations) The annotation usually includes train.txt and val.txt. The format of *.txt file is like:
      frames/video_1.mp4  label_1
      frames/video_2.mp4  label_2
      frames/video_3.mp4  label_3
      ...
      frames/video_N.mp4  label_N
      
    2. Add the information to ops/dataset_configs.py

Model Zoo

Here we provide some off-the-shelf pretrained models. The accuracy might vary a little bit compared to the paper, since the raw video of Kinetics downloaded by users may have some differences.

Something-Something-V1

Model Frames x Crops x Clips Top-1 Top-5 checkpoint
TDN-ResNet50 8x1x1 52.3% 80.6% link
TDN-ResNet50 16x1x1 53.9% 82.1% link

Something-Something-V2

Model Frames x Crops x Clips Top-1 Top-5 checkpoint
TDN-ResNet50 8x1x1 64.0% 88.8% link
TDN-ResNet50 16x1x1 65.3% 89.7% link

Kinetics400

Model Frames x Crops x Clips Top-1 (30 view) Top-5 (30 view) checkpoint
TDN-ResNet50 8x3x10 76.6% 92.8% link
TDN-ResNet50 16x3x10 77.5% 93.2% link
TDN-ResNet101 8x3x10 77.5% 93.6% link
TDN-ResNet101 16x3x10 78.5% 93.9% link

Testing

  • For center crop single clip, the processing of testing can be summarized into 2 steps:
    1. Run the following testing scripts:
      CUDA_VISIBLE_DEVICES=0 python3 test_models_center_crop.py something \
      --archs='resnet50' --weights   --test_segments=8  \
      --test_crops=1 --batch_size=16  --gpus 0 --output_dir  -j 4 --clip_index=1
      
    2. Run the following scripts to get result from the raw score:
      python3 pkl_to_results.py --num_clips 1 --test_crops 1 --output_dir   
      
  • For 3 crops, 10 clips, the processing of testing can be summarized into 2 steps:
    1. Run the following testing scripts for 10 times(clip_index from 0 to 9):
      CUDA_VISIBLE_DEVICES=0 python3 test_models_three_crops.py  kinetics \
      --archs='resnet50' --weights   --test_segments=8 \
      --test_crops=3 --batch_size=16 --full_res --gpus 0 --output_dir   \
      -j 4 --clip_index 
      
    2. Run the following scripts to ensemble the raw score of the 30 views:
      python pkl_to_results.py --num_clips 10 --test_crops 3 --output_dir  
      

Training

This implementation supports multi-gpu, DistributedDataParallel training, which is faster and simpler.

  • For example, to train TDN-ResNet50 on Something-Something-V1 with 8 gpus, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
                main.py  something  RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
                --lr_scheduler step --lr_steps  30 45 55 --epochs 60 --batch-size 16 \
                --wd 5e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
    
  • For example, to train TDN-ResNet50 on Kinetics400 with 8 gpus, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
            main.py  kinetics RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
            --lr_scheduler step  --lr_steps 50 75 90 --epochs 100 --batch-size 16 \
            --wd 1e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
    

Acknowledgements

We especially thank the contributors of the TSN and TSM codebase for providing helpful code.

License

This repository is released under the Apache-2.0. license as found in the LICENSE file.

Citation

If you think our work is useful, please feel free to cite our paper 😆 :

@article{wang2020tdn,
      title={TDN: Temporal Difference Networks for Efficient Action Recognition}, 
      author={Limin Wang and Zhan Tong and Bin Ji and Gangshan Wu},
      journal={arXiv preprint arXiv:2012.10071},
      year={2020}
}
Owner
Multimedia Computing Group, Nanjing University
Multimedia Computing Group, Nanjing University
Matlab Python Heuristic Battery Opt - SMOP conversion and manual conversion

SMOP is Small Matlab and Octave to Python compiler. SMOP translates matlab to py

Tom Xu 1 Jan 12, 2022
Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition Official implementation of the Efficient Conforme

Maxime Burchi 145 Dec 30, 2022
you can add any codes in any language by creating its respective folder (if already not available).

HACKTOBERFEST-2021-WEB-DEV Beginner-Hacktoberfest Need Your first pr for hacktoberfest 2k21 ? come on in About This is repository of Responsive Portfo

Suman Sharma 8 Oct 17, 2022
Segmentation models with pretrained backbones. PyTorch.

Python library with Neural Networks for Image Segmentation based on PyTorch. The main features of this library are: High level API (just two lines to

Pavel Yakubovskiy 6.6k Jan 06, 2023
A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing.

AnimeGAN A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing. Randomly Generated Images The images are

Jie Lei 雷杰 1.2k Jan 03, 2023
A rule learning algorithm for the deduction of syndrome definitions from time series data.

README This project provides a rule learning algorithm for the deduction of syndrome definitions from time series data. Large parts of the algorithm a

0 Sep 24, 2021
This repository contains several image-to-image translation models, whcih were tested for RGB to NIR image generation. The models are Pix2Pix, Pix2PixHD, CycleGAN and PointWise.

RGB2NIR_Experimental This repository contains several image-to-image translation models, whcih were tested for RGB to NIR image generation. The models

5 Jan 04, 2023
This repo. is an implementation of ACFFNet, which is accepted for in Image and Vision Computing.

Attention-Guided-Contextual-Feature-Fusion-Network-for-Salient-Object-Detection This repo. is an implementation of ACFFNet, which is accepted for in I

5 Nov 21, 2022
This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

haifeng xia 32 Oct 26, 2022
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection (ICCV 2021)

Preparation Please see dataset/README.md to get more details about our datasets-VIL100 Please see INSTALL.md to install environment and evaluation too

82 Dec 15, 2022
A new framework, collaborative cascade prediction based on graph neural networks (CCasGNN) to jointly utilize the structural characteristics, sequence features, and user profiles.

CCasGNN A new framework, collaborative cascade prediction based on graph neural networks (CCasGNN) to jointly utilize the structural characteristics,

5 Apr 29, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
Learning based AI for playing multi-round Koi-Koi hanafuda card games. Have fun.

Koi-Koi AI Learning based AI for playing multi-round Koi-Koi hanafuda card games. Platform Python PyTorch PySimpleGUI (for the interface playing vs AI

Sanghai Guan 10 Nov 20, 2022
Image-to-Image Translation with Conditional Adversarial Networks (Pix2pix) implementation in keras

pix2pix-keras Pix2pix implementation in keras. Original paper: Image-to-Image Translation with Conditional Adversarial Networks (pix2pix) Paper Author

William Falcon 141 Dec 30, 2022
FS-Mol: A Few-Shot Learning Dataset of Molecules

FS-Mol is A Few-Shot Learning Dataset of Molecules, containing molecular compounds with measurements of activity against a variety of protein targets. The dataset is presented with a model evaluation

Microsoft 114 Dec 15, 2022
Clockwork Convnets for Video Semantic Segmentation

Clockwork Convnets for Video Semantic Segmentation This is the reference implementation of arxiv:1608.03609: Clockwork Convnets for Video Semantic Seg

Evan Shelhamer 141 Nov 21, 2022
Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data recorded in NumPy array

shindo.py Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data stored in NumPy array Introduction Japa

RR_Inyo 3 Sep 23, 2022
CONetV2: Efficient Auto-Channel Size Optimization for CNNs

CONetV2: Efficient Auto-Channel Size Optimization for CNNs Exciting News! CONetV2: Efficient Auto-Channel Size Optimization for CNNs has been accepted

Mahdi S. Hosseini 3 Dec 13, 2021
Official Pytorch implementation of the paper: "Locally Shifted Attention With Early Global Integration"

Locally-Shifted-Attention-With-Early-Global-Integration Pretrained models You can download all the models from here. Training Imagenet python -m torch

Shelly Sheynin 14 Apr 15, 2022
PROJECT - Az Residential Real Estate Analysis

AZ RESIDENTIAL REAL ESTATE ANALYSIS -Decided on libraries to import. Includes pa

2 Jul 05, 2022