TDN: Temporal Difference Networks for Efficient Action Recognition

Last update: Dec 13, 2022

Overview

We release the PyTorch code of TDN (Temporal Difference Networks). This code is based on the TSN and TSM codebases. The core code implementing the Temporal Difference Module is in ops/base_module.py and ops/tdn_net.py.

🔥 [NEW!] We have released the PyTorch code of TDN.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

The code is built with the following libraries:

Python 3.6 or higher
PyTorch 1.4 or higher
Torchvision
TensorboardX
tqdm
scikit-learn
ffmpeg
decord
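
To check an environment against the list above, a small script along these lines may help. It is only a sketch: the import names are the usual ones for these packages (scikit-learn is imported as `sklearn`), `missing_prerequisites` is a hypothetical helper that is not part of this codebase, and it does not verify the PyTorch version.

```python
import importlib.util
import shutil
import sys

# Import names for the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["torch", "torchvision", "tensorboardX", "tqdm", "sklearn", "decord"]


def missing_prerequisites(modules=REQUIRED_MODULES, executables=("ffmpeg",)):
    """Return the names of required modules and executables that cannot be found."""
    missing = []
    if sys.version_info < (3, 6):
        missing.append("python>=3.6")
    for name in modules:
        # find_spec looks up a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(exe) is None:
            missing.append(exe)
    return missing


if __name__ == "__main__":
    print(missing_prerequisites() or "all prerequisites found")
```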

Data Preparation

We have successfully trained TDN on Kinetics400, UCF101, HMDB51, Something-Something-V1 and V2 with this codebase.

The processing of Something-Something-V1 & V2 can be summarized in 3 steps:
1. Extract frames from the videos (you can use ffmpeg to do this).
2. Generate the annotations needed by the dataloader, one line per video giving the frame directory, the number of frames, and the label. The annotations usually include train.txt and val.txt. The format of each *.txt file is:
  frames/video_1 num_frames label_1
  frames/video_2 num_frames label_2
  frames/video_3 num_frames label_3
  ...
  frames/video_N num_frames label_N
3. Add the information to ops/dataset_configs.py.
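
Steps 1 and 2 can be sketched in Python as follows. This is an illustration, not the script used by the authors: the function names, the `img_%05d.jpg` file name pattern, and the `labels` dictionary are assumptions, so adapt them to what your dataloader expects.

```python
from pathlib import Path


def ffmpeg_extract_command(video_path, frame_dir, pattern="img_%05d.jpg"):
    """Build an ffmpeg command that writes every frame of one video as JPEGs.

    The file name pattern is an assumption; match it to your dataloader.
    """
    return ["ffmpeg", "-i", str(video_path), (Path(frame_dir) / pattern).as_posix()]


def write_frame_annotations(frames_root, labels, out_file, suffix=".jpg"):
    """Write one "<frame_dir> <num_frames> <label>" line per video.

    frames_root: directory containing one sub-directory of frames per video.
    labels: dict mapping sub-directory name -> integer class label.
    """
    frames_root = Path(frames_root)
    lines = []
    for video_dir in sorted(p for p in frames_root.iterdir() if p.is_dir()):
        if video_dir.name not in labels:
            continue  # skip videos without a label (e.g. they belong to another split)
        num_frames = sum(1 for f in video_dir.iterdir() if f.suffix == suffix)
        lines.append(f"{video_dir.as_posix()} {num_frames} {labels[video_dir.name]}")
    Path(out_file).write_text("\n".join(lines) + "\n")
    return lines
```

The command list can be passed to `subprocess.run(cmd, check=True)` once the target directory exists. Calling `write_frame_annotations("frames", labels, "train.txt")` then produces lines in the format shown in step 2.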

The processing of Kinetics400 can be summarized in 2 steps:
1. Generate the annotations needed by the dataloader, one line per video giving the video path and the label. The annotations usually include train.txt and val.txt. The format of each *.txt file is:
```
frames/video_1.mp4  label_1
frames/video_2.mp4  label_2
frames/video_3.mp4  label_3
...
frames/video_N.mp4  label_N
```
2. Add the information to ops/dataset_configs.py
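
A minimal sketch of generating such a list is given below. It assumes a hypothetical layout `videos_root/<class_name>/<video>.mp4` and assigns labels by sorting the class names alphabetically. Neither assumption comes from this codebase, so check the label mapping against the one your checkpoints were trained with.

```python
from pathlib import Path


def write_video_annotations(videos_root, out_file, suffix=".mp4", sep=" "):
    """Write one "<video_path><sep><label>" line per video file.

    Assumes videos_root/<class_name>/<video>.mp4 (a hypothetical layout) and
    numbers the classes in alphabetical order of their directory names.
    """
    videos_root = Path(videos_root)
    class_names = sorted(p.name for p in videos_root.iterdir() if p.is_dir())
    class_to_label = {name: idx for idx, name in enumerate(class_names)}
    lines = []
    for name in class_names:
        for video in sorted((videos_root / name).glob(f"*{suffix}")):
            lines.append(f"{video.as_posix()}{sep}{class_to_label[name]}")
    Path(out_file).write_text("\n".join(lines) + "\n")
    return class_to_label, lines
```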

Model Zoo

Here we provide some off-the-shelf pretrained models. The accuracy may differ slightly from the paper, since the raw Kinetics videos downloaded by users may not be identical.

Something-Something-V1

Model	Frames x Crops x Clips	Top-1	Top-5	checkpoint
TDN-ResNet50	8x1x1	52.3%	80.6%	link
TDN-ResNet50	16x1x1	53.9%	82.1%	link

Something-Something-V2

Model	Frames x Crops x Clips	Top-1	Top-5	checkpoint
TDN-ResNet50	8x1x1	64.0%	88.8%	link
TDN-ResNet50	16x1x1	65.3%	89.7%	link

Kinetics400

Model	Frames x Crops x Clips	Top-1 (30 view)	Top-5 (30 view)	checkpoint
TDN-ResNet50	8x3x10	76.6%	92.8%	link
TDN-ResNet50	16x3x10	77.5%	93.2%	link
TDN-ResNet101	8x3x10	77.5%	93.6%	link
TDN-ResNet101	16x3x10	78.5%	93.9%	link

Testing

For center-crop, single-clip testing, the process can be summarized in 2 steps:

First, run the following testing script:

CUDA_VISIBLE_DEVICES=0 python3 test_models_center_crop.py something \
--archs='resnet50' --weights   --test_segments=8  \
--test_crops=1 --batch_size=16  --gpus 0 --output_dir  -j 4 --clip_index=1

Then run the following script to get the result from the raw scores:

python3 pkl_to_results.py --num_clips 1 --test_crops 1 --output_dir

For 3-crop, 10-clip testing, the process can be summarized in 2 steps:

First, run the following testing script 10 times, once for each clip_index from 0 to 9:

CUDA_VISIBLE_DEVICES=0 python3 test_models_three_crops.py  kinetics \
--archs='resnet50' --weights   --test_segments=8 \
--test_crops=3 --batch_size=16 --full_res --gpus 0 --output_dir   \
-j 4 --clip_index
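
The 10 runs can be generated programmatically. The sketch below only builds the argument lists, reusing the flags from the command above. `three_crop_commands` is a hypothetical helper, and the checkpoint path and output directory in the example are placeholders.

```python
import shlex


def three_crop_commands(weights, output_dir, num_clips=10, gpu=0):
    """Return one test command (as an argument list) per clip index."""
    commands = []
    for clip_index in range(num_clips):
        commands.append([
            "python3", "test_models_three_crops.py", "kinetics",
            "--archs=resnet50", "--weights", str(weights),
            "--test_segments=8", "--test_crops=3", "--batch_size=16",
            "--full_res", "--gpus", str(gpu), "--output_dir", str(output_dir),
            "-j", "4", "--clip_index", str(clip_index),
        ])
    return commands


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    for cmd in three_crop_commands("path/to/checkpoint", "path/to/output_dir"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True, env=...)`, with `CUDA_VISIBLE_DEVICES` set in the environment as in the command above.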

Then run the following script to ensemble the raw scores of the 30 views:

python pkl_to_results.py --num_clips 10 --test_crops 3 --output_dir
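
To illustrate what ensembling the views means, here is a dependency-free sketch that averages per-class scores over views and then computes top-k accuracy. Averaging is an assumption made for illustration, and pkl_to_results.py may combine the scores differently. The function names and the nested-list layout are hypothetical.

```python
def average_views(view_scores):
    """Average class scores over views.

    view_scores[v][n][c] is the score of class c for video n in view v.
    Returns one averaged score list per video.
    """
    num_views = len(view_scores)
    num_videos = len(view_scores[0])
    averaged = []
    for n in range(num_videos):
        # Group the scores of each class across all views of video n.
        per_class = zip(*(view_scores[v][n] for v in range(num_views)))
        averaged.append([sum(scores) / num_views for scores in per_class])
    return averaged


def topk_accuracy(scores, labels, k=1):
    """Fraction of videos whose true label is among the k highest-scoring classes."""
    hits = 0
    for video_scores, label in zip(scores, labels):
        ranked = sorted(range(len(video_scores)), key=lambda c: video_scores[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


if __name__ == "__main__":
    # Toy example: 2 views, 2 videos, 3 classes.
    view0 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    view1 = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
    fused = average_views([view0, view1])
    print(topk_accuracy(view0, [1, 2]), topk_accuracy(fused, [1, 2]))
```

In the toy example the first view alone misclassifies both videos, while the averaged scores classify both correctly.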

Training

This implementation supports multi-GPU training with DistributedDataParallel, which is faster and simpler.

For example, to train TDN-ResNet50 on Something-Something-V1 with 8 GPUs, you can run:

python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
            main.py  something  RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
            --lr_scheduler step --lr_steps  30 45 55 --epochs 60 --batch-size 16 \
            --wd 5e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb

For example, to train TDN-ResNet50 on Kinetics400 with 8 GPUs, you can run:

python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
        main.py  kinetics RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
        --lr_scheduler step  --lr_steps 50 75 90 --epochs 100 --batch-size 16 \
        --wd 1e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb
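
The `--lr`, `--lr_scheduler step` and `--lr_steps` flags describe a step schedule. The sketch below shows how such a schedule is typically computed. The decay factor of 0.1 and the 0-indexed epoch convention are assumptions, since the commands above only fix the base learning rate and the milestones.

```python
def step_lr(epoch, base_lr=0.02, lr_steps=(30, 45, 55), gamma=0.1):
    """Learning rate at a given (0-indexed) epoch under a step schedule.

    gamma, the factor applied at each milestone, is an assumption; the
    training commands only specify base_lr and lr_steps.
    """
    # Count how many milestones have been reached by this epoch.
    num_decays = sum(epoch >= step for step in lr_steps)
    return base_lr * gamma ** num_decays


if __name__ == "__main__":
    for epoch in (0, 30, 45, 55):
        print(epoch, step_lr(epoch))
```

For the Kinetics400 command, the same function would be called with `lr_steps=(50, 75, 90)`.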

Acknowledgements

We especially thank the contributors of the TSN and TSM codebase for providing helpful code.

License

This repository is released under the Apache-2.0 license as found in the LICENSE file.

Citation

If you think our work is useful, please feel free to cite our paper 😆 :

@article{wang2020tdn,
      title={TDN: Temporal Difference Networks for Efficient Action Recognition}, 
      author={Limin Wang and Zhan Tong and Bin Ji and Gangshan Wu},
      journal={arXiv preprint arXiv:2012.10071},
      year={2020}
}

Owner

Multimedia Computing Group, Nanjing University
