Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

Overview

Temporal Segment Networks (TSN)

We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes an implementation of TSN as well as other state-of-the-art frameworks for various tasks. We highly recommend you switch to it. This repo will continue to be supported for Caffe users.

This repository holds the codes and models for the papers

Temporal Segment Networks for Action Recognition in Videos, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, TPAMI, 2018.

[Arxiv Preprint]

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, ECCV 2016, Amsterdam, Netherlands.

[Arxiv Preprint]

News & Updates

Jul. 20, 2018 - For those having trouble building the TSN toolkit, we have provided a pre-built Docker image you can use. Download it from DockerHub. It contains OpenCV, Caffe, DenseFlow, and this codebase, all built and ready to use with NVIDIA-Docker.

Sep. 8, 2017 - We released TSN models trained on the Kinetics dataset with 76.6% single model top-1 accuracy. Find the model weights and transfer learning experiment results on the website.

Aug. 10, 2017 - An experimental PyTorch implementation of TSN is released on GitHub.

Nov. 5, 2016 - The project page for TSN is online; see the website.

Sep. 14, 2016 - We fixed a legacy bug in our Caffe fork that affects some parameters in TSN training. You are advised to update to the latest version.

FAQ, How to add a custom dataset

Below is the guidance for reproducing the reported results and exploring further.

Contents


Usage Guide

Prerequisites

[back to top]

There are a few dependencies required to run the code. The major libraries we use are our modified Caffe, the dense_flow tool for optical flow extraction, and OpenCV with video IO support.

The codebase is written in Python. We recommend the Anaconda Python distribution. Matlab scripts are provided for some critical steps like video-level testing.

The most straightforward way to install these libraries is to run the build_all.sh script.

Besides software, GPU(s) are required for optical flow extraction and model training. Our Caffe modification supports highly efficient parallel training. Just throw in as many GPUs as you like and enjoy.

Code & Data Preparation

Get the code

[back to top]

Use git to clone this repository and its submodules

git clone --recursive https://github.com/yjxiong/temporal-segment-networks

Then run the building scripts to build the libraries.

bash build_all.sh

It will build Caffe and dense_flow. Since we need OpenCV with video IO support, which is absent in most default installations, the script will also download and build a local copy of OpenCV and use its Python interface.
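
After the build finishes, you can sanity-check that the locally built OpenCV actually has video IO. A minimal sketch, assuming the local cv2 Python bindings are on your path; video.avi is a placeholder for any real video file:

import cv2

# Try to open a sample video; video.avi is a placeholder file name.
cap = cv2.VideoCapture('video.avi')
print('Video IO works:', cap.isOpened())
ret, frame = cap.read()
if ret:
    print('First frame shape:', frame.shape)
cap.release()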

Note that to run training with multiple GPUs, one needs to enable MPI support of Caffe. To do this, run

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

Get the videos

[back to top]

We experimented on two mainstream action recognition datasets: UCF101 and HMDB51. Videos can be downloaded directly from their websites. After downloading, please extract the videos from the rar archives.

  • UCF101: the UCF101 videos are archived in the downloaded file. Please use unrar x UCF101.rar to extract the videos.
  • HMDB51: the HMDB51 video archive has two levels of packaging. The following commands illustrate how to extract the videos.
mkdir rars && mkdir videos
unrar x hmdb51-org.rar rars/
for a in $(ls rars); do unrar x "rars/${a}" videos/; done;

Get trained models

[back to top]

We provide the trained models in the Caffe format, consisting of the network specifications in Protobuf messages and the learned weights. The codebase includes the model specs for UCF101 and HMDB51. The model weights can be downloaded by running the script

bash scripts/get_reference_models.sh

Extract Frames and Optical Flow Images

[back to top]

To run the training and testing, we need to decode the videos into frames. The temporal stream networks also need optical flow or warped optical flow images as input.

These can be obtained with the script scripts/extract_optical_flow.sh. The script takes three arguments:

  • SRC_FOLDER points to the folder where you put the video dataset
  • OUT_FOLDER points to the root folder where the extracted frames and optical flow images will be put
  • NUM_WORKER specifies the number of GPUs to use in parallel for flow extraction; it must be larger than 1

The command for running optical flow extraction is as follows

bash scripts/extract_optical_flow.sh SRC_FOLDER OUT_FOLDER NUM_WORKER

Extracting optical flow for a whole dataset will take from several hours to several days, depending on the number of GPUs.
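
For instance, with hypothetical paths, extracting flow for UCF101 on 4 GPUs might look like

bash scripts/extract_optical_flow.sh ~/datasets/ucf101/videos ~/datasets/ucf101/frames 4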

Testing Provided Models

Get reference models

[back to top]

To help reproduce the results reported in the paper, we provide reference models trained by us for instant testing. Please use the following command to get the reference models.

bash scripts/get_reference_models.sh

Video-level testing

[back to top]

We provide a Python framework to run the testing. For the benchmark datasets, we will measure average accuracy on the testing splits. We also provide the facility to analyze a single video.

Generally, to test on the benchmark dataset, we can use the scripts eval_net.py and eval_scores.py.

For example, to test the reference RGB stream model on split 1 of UCF101 with 4 GPUs, run

python tools/eval_net.py ucf101 1 rgb FRAME_PATH \
 models/ucf101/tsn_bn_inception_rgb_deploy.prototxt models/ucf101_split_1_tsn_rgb_reference_bn_inception.caffemodel \
 --num_worker 4 --save_scores SCORE_FILE

where FRAME_PATH is the path to which you extracted the UCF101 frames and SCORE_FILE is the filename for storing the extracted scores.

One can also use cached score files to evaluate the performance. To do this, issue the following command

python tools/eval_scores.py SCORE_FILE

The more important function of eval_scores.py is modality fusion. For example, suppose we have the scores of the RGB stream in RGB_SCORE_FILE and those of the flow stream in FLOW_SCORE_FILE. The fused result with weights of 1:1.5 can be obtained with

python tools/eval_scores.py RGB_SCORE_FILE FLOW_SCORE_FILE --score_weights 1 1.5
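
The fusion is essentially a weighted sum of the per-class score matrices of the two modalities, followed by an argmax over classes. A minimal numpy sketch of the idea (the arrays here are random stand-ins; the actual score file format of eval_scores.py may differ):

import numpy as np

# Hypothetical per-video class scores, shape (num_videos, num_classes).
rgb_scores = np.random.randn(10, 101)
flow_scores = np.random.randn(10, 101)

# Weighted late fusion with weights 1 : 1.5, mirroring the command above.
fused = 1.0 * rgb_scores + 1.5 * flow_scores
predictions = fused.argmax(axis=1)  # predicted class index per video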

To view the full help messages of these scripts, run python eval_net.py -h or python eval_scores.py -h.

Training Temporal Segment Networks

[back to top]

Training TSN is straightforward. We have provided the necessary model specs, solver configs, and initialization models. To achieve optimal training speed, we strongly advise you to turn on the parallel training support in the Caffe toolbox using the following build command

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

where <root path to openmpi installation> points to the root of your OpenMPI installation, for example /usr/local/openmpi/.

Construct file lists for training and validation

[back to top]

The data feeding in training relies on the VideoDataLayer in Caffe. This layer uses a list file to specify its data sources. Each line of the list file contains a tuple of (path to the extracted frames of a video, number of frames, groundtruth class). A list file looks like

video_frame_path 100 10
video_2_frame_path 150 31
...
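
If you are building such a list for a custom dataset, here is a minimal sketch of the idea, assuming one sub-folder of extracted frames per video and a video-to-label mapping you supply yourself (the provided script below handles the benchmark datasets for you):

import os

def build_list(frame_root, label_of, out_file):
    # frame_root: directory with one sub-folder of frames per video.
    # label_of: dict mapping video name -> integer class label.
    with open(out_file, 'w') as f:
        for video in sorted(os.listdir(frame_root)):
            video_dir = os.path.join(frame_root, video)
            if not os.path.isdir(video_dir):
                continue
            # Counts every file in the folder; adjust if RGB and flow images share it.
            num_frames = len(os.listdir(video_dir))
            f.write('{} {} {}\n'.format(video_dir, num_frames, label_of[video]))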

To build the file lists for all 3 splits of the two benchmark datasets, we have provided a script. Just use the following commands

bash scripts/build_file_list.sh ucf101 FRAME_PATH

and

bash scripts/build_file_list.sh hmdb51 FRAME_PATH

The generated list files will be put in data/ with names like ucf101_flow_val_split_2.txt.

Get initialization models

[back to top]

We have built the initialization model weights for both RGB and flow input. The flow initialization models implement the cross-modality training technique in the paper. To download the model weights, run

bash scripts/get_init_models.sh

Start training

[back to top]

Once everything is ready, we can start training TSN. For this, use the script scripts/train_tsn.sh. For example, the following command runs training on UCF101 with RGB input

bash scripts/train_tsn.sh ucf101 rgb

The training will run with default settings on 4 GPUs. Usually, it takes around 1 hour to train the RGB model and 4 hours for the flow models on 4 GTX Titan X GPUs.
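
The flow models can be trained the same way by swapping the modality argument, mirroring the RGB command above:

bash scripts/train_tsn.sh ucf101 flow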

The learned model weights will be saved in models/. The aforementioned testing process can be used to evaluate them.

Configure the training process

[back to top]

Here we provide some information on customizing the training process

  • Change split: By default, the training is conducted on split 1 of the datasets. To change the split, one can modify the corresponding model specs and solver files. For example, to train on split 2 of UCF101 with RGB input, one can modify the file models/ucf101/tsn_bn_inception_rgb_train_val.prototxt. On line 8, change
source: "data/ucf101_rgb_train_split_1.txt"`

to

source: "data/ucf101_rgb_train_split_2.txt"

On line 34, change

source: "data/ucf101_rgb_val_split_1.txt"

to

source: "data/ucf101_rgb_val_split_2.txt"

Also, in the solver file models/ucf101/tsn_bn_inception_rgb_solver.prototxt, on line 12 change

snapshot_prefix: "models/ucf101_split1_tsn_rgb_bn_inception"

to

snapshot_prefix: "models/ucf101_split2_tsn_rgb_bn_inception"

in order to distinguish the learned weights.

  • Change GPU number: In general, one can use any number of GPUs for the training. To use more or fewer GPUs, change N_GPU in scripts/train_tsn.sh. Important notice: when the GPU number is changed, the effective batch size also changes. It is best to keep the effective batch size, which equals batch_size * iter_size * n_gpu, at 128. Here, batch_size is the number in the model's prototxt, for example line 9 in models/ucf101/tsn_bn_inception_rgb_train_val.prototxt; see the sketch below.
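
As a quick sanity check of that arithmetic, a minimal sketch that picks iter_size so the effective batch size stays at 128 (the values are illustrative):

def pick_iter_size(batch_size, n_gpu, target=128):
    # effective batch size = batch_size * iter_size * n_gpu
    assert target % (batch_size * n_gpu) == 0, 'adjust batch_size instead'
    return target // (batch_size * n_gpu)

print(pick_iter_size(batch_size=32, n_gpu=4))  # -> 1
print(pick_iter_size(batch_size=32, n_gpu=2))  # -> 2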

Other Info

[back to top]

Citation

Please cite the following paper if you find this repository useful.

@inproceedings{TSN2016ECCV,
  author    = {Limin Wang and
               Yuanjun Xiong and
               Zhe Wang and
               Yu Qiao and
               Dahua Lin and
               Xiaoou Tang and
               Luc {Van Gool}},
  title     = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
  booktitle = {ECCV},
  year      = {2016},
}

Related Projects

Contact

For any questions, please contact

Yuanjun Xiong: [email protected]
Limin Wang: [email protected]