CVPR2021: Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Overview

Temporal Context Aggregation Network - Pytorch

This repo holds the pytorch-version codes of paper: "Temporal Context Aggregation Network for Temporal Action Proposal Refinement", which is accepted in CVPR 2021.

[Arxiv Preprint]

Update

  • 2021.07.02: Update proposals, checkpoints, features for TCANet!
  • 2021.05.31: Repository for TCANet

Contents

Paper Introduction

image

Temporal action proposal generation aims to estimate temporal intervals of actions in untrimmed videos, which is a challenging yet important task in the video understanding field. The proposals generated by current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval owing to the lack of efficient temporal modeling and effective boundary context utilization. In this paper, we propose Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement. Specifically, we first design a Local-Global Temporal Encoder (LGTE), which adopts the channel grouping strategy to efficiently encode both "local and global" temporal inter-dependencies. Furthermore, both the boundary and internal context of proposals are adopted for frame-level and segment-level boundary regressions, respectively. Temporal Boundary Regressor (TBR) is designed to combine these two regression granularities in an end-to-end fashion, which achieves the precise boundaries and reliable confidence of proposals through progressive refinement. Extensive experiments are conducted on three challenging datasets: HACS, ActivityNet-v1.3, and THUMOS-14, where TCANet can generate proposals with high precision and recall. By combining with the existing action classifier, TCANet can obtain remarkable temporal action detection performance compared with other methods. Not surprisingly, the proposed TCANet won the 1st place in the CVPR 2020 - HACS challenge leaderboard on temporal action localization task.

Prerequisites

These code is implemented in Pytorch 1.5.1 + Python3.

Code and Data Preparation

Get the code

Clone this repo with git, please use:

git clone https://github.com/qingzhiwu/Temporal-Context-Aggregation-Network-Pytorch.git

Download Datasets

We support experiments with publicly available dataset HACS for temporal action proposal generation now. To download this dataset, please use official HACS downloader to download videos from the YouTube.

To extract visual feature, we adopt Slowfast model pretrained on the training set of HACS. Please refer this repo Slowfast to extract features.

For convenience of training and testing, we provide the rescaled feature at here Google Cloud or Baidu Yun[Code:x3ve].

In Baidu Yun Link, we provide:

-- features/: SlowFast features for training, validation and testing.
-- checkpoint/: Pre-trained TCANet model for SlowFast features provided by us.
-- proposals/: BMN proposals processed by us.
-- classification/: The best classification results we used in paper and 2020 HACS challenge.

Training and Testing of TCANet

All configurations of TCANet are saved in opts.py, where you can modify training and model parameter.

1. Unzip Proposals

tar -jxvf hacs.bmn.pem.slowfast101.t200.wd1e-5.warmup.pem_input_100.tar.bz2 -C ./
tar -jxvf hacs.bmn.pem.slowfast101.t200.wd1e-5.warmup.pem_input.tar.bz2 -C ./

2. Unzip Features

# for training features
cd features/
cat slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.bz2.*>slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.gz
tar -zxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.gz
tar -jxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.bz2 -C .

# for validation features
cd features/
cat slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.bz2.*>slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.gz
tar -zxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.gz
tar -jxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.bz2 -C .

# for testing features
cd features/
cat slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.bz2.*>slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.gz
tar -zxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.gz
tar -jxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.bz2 -C .

4. Training of TCANet

python3 main_tcanet.py --mode train \
--checkpoint_path ./checkpoint/ \
--video_anno /path/to/HACS_segments_v1.1.1.json \
--feature_path /path/to/feature/ \
--train_proposals_path /path/to/pem_input_100/in/proposals \ 
--test_proposals_path /path/to/pem_input/in/proposals 

We also provide trained TCANet model in ./checkpoint in our BaiduYun Link.

6. Testing of TCANet

# We split the dataset into 4 parts, and inference these parts on 4 gpus
python3 main_tcanet.py  --mode inference --part_idx 0 --gpu 0 --classifier_result /path/to/classifier/{}94.32.json
python3 main_tcanet.py  --mode inference --part_idx 1 --gpu 1 --classifier_result /path/to/classifier/{}94.32.json
python3 main_tcanet.py  --mode inference --part_idx 2 --gpu 2 --classifier_result /path/to/classifier/{}94.32.json
python3 main_tcanet.py  --mode inference --part_idx 3 --gpu 3 --classifier_result /path/to/classifier/{}94.32.json

7. Post processing and generate final results

python3 main_tcanet.py  --mode inference --part_idx -1

Other Info

Citation

Please cite the following paper if you feel TCANet useful to your research

@inproceedings{qing2021temporal,
  title={Temporal Context Aggregation Network for Temporal Action Proposal Refinement},
  author={Qing, Zhiwu and Su, Haisheng and Gan, Weihao and Wang, Dongliang and Wu, Wei and Wang, Xiang and Qiao, Yu and Yan, Junjie and Gao, Changxin and Sang, Nong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={485--494},
  year={2021}
}

Contact

For any question, please file an issue or contact

Zhiwu Qing: [email protected]
Owner
Zhiwu Qing
Zhiwu Qing
Image-to-Image Translation in PyTorch

CycleGAN and pix2pix in PyTorch New: Please check out contrastive-unpaired-translation (CUT), our new unpaired image-to-image translation model that e

Jun-Yan Zhu 19k Jan 07, 2023
Relaxed-machines - explorations in neuro-symbolic differentiable interpreters

Relaxed Machines Explorations in neuro-symbolic differentiable interpreters. Baby steps: inc_stop Libraries JAX Haiku Optax Resources Chapter 3 (∂4: A

Nada Amin 6 Feb 02, 2022
Coursera - Quiz & Assignment of Coursera

Coursera Assignments This repository is aimed to help Coursera learners who have difficulties in their learning process. The quiz and programming home

浅梦 828 Jan 04, 2023
AI Flow is an open source framework that bridges big data and artificial intelligence.

Flink AI Flow Introduction Flink AI Flow is an open source framework that bridges big data and artificial intelligence. It manages the entire machine

144 Dec 30, 2022
All materials of Cassandra Event, Udyam'22

Cassandra 2022 Workspace Workshop Materials Workshop-1 Workshop-2 Workshop-3 Workshop-4 Assignments Assignment-1 Assignment-2 Assignment-3 Resources P

36 Dec 31, 2022
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
MIMO-UNet - Official Pytorch Implementation

MIMO-UNet - Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Rethinking Coarse-to-

Sungjin Cho 248 Jan 02, 2023
KoCLIP: Korean port of OpenAI CLIP, in Flax

KoCLIP This repository contains code for KoCLIP, a Korean port of OpenAI's CLIP. This project was conducted as part of Hugging Face's Flax/JAX communi

Jake Tae 100 Jan 02, 2023
Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones

HaloNet - Pytorch Implementation of the Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones. This re

Phil Wang 189 Nov 22, 2022
Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning Pytorch Implementation for DisCo: Remedy Self-supervi

79 Jan 06, 2023
Learning High-Speed Flight in the Wild

Learning High-Speed Flight in the Wild This repo contains the code associated to the paper Learning Agile Flight in the Wild. For more information, pl

Robotics and Perception Group 391 Dec 29, 2022
The code for our paper Semi-Supervised Learning with Multi-Head Co-Training

Semi-Supervised Learning with Multi-Head Co-Training (PyTorch) Abstract Co-training, extended from self-training, is one of the frameworks for semi-su

cmc 6 Dec 04, 2022
Baselines for TrajNet++

TrajNet++ : The Trajectory Forecasting Framework PyTorch implementation of Human Trajectory Forecasting in Crowds: A Deep Learning Perspective TrajNet

VITA lab at EPFL 183 Jan 05, 2023
StyleGAN2-ADA - Official PyTorch implementation

Need Help? If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in Oct

Derrick Schultz 217 Jan 04, 2023
Event sourced bank - A wide-and-shallow example using the Python event sourcing library

Event Sourced Bank A "wide but shallow" example of using the Python event sourci

3 Mar 09, 2022
Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Semantic Code Search Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project. The model

Chen Wu 24 Nov 29, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 463 Dec 09, 2022
CrossMLP - The repository offers the official implementation of our BMVC 2021 paper (oral) in PyTorch.

CrossMLP Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation Bin Ren1, Hao Tang2, Nicu Sebe1. 1University of Trento, Italy, 2ETH, Switzerla

Bingoren 16 Jul 27, 2022
Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh

Arjun Majumdar 44 Dec 14, 2022
Simple keras FCN Encoder/Decoder model for MS-COCO (food subset) segmentation

FCN_MSCOCO_Food_Segmentation Simple keras FCN Encoder/Decoder model for MS-COCO (food subset) segmentation Input data: [http://mscoco.org/dataset/#ove

Alexander Kalinovsky 11 Jan 08, 2019