NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

Overview

NExT-QA

We reproduce some SOTA VideoQA methods to provide benchmark results for our NExT-QA dataset accepted to CVPR2021 (with 1 'Strong Accept' and 2 'Weak Accept's).

NExT-QA is a VideoQA benchmark targeting the explanation of video contents. It challenges QA models to reason about the causal and temporal actions and understand the rich object interactions in daily activities. We set up both multi-choice and open-ended QA tasks on the dataset. This repo. provides resources for multi-choice QA; open-ended QA is found in NExT-OE. For more details, please refer to our dataset page.

Environment

Anaconda 4.8.4, python 3.6.8, pytorch 1.6 and cuda 10.2. For other libs, please refer to the file requirements.txt.

Install

Please create an env for this project using anaconda (should install anaconda first)

>conda create -n videoqa python=3.6.8
>conda activate videoqa
>git clone https://github.com/doc-doc/NExT-QA.git
>pip install -r requirements.txt #may take some time to install

Data Preparation

Please download the pre-computed features and QA annotations from here. There are 4 zip files:

  • ['vid_feat.zip']: Appearance and motion feature for video representation. (With code provided by HCRN).
  • ['qas_bert.zip']: Finetuned BERT feature for QA-pair representation. (Based on pytorch-pretrained-BERT).
  • ['nextqa.zip']: Annotations of QAs and GloVe Embeddings.
  • ['models.zip']: Learned HGA model.

After downloading the data, please create a folder ['data/feats'] at the same directory as ['NExT-QA'], then unzip the video and QA features into it. You will have directories like ['data/feats/vid_feat/', 'data/feats/qas_bert/' and 'NExT-QA/'] in your workspace. Please unzip the files in ['nextqa.zip'] into ['NExT-QA/dataset/nextqa'] and ['models.zip'] into ['NExT-QA/models/'].

(You are also encouraged to design your own pre-computed video features. In that case, please download the raw videos from VidOR. As NExT-QA's videos are sourced from VidOR, you can easily link the QA annotations with the corresponding videos according to the key 'video' in the ['nextqa/.csv'] files, during which you may need the map file ['nextqa/map_vid_vidorID.json']).

Usage

Once the data is ready, you can easily run the code. First, to test the environment and code, we provide the prediction and model of the SOTA approach (i.e., HGA) on NExT-QA. You can get the results reported in the paper by running:

>python eval_mc.py

The command above will load the prediction file under ['results/'] and evaluate it. You can also obtain the prediction by running:

>./main.sh 0 val #Test the model with GPU id 0

The command above will load the model under ['models/'] and generate the prediction file. If you want to train the model, please run

>./main.sh 0 train # Train the model with GPU id 0

It will train the model and save to ['models']. (The results may be slightly different depending on the environments)

Results

Methods Text Rep. Acc_C Acc_T Acc_D Acc Text Rep. Acc_C Acc_T Acc_D Acc
BlindQA GloVe 26.89 30.83 32.60 30.60 BERT-FT 42.62 45.53 43.89 43.76
EVQA GloVe 28.69 31.27 41.44 31.51 BERT-FT 42.64 46.34 45.82 44.24
STVQA [CVPR17] GloVe 36.25 36.29 55.21 39.21 BERT-FT 44.76 49.26 55.86 47.94
CoMem [CVPR18] GloVe 35.10 37.28 50.45 38.19 BERT-FT 45.22 49.07 55.34 48.04
HME [CVPR19] GloVe 37.97 36.91 51.87 39.79 BERT-FT 46.18 48.20 58.30 48.72
HCRN [CVPR20] GloVe 39.09 40.01 49.16 40.95 BERT-FT 45.91 49.26 53.67 48.20
HGA [AAAI20] GloVe 35.71 38.40 55.60 39.67 BERT-FT 46.26 50.74 59.33 49.74
Human - 87.61 88.56 90.40 88.38 - 87.61 88.56 90.40 88.38

Multi-choice QA vs. Open-ended QA

vis mc_oe

Citation

@article{xiao2021next,
  title={NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions},
  author={Xiao, Junbin and Shang, Xindi and Yao, Angela and Chua, Tat-Seng},
  journal={arXiv preprint arXiv:2105.08276},
  year={2021}
}

Todo

  1. Open evaluation server and release test data.
  2. Release spatial feature.
  3. Release RoI feature.

Acknowledgement

Our reproduction of the methods are based on the respective official repositories, we thank the authors to release their code. If you use the related part, please cite the corresponding paper commented in the code.

Owner
Junbin Xiao
PhD Candidate
Junbin Xiao
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Video Object Segmentation Language as Queries for Referring Video Object S

Jonas Wu 232 Dec 29, 2022
Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data

FTLNet_Pytorch Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data 1. Introduction This repo is an unofficial

1 Nov 04, 2020
This is the code for HOI Transformer

HOI Transformer Code for CVPR 2021 accepted paper End-to-End Human Object Interaction Detection with HOI Transformer. Reproduction We recomend you to

BigBangEpoch 124 Dec 29, 2022
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

CenterPose Overview This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation fro

NVIDIA Research Projects 188 Dec 27, 2022
Time should be taken seer-iously

TimeSeers seers - (Noun) plural form of seer - A person who foretells future events by or as if by supernatural means TimeSeers is an hierarchical Bay

279 Dec 26, 2022
Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

Junxian He 57 Jan 01, 2023
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Vowpal Wabbit 8.1k Jan 06, 2023
LSTM built using Keras Python package to predict time series steps and sequences. Includes sin wave and stock market data

LSTM Neural Network for Time Series Prediction LSTM built using the Keras Python package to predict time series steps and sequences. Includes sine wav

Jakob Aungiers 4.1k Jan 02, 2023
PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection.

Introduction This repo contains the official PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection. Up

133 Dec 29, 2022
Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

Richard Wang 443 Dec 06, 2022
System Combination for Grammatical Error Correction Based on Integer Programming

System Combination for Grammatical Error Correction Based on Integer Programming This repository contains the code and scripts that implement the syst

NUS NLP Group 0 Mar 29, 2022
Trajectory Variational Autoencder baseline for Multi-Agent Behavior challenge 2022

MABe_2022_TVAE: a Trajectory Variational Autoencoder baseline for the 2022 Multi-Agent Behavior challenge This repository contains jupyter notebooks t

Andrew Ulmer 15 Nov 08, 2022
Geometry-Free View Synthesis: Transformers and no 3D Priors

Geometry-Free View Synthesis: Transformers and no 3D Priors Geometry-Free View Synthesis: Transformers and no 3D Priors Robin Rombach*, Patrick Esser*

CompVis Heidelberg 293 Dec 22, 2022
Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data

1 Meta-FDMIxup Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data. (ACM MM 2021) paper News! the rep

Fu Yuqian 44 Nov 18, 2022
FaRL for Facial Representation Learning

FaRL for Facial Representation Learning This repo hosts official implementation of our paper General Facial Representation Learning in a Visual-Lingui

Microsoft 19 Jan 05, 2022
TResNet: High Performance GPU-Dedicated Architecture

TResNet: High Performance GPU-Dedicated Architecture paperV2 | pretrained models Official PyTorch Implementation Tal Ridnik, Hussam Lawen, Asaf Noy, I

426 Dec 28, 2022
Res2Net for Instance segmentation and Object detection using MaskRCNN

Res2Net for Instance segmentation and Object detection using MaskRCNN Since the MaskRCNN-benchmark of facebook is deprecated, we suggest to use our mm

Res2Net Applications 55 Oct 30, 2022
Pytorch Implementation of "Diagonal Attention and Style-based GAN for Content-Style disentanglement in image generation and translation" (ICCV 2021)

DiagonalGAN Official Pytorch Implementation of "Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Trans

32 Dec 06, 2022
Full Resolution Residual Networks for Semantic Image Segmentation

Full-Resolution Residual Networks (FRRN) This repository contains code to train and qualitatively evaluate Full-Resolution Residual Networks (FRRNs) a

Toby Pohlen 274 Oct 27, 2022
An open source object detection toolbox based on PyTorch

MMDetection is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project.

Bo Chen 24 Dec 28, 2022