PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

Overview

PyTorch code for our ACL 2020 paper "MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning" by Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, and Mohit Bansal.

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks, because it requires not only visual relevance but also discourse-based coherence across the sentences of a paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. The memory module generates a highly summarized memory state from the video segments and the sentence history to help better predict the next sentence (with respect to coreference and repetition), thus encouraging coherent paragraph generation. Extensive experiments, human evaluations, and qualitative analyses on two popular datasets, ActivityNet Captions and YouCookII, show that MART generates more coherent and less repetitive paragraph captions than baseline methods, while maintaining relevance to the input video events.
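The paragraph above describes the memory module only at a high level. A common way to keep such a running state is a GRU-style gated update: a candidate memory is computed from the previous memory and a summary of the current segment, and a sigmoid gate decides how much of the old memory to keep. The pure-Python sketch below assumes that form. The weight names (`W_mc`, `W_sc`, `W_mz`, `W_sz`), the single-vector memory, and the pre-computed summary `s_t` are illustrative assumptions, not the repository's code or an exact reproduction of the paper.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_memory_update(m_prev, s_t, p):
    """One GRU-style memory step with an update gate only (illustrative).

    m_prev: previous memory state M_{t-1}
    s_t:    summary S_t of the current segment and sentence (assumed given)
    p:      dict with weights W_mc, W_sc, W_mz, W_sz and biases b_c, b_z
    """
    # Candidate memory: C_t = tanh(W_mc M_{t-1} + W_sc S_t + b_c)
    c = [math.tanh(a + b + bias) for a, b, bias in
         zip(matvec(p["W_mc"], m_prev), matvec(p["W_sc"], s_t), p["b_c"])]
    # Update gate: Z_t = sigmoid(W_mz M_{t-1} + W_sz S_t + b_z)
    z = [sigmoid(a + b + bias) for a, b, bias in
         zip(matvec(p["W_mz"], m_prev), matvec(p["W_sz"], s_t), p["b_z"])]
    # New memory: M_t = (1 - Z_t) * C_t + Z_t * M_{t-1}, element-wise
    return [(1.0 - zi) * ci + zi * mi for zi, ci, mi in zip(z, c, m_prev)]


def random_params(d, seed=0):
    """Small random square weights and zero biases for a d-dimensional memory."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]

    return {"W_mc": mat(), "W_sc": mat(), "W_mz": mat(), "W_sz": mat(),
            "b_c": [0.0] * d, "b_z": [0.0] * d}


if __name__ == "__main__":
    params = random_params(4)
    memory = [0.0] * 4
    # Roll the memory over three hypothetical segment summaries.
    for summary in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]):
        memory = gated_memory_update(memory, summary, params)
        print([round(v, 3) for v in memory])
```

Because each new memory is a gated mix of a `tanh` candidate and the previous state, it stays bounded across many segments, which is one reason gated updates are a popular choice for long recurrences.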

Getting started

Prerequisites

  1. Clone this repository
# no need to add --recursive as all dependencies are copied into this repo.
git clone https://github.com/jayleicn/recurrent-transformer.git
cd recurrent-transformer
  2. Prepare feature files

Download features from Google Drive: rt_anet_feat.tar.gz (39 GB) and rt_yc2_feat.tar.gz (12 GB). These features are repacked from the features provided by densecap.

mkdir video_feature && cd video_feature
tar -xf path/to/rt_anet_feat.tar.gz 
tar -xf path/to/rt_yc2_feat.tar.gz 
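If you prefer to unpack the archives from Python rather than the shell, the standard-library `tarfile` module can do the same job as the `mkdir` and `tar -xf` commands above. This is a minimal Python 3 sketch (the repository itself targets Python 2.7); the helper name `extract_features` is hypothetical, and the archive paths are the same placeholders used above.

```python
import os
import tarfile


def extract_features(archives, dest="video_feature"):
    """Extract each feature archive into `dest` (like `mkdir` + `tar -xf`)."""
    os.makedirs(dest, exist_ok=True)
    for path in archives:
        # Mode "r:*" lets tarfile detect gzip compression automatically.
        with tarfile.open(path, "r:*") as tar:
            tar.extractall(dest)
    return sorted(os.listdir(dest))


if __name__ == "__main__":
    print(extract_features(["path/to/rt_anet_feat.tar.gz",
                            "path/to/rt_yc2_feat.tar.gz"]))
```

Only extract archives from a source you trust, since `extractall` writes whatever paths the archive contains.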
  3. Install dependencies
  • Python 2.7
  • PyTorch 1.1.0
  • nltk
  • easydict
  • tqdm
  • tensorboardX
  4. Add project root to PYTHONPATH
source setup.sh

Note that you need to do this each time you start a new session.
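The contents of `setup.sh` are not shown here. Assuming it only adds the repository root to `PYTHONPATH`, the Python-side equivalent is prepending that root to `sys.path`. The sketch below (hypothetical helper `add_project_root`) does that for the running interpreter only, so, like `source setup.sh`, it has to be repeated in every new session or script.

```python
import os
import sys


def add_project_root(root=None):
    """Prepend the project root (default: current directory) to sys.path once."""
    root = os.path.abspath(root or os.getcwd())
    if root not in sys.path:
        sys.path.insert(0, root)  # affects this interpreter only
    return root


if __name__ == "__main__":
    print("added", add_project_root())
```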

Training and Inference

We give examples of how to perform training and inference with MART.

  1. Build Vocabulary
bash scripts/build_vocab.sh DATASET_NAME

DATASET_NAME can be anet for ActivityNet Captions or yc2 for YouCookII.
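The internals of `scripts/build_vocab.sh` are not described here. As a rough illustration of what vocabulary building usually involves, the sketch below counts word frequencies over the captions, keeps words that occur at least `min_count` times, and reserves ids for special tokens. The token set, the regex tokenizer (standing in for nltk), and the function names are assumptions, not the repository's implementation.

```python
import re
from collections import Counter

# Hypothetical special tokens; the repository's actual token set may differ.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[BOS]", "[EOS]"]


def tokenize(sentence):
    """Lower-case and split into word tokens (a stand-in for nltk tokenization)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def build_vocab(sentences, min_count=1):
    """Map special tokens, then every word seen >= min_count times, to ids."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + words)}


def encode(sentence, vocab):
    """Convert a sentence to ids, mapping unseen words to [UNK]."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokenize(sentence)]


if __name__ == "__main__":
    captions = ["A man is cooking.", "The man cuts onions."]
    vocab = build_vocab(captions, min_count=2)
    print(vocab, encode("A man runs", vocab))
```

A frequency threshold keeps rare words out of the output layer; they are mapped to `[UNK]` at encoding time instead.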

  2. MART training

The general training command is:

bash scripts/train.sh DATASET_NAME MODEL_TYPE

MODEL_TYPE can be one of [mart, xl, xlrg, mtrans, mart_no_recurrence]; see the details below.

MODEL_TYPE           Description
mart                 Memory-Augmented Recurrent Transformer
xl                   Transformer-XL
xlrg                 Transformer-XL with recurrent gradient
mtrans               Vanilla Transformer
mart_no_recurrence   MART with recurrence disabled
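To train several of these variants in sequence, a thin Python wrapper around the documented command is enough. This sketch only validates the arguments against the values listed above and shells out to `scripts/train.sh`; `train_command` and `train_all` are hypothetical helper names, and `dry_run=True` prints the commands instead of running them.

```python
import subprocess

DATASETS = ("anet", "yc2")
MODEL_TYPES = ("mart", "xl", "xlrg", "mtrans", "mart_no_recurrence")


def train_command(dataset, model_type):
    """Build the argument list for `bash scripts/train.sh DATASET_NAME MODEL_TYPE`."""
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: %r" % dataset)
    if model_type not in MODEL_TYPES:
        raise ValueError("unknown model type: %r" % model_type)
    return ["bash", "scripts/train.sh", dataset, model_type]


def train_all(dataset, dry_run=True):
    """Train every model type on one dataset, one after another."""
    for model_type in MODEL_TYPES:
        cmd = train_command(dataset, model_type)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing run


if __name__ == "__main__":
    train_all("anet")
```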

To train our MART model on ActivityNet Captions:

bash scripts/train.sh anet mart

The training log and the model will be saved in results/anet_re_*.
Once you have a trained model, you can follow the instructions below to generate captions.
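Because the run directory name ends in a run-specific suffix (the `*` in `results/anet_re_*`), a small helper can pick the most recently modified match. This is a hypothetical convenience, assuming the newest directory is the run you want. `scripts/translate_greedy.sh` takes the directory name without the `results/` prefix, hence the `basename` in the usage lines.

```python
import glob
import os


def latest_run_dir(pattern="results/anet_re_*"):
    """Return the most recently modified directory matching `pattern`, or None."""
    dirs = [d for d in glob.glob(pattern) if os.path.isdir(d)]
    return max(dirs, key=os.path.getmtime) if dirs else None


if __name__ == "__main__":
    run = latest_run_dir()
    # Name to pass to scripts/translate_greedy.sh
    print(os.path.basename(run) if run else "no run found")
```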

  3. Generate captions
bash scripts/translate_greedy.sh anet_re_* val

Replace anet_re_* with the name of your own model directory. The generated captions are saved to results/anet_re_*/greedy_pred_val.json.
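The script and output names (`translate_greedy.sh`, `greedy_pred_val.json`) indicate greedy decoding: at every step the single highest-scoring next token is appended, with no beam search. The sketch below shows that loop for one sentence; `next_token_scores` is a hypothetical stand-in for the trained model, and the sketch ignores how MART carries its memory from one sentence to the next.

```python
def greedy_decode(next_token_scores, bos="[BOS]", eos="[EOS]", max_len=20):
    """Decode one sentence by always taking the highest-scoring next token.

    next_token_scores(prefix) -> dict {token: score}; a hypothetical stand-in
    for the captioning model's next-token distribution.
    """
    prefix = [bos]
    for _ in range(max_len):
        scores = next_token_scores(prefix)
        token = max(scores, key=scores.get)  # greedy choice, no beam search
        if token == eos:
            break
        prefix.append(token)
    return prefix[1:]  # drop the [BOS] marker


if __name__ == "__main__":
    # Toy "model": scores depend only on the last token generated so far.
    toy = {
        "[BOS]": {"a": 0.6, "the": 0.4},
        "a": {"man": 0.7, "[EOS]": 0.3},
        "man": {"cooks": 0.6, "[EOS]": 0.4},
        "cooks": {"[EOS]": 0.9, "a": 0.1},
    }
    print(" ".join(greedy_decode(lambda prefix: toy[prefix[-1]])))
```

With the toy table this prints `a man cooks`. Greedy decoding is fast and deterministic, but it can miss sentences whose first tokens score lower.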

  4. Evaluate generated captions
bash scripts/eval.sh anet val results/anet_re_*/greedy_pred_val.json

The results should be comparable to those reported in Table 2 of the paper, e.g., B@4 10.33 and R@4 5.18.
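The paper reports repetition with an R@4 score, where lower means less repetitive paragraphs. To get a quick feel for repetition in your own generated paragraphs, the sketch below computes the fraction of 4-grams that already appeared earlier in the same paragraph. It is an illustrative stand-in with hypothetical function names and whitespace tokenization, not necessarily the exact R@4 definition used by `scripts/eval.sh`, so use the official script for any numbers you report.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngram_ratio(sentences, n=4):
    """Fraction of n-grams in a paragraph that already occurred earlier in it.

    Sentences are lower-cased, split on whitespace, and joined into one token
    stream; this is an illustrative repetition score, not the official R@4.
    """
    tokens = [tok for s in sentences for tok in s.lower().split()]
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    seen = set()
    repeats = 0
    for gram in grams:
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return repeats / len(grams)


if __name__ == "__main__":
    paragraph = ["A man is cooking food.", "A man is cooking food."]
    print(round(repeated_ngram_ratio(paragraph), 3))
```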

Citations

If you find this code useful for your research, please cite our paper:

@inproceedings{lei2020mart,
  title={MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning},
  author={Lei, Jie and Wang, Liwei and Shen, Yelong and Yu, Dong and Berg, Tamara L and Bansal, Mohit},
  booktitle={ACL},
  year={2020}
}

Others

This code used resources from the following projects: transformers, transformer-xl, densecap, OpenNMT-py.

Contact

jielei [at] cs.unc.edu

Owner
Jie Lei 雷杰
UNC CS PhD student, vision+language.