Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

Last update: Dec 04, 2022

SMCG

Introduction

We investigate a novel and challenging task, namely controllable video captioning with an exemplar sentence. Formally, given a video and a syntactically valid exemplar sentence, the task aims to generate one caption which not only describes the semantic contents of the video, but also follows the syntactic form of the given exemplar sentence. In order to tackle such an exemplar-based video captioning task, we propose a novel Syntax Modulated Caption Generator (SMCG) incorporated in an encoder-decoder-reconstructor architecture.

Dependency

python 2.7.2
torch 1.1.0
java openjdk version "10.0.2" 2018-07-17
StanfordCoreNLP

Download Features and Preprocess Data

For the MSRVTT dataset, please download the following files into the './msrvtt/msrvtt_data/' folder:

MSRVTT caption info: videodatainfo_2016.json,
MSRVTT captions and their sentence parse trees: msrvtt_all_sentence_parse_dict.pkl,
Collected exemplar sentences and their parse trees: coco_filter_parse_dict.pkl,
Video features: msrvtt_incepRes_rgb_feats.hdf5,
GloVe word embeddings: glove.840B.300d.zip.
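
As a rough illustration of how the zipped embedding file can be read, the sketch below pulls the vectors for a given vocabulary straight out of the archive. It is not taken from the repository's preprocessing scripts: the function name load_glove_subset and the member name 'glove.840B.300d.txt' are assumptions, and the file is assumed to use the usual GloVe text layout (one token followed by space-separated floats per line). Splitting from the right keeps the parser robust if a token itself contains spaces.

```python
import zipfile


def load_glove_subset(zip_path, vocab, member='glove.840B.300d.txt', dim=300):
    """Return {token: vector} for the tokens in `vocab` found in the archive.

    Hypothetical helper: `member` and `dim` are assumptions about the
    contents of the downloaded zip, not values read from the repository.
    """
    wanted = set(vocab)
    vectors = {}
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            for raw_line in handle:
                line = raw_line.decode('utf-8').rstrip('\r\n')
                # The last `dim` fields are the vector, the rest is the token.
                fields = line.rsplit(' ', dim)
                if fields[0] in wanted:
                    vectors[fields[0]] = [float(v) for v in fields[1:]]
    return vectors
```

Only the requested tokens are converted to floats, so the full embedding table never has to fit in memory.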

For the ActivityNet Captions dataset, please download the following files into the './activitynet/activitynet_data/' folder:

ActivityNet caption info: CAP.pkl,
ActivityNet captions and their sentence parse trees: anet_parse_dict.pkl,
Collected exemplar sentences and their parse trees: coco_filter_parse_dict.pkl,
Video features: anet_new_inception_resnet_feats.hdf5,
GloVe word embeddings: glove.840B.300d.zip.
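
Before running the preprocessing scripts, it can help to confirm that every file listed above is in place. The following is a minimal sketch, not part of the repository: the file and folder names are copied from the two lists above, while the helper name find_missing_files and its root parameter are illustrative assumptions.

```python
import os

# File names come from the lists above; the folder layout is assumed to
# match the paths given in this README.
REQUIRED_FILES = {
    './msrvtt/msrvtt_data/': [
        'videodatainfo_2016.json',
        'msrvtt_all_sentence_parse_dict.pkl',
        'coco_filter_parse_dict.pkl',
        'msrvtt_incepRes_rgb_feats.hdf5',
        'glove.840B.300d.zip',
    ],
    './activitynet/activitynet_data/': [
        'CAP.pkl',
        'anet_parse_dict.pkl',
        'coco_filter_parse_dict.pkl',
        'anet_new_inception_resnet_feats.hdf5',
        'glove.840B.300d.zip',
    ],
}


def find_missing_files(required=REQUIRED_FILES, root='.'):
    """Return the paths of required files that do not exist under `root`."""
    missing = []
    for folder in sorted(required):
        for name in required[folder]:
            path = os.path.normpath(os.path.join(root, folder, name))
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == '__main__':
    missing = find_missing_files()
    if missing:
        print('Missing files:')
        for path in missing:
            print('  ' + path)
    else:
        print('All listed files are present.')
```

Run it from the repository root so that the relative folders resolve correctly.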

Data Preprocessing

Go to the './msrvtt/process_msrvtt_data/' folder, and run:

python prepro_vocab_parse_pos.py
python fill_template.py

Go to the './activitynet/process_activitynet_data/' folder, and run:

python prepro_anetcoco_vocab_pos_parse.py
python fill_template.py

Model Training and Testing

For the MSRVTT dataset, go to the './msrvtt/src/' folder and train the model, replacing xx with the ID of the GPU to use:

python train.py --gpu xx

For model inference and evaluation, run:

bash eval.sh 
bash control.sh

Note: 'eval.sh' evaluates the generated exemplar-based captions with conventional captioning metrics. 'control.sh' compares the generated captions with the provided exemplar sentences syntactically, i.e., it computes the edit distance between their parse trees.

For the ActivityNet Captions dataset, go to the './activitynet/src/' folder, and train and test the model in the same way as for the MSRVTT dataset.
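
To give a feel for the syntactic comparison mentioned in the note, here is a simplified stand-in rather than the code behind 'control.sh'. It assumes parse trees in bracketed Penn Treebank style, strips the words, and computes a plain Levenshtein distance over the linearized label sequence. This is a sequence-level approximation, not a true tree edit distance, and all function names are hypothetical.

```python
import re


def syntax_tokens(parse_str):
    """Linearize a bracketed parse into '(LABEL' and ')' tokens, dropping words.

    '(NP (DT a) (NN man))' -> ['(NP', '(DT', ')', '(NN', ')', ')']
    """
    tokens = re.findall(r'\([^\s()]+|\)|[^\s()]+', parse_str)
    return [t for t in tokens if t.startswith('(') or t == ')']


def edit_distance(a, b):
    """Levenshtein distance between two sequences, with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete x
                            curr[j - 1] + 1,            # insert y
                            prev[j - 1] + (x != y)))    # substitute or match
        prev = curr
    return prev[-1]


def syntactic_distance(parse_a, parse_b):
    """Distance between two parses that ignores the words themselves."""
    return edit_distance(syntax_tokens(parse_a), syntax_tokens(parse_b))


if __name__ == '__main__':
    exemplar = '(S (NP (DT a) (NN dog)) (VP (VBZ runs)))'
    same_form = '(S (NP (DT a) (NN man)) (VP (VBZ sings)))'
    other_form = '(S (NP (NN man)) (VP (VBZ sings)))'
    print(syntactic_distance(exemplar, same_form))   # 0
    print(syntactic_distance(exemplar, other_form))  # 2
```

A lower value means the generated caption follows the exemplar's syntactic form more closely; identical structures with different words score 0.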

Citation

@inproceedings{yuan2020Control,
  title={Controllable Video Captioning with an Exemplar Sentence},
  author={Yuan, Yitian and Ma, Lin and Wang, Jingwen and Zhu, Wenwu},
  booktitle={the 28th ACM International Conference on Multimedia (MM ’20)},
  year={2020}
}
