Official PyTorch implementation of the AAAI 2021 paper "Semantic Grouping Network for Video Captioning"

Last update: Nov 25, 2022

Overview

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, and Chang D. Yoo. AAAI 2021. [arxiv]

Environment

- Ubuntu 16.04
- CUDA 9.2
- cuDNN 7.4.2
- Java 8
- Python 2.7.12
  - PyTorch 1.1.0
  - Other Python packages specified in requirements.txt
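
A quick way to compare an installed environment against the versions listed above is sketched below. The function name `check_environment` is made up for this illustration and is not part of the repository; the check only covers Python and PyTorch.

```python
from __future__ import print_function
import sys

# Versions listed in the Environment section of this README.
EXPECTED_PYTHON = (2, 7)
EXPECTED_TORCH = "1.1.0"


def check_environment():
    """Return a list of human-readable mismatches (empty if none are found)."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        problems.append(
            "Python %d.%d found, README lists 2.7" % sys.version_info[:2])
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not torch.__version__.startswith(EXPECTED_TORCH):
            problems.append("PyTorch %s found, README lists %s"
                            % (torch.__version__, EXPECTED_TORCH))
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the README."]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; it only flags where an environment differs from the one the author reports.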

Usage

1. Setup

$ pip install -r requirements.txt

2. Prepare Data

Download the GloVe embeddings from here and place them at data/Embeddings/GloVe/GloVe_300.json.
Extract features from each dataset and place them at data/<DATASET>/features/<NETWORK>.hdf5.

e.g. ResNet101 features of the MSVD dataset will be located at data/MSVD/features/ResNet101.hdf5.
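
The path convention in the example above can be expressed as a small helper. This is a minimal sketch: `feature_path` and its `root` parameter are hypothetical names for illustration, not functions from the repository.

```python
import os


def feature_path(dataset, network, root="data"):
    """Build the expected HDF5 feature path for a dataset/network pair."""
    return os.path.join(root, dataset, "features", network + ".hdf5")


if __name__ == "__main__":
    # Reproduces the MSVD / ResNet101 example from this README.
    print(feature_path("MSVD", "ResNet101"))
```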

I referred to this repo for extracting the ResNet101 features, and this repo for extracting the 3D-ResNext101 features.
Split the features into train, val, and test sets by running the following commands.
```
$ python -m split.MSVD
$ python -m split.MSR-VTT
```
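
To illustrate what such a split amounts to, here is a minimal sketch that partitions an ordered list of video ids by index. It is not the code in `split/`; the id format and the 1200/100/670 sizes (a split commonly used for MSVD) are assumptions that this README does not confirm.

```python
def split_by_index(video_ids, n_train, n_val):
    """Partition an ordered list of ids into train, val, and test lists."""
    train = video_ids[:n_train]
    val = video_ids[n_train:n_train + n_val]
    test = video_ids[n_train + n_val:]  # everything left over
    return {"train": train, "val": val, "test": test}


if __name__ == "__main__":
    # Hypothetical ids; the real ids depend on how features were extracted.
    ids = ["vid%d" % i for i in range(1, 1971)]
    splits = split_by_index(ids, n_train=1200, n_val=100)
    for name in ("train", "val", "test"):
        print(name, len(splits[name]))
```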

Alternatively, you can skip the feature extraction and splitting steps above and download the files below.

MSVD
- ResNet-101 [train] [val] [test]
- 3D-ResNext-101 [train] [val] [test]
MSR-VTT
- ResNet-101 [train] [val] [test]
- 3D-ResNext-101 [train] [val] [test]

3. Prepare The Code for Evaluation

Clone the evaluation code from the official coco-evaluation repo.

$ git clone https://github.com/tylin/coco-caption.git
$ mv coco-caption/pycocoevalcap .
$ rm -rf coco-caption
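
The scorers in pycocoevalcap (for example `Bleu(4).compute_score(gts, res)`) take two dictionaries keyed by sample id: `gts` maps each id to a list of reference captions, and `res` maps each id to a one-element list holding the generated caption. The sketch below builds those dictionaries from plain Python data. The helper `to_scorer_inputs` is a hypothetical name, and how this repository prepares its own inputs is not stated here.

```python
from collections import defaultdict


def to_scorer_inputs(references, predictions):
    """Build (gts, res) dictionaries in the shape the scorers expect.

    references:  iterable of (video_id, caption) pairs, possibly many per id
    predictions: dict mapping video_id -> single generated caption
    """
    gts = defaultdict(list)
    for vid, caption in references:
        gts[vid].append(caption)
    # Each hypothesis must be wrapped in a list of length one.
    res = {vid: [caption] for vid, caption in predictions.items()}
    return dict(gts), res


if __name__ == "__main__":
    refs = [("v1", "a man is cooking"), ("v1", "someone cooks food")]
    preds = {"v1": "a man is cooking food"}
    gts, res = to_scorer_inputs(refs, preds)
    print(gts)
    print(res)
```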

4. Extract Negative Videos

$ python extract_negative_videos.py

Alternatively, you can skip this step, as the output files are already uploaded at data/ /metadata/neg_vids_ .json
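
This README does not describe what `extract_negative_videos.py` computes. One plausible reading, assumed here and not taken from the repository, is that a negative video for a given video is one whose captions share no words with that video's captions. The sketch below implements that assumption only; the function name, the whitespace tokenization, and the stopword handling are all illustrative.

```python
def find_negative_videos(captions_by_video, stopwords=frozenset()):
    """Map each video id to the ids whose captions share no words with it.

    captions_by_video: dict mapping video_id -> list of caption strings
    stopwords:         words to ignore when comparing vocabularies
    """
    # Collect the set of (non-stopword) words used in each video's captions.
    vocab = {}
    for vid, captions in captions_by_video.items():
        words = set()
        for caption in captions:
            words.update(caption.lower().split())
        vocab[vid] = words - stopwords

    negatives = {}
    for vid, words in vocab.items():
        negatives[vid] = sorted(
            other for other, other_words in vocab.items()
            if other != vid and not (words & other_words))
    return negatives


if __name__ == "__main__":
    data = {
        "v1": ["a man is cooking"],
        "v2": ["a dog runs"],
        "v3": ["two cats play"],
    }
    print(find_negative_videos(data, stopwords=frozenset(["a", "is"])))
```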

5. Train

$ python train.py

You can change some hyperparameters by modifying config.py.

Pretrained Models - SGN(R101+RN)

*Disclaimer: The models above do not have the same weights as the models used in the paper (I trained them again because I lost the originals).

6. Evaluate

$ python evaluate.py --ckpt_fpath

License

The source code in this repository is released under the MIT License.

Owner

Hobin Ryu
