Code for the paper "Adaptively Aligned Image Captioning via Adaptive Attention Time"

Last update: Aug 27, 2022

Overview

Adaptively Aligned Image Captioning via Adaptive Attention Time

This repository includes the implementation for Adaptively Aligned Image Captioning via Adaptive Attention Time.

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
cider
coco-caption
tensorboardX

Training AAT

Prepare data (with Python 2)

See details in data/README.md.

Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.
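To illustrate how a word-count threshold determines vocabulary size, here is a minimal sketch. It is not the code in scripts/prepro_labels.py: the function name build_vocab is hypothetical, and the rule assumed here (keep words seen strictly more than the threshold, map the rest to a single UNK token) should be checked against the script itself.

```python
from collections import Counter


def build_vocab(captions, word_count_threshold):
    """Build a vocabulary from tokenized captions (hypothetical helper).

    Assumption: a word is kept only if it occurs strictly more than
    `word_count_threshold` times; all rarer words share one "UNK" token.
    """
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    # Add UNK only if at least one word was filtered out.
    if any(n <= word_count_threshold for n in counts.values()):
        vocab.append("UNK")
    return vocab


captions = [["a", "dog"], ["a", "cat"], ["a", "dog", "runs"]]
vocab = build_vocab(captions, word_count_threshold=1)
print(vocab)
```

Raising the threshold shrinks the vocabulary, because more words fall into the UNK bucket.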

Also preprocess the dataset to build the n-gram cache used to compute the CIDEr score during SCST:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
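CIDEr weights n-grams by TF-IDF, so it needs the document frequency of each n-gram across the reference captions, which is the kind of statistic worth caching before training. The sketch below shows that computation in isolation. It is an illustration under stated assumptions, not the contents of scripts/prepro_ngrams.py: the function names and the input layout (a list of images, each with a list of tokenized reference captions) are hypothetical.

```python
from collections import defaultdict


def ngrams(tokens, max_n=4):
    """Return the set of all n-grams (as tuples) of length 1..max_n."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def document_frequency(refs_per_image, max_n=4):
    """Count, for each n-gram, how many images have it in any reference.

    `refs_per_image` is assumed to be a list with one entry per image,
    each entry being a list of tokenized reference captions.
    """
    df = defaultdict(int)
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen |= ngrams(ref, max_n)
        # Each image counts at most once per n-gram.
        for gram in seen:
            df[gram] += 1
    return dict(df)


refs = [
    [["a", "dog"], ["a", "cat"]],  # image 1: two reference captions
    [["a", "bird"]],               # image 2: one reference caption
]
df = document_frequency(refs)
print(df[("a",)], df[("a", "dog")])
```

N-grams that appear in many images (such as "a") receive a low IDF weight, so matching them contributes little to the score.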

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year = {2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.

Owner

Lun Huang
