Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Last update: Dec 17, 2022

Overview

Introduction

This repository contains the implementation of X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here.

Please cite with the following BibTeX:

@inproceedings{xlinear2020cvpr,
  title={X-Linear Attention Networks for Image Captioning},
  author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Requirements

Python 3
CUDA 10
numpy
tqdm
easydict
PyTorch (>1.0)
torchvision
coco-caption

Data preparation

Download the bottom-up features and convert them to npz files:

python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100

Download the annotations into the mscoco folder. For more details about data preparation, refer to self-critical.pytorch.
Download coco-caption and set the path __C.INFERENCE.COCO_PATH in lib/config.py to point to it.
The pretrained models and results can be downloaded here.
The pretrained SENet-154 model can be downloaded here.
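
The feature conversion step above can be pictured with the following minimal sketch. It is not the repository's tools/create_feats.py. It assumes the column layout of the publicly released bottom-up-attention TSV files (image_id, image_w, image_h, num_boxes, boxes, features, with boxes and features stored as base64-encoded float32 buffers). The function name tsv_to_npz and the array key feat are illustrative assumptions.

```python
import base64
import csv
import os

import numpy as np

# Assumed column layout of the bottom-up-attention TSV files.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def tsv_to_npz(tsv_path, out_folder):
    """Write one compressed .npz file per image; return the number of files written.

    Hypothetical helper for illustration only.
    """
    os.makedirs(out_folder, exist_ok=True)
    # Each row carries a large base64 blob, so raise the csv field size limit.
    csv.field_size_limit(2**31 - 1)
    count = 0
    with open(tsv_path, "r", newline="") as tsv_file:
        reader = csv.DictReader(tsv_file, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            num_boxes = int(row["num_boxes"])
            raw = base64.b64decode(row["features"])
            # One row of features per detected region: (num_boxes, feature_dim).
            feats = np.frombuffer(raw, dtype=np.float32).reshape(num_boxes, -1)
            # The key name "feat" is an assumption; savez appends ".npz" itself.
            np.savez_compressed(os.path.join(out_folder, row["image_id"]), feat=feats)
            count += 1
    return count
```

A converted file can then be inspected with np.load(path)["feat"].shape, which should be (num_boxes, feature_dim) for that image under the assumptions above.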

Training

Train X-LAN model

bash experiments/xlan/train.sh

Train X-LAN model using self-critical training

Copy the pretrained model into experiments/xlan_rl/snapshot and run the script

bash experiments/xlan_rl/train.sh

Train X-LAN transformer model

bash experiments/xtransformer/train.sh

Train X-LAN transformer model using self-critical training

Copy the pretrained model into experiments/xtransformer_rl/snapshot and run the script

bash experiments/xtransformer_rl/train.sh

Evaluation

CUDA_VISIBLE_DEVICES=0 python3 main_test.py --folder experiments/model_folder --resume model_epoch

Acknowledgements

Thanks to the contributions of self-critical.pytorch and the awesome PyTorch team.

Owner

JDAI-CV
