LaBERT - A length-controllable and non-autoregressive image captioning model.

Last update: Nov 13, 2022

Overview

Length-Controllable Image Captioning (ECCV2020)

This repo provides the implementation of the paper Length-Controllable Image Captioning.

Install

conda create --name labert python=3.7
conda activate labert

conda install pytorch=1.3.1 torchvision cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1
pip install git+https://github.com/salaniz/pycocoevalcap

Data & Pre-trained Models

Prepare the MSCOCO data by following the instructions at link.
Download pretrained Bert and Faster-RCNN from Baidu Cloud Disk [code: 0j9f] or Google Drive.
- It is a unified checkpoint file containing a pretrained Bert-base and the fc6 layer of the Faster-RCNN.
Download our pretrained LaBERT model from Baidu Cloud Disk [code: fpke] or Google Drive.

Scripts

Train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU
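The placeholders in the command above are ordinary shell variables. The following is a minimal sketch of a wrapper that sets them and prints the command before running it. The default values (4 GPUs, 16 samples per GPU, `output/train`), the `DRY_RUN` switch, and the `run` helper are illustrative assumptions, not settings taken from the repo or the paper.

```shell
#!/bin/sh
set -eu

# Illustrative defaults only (assumptions); override them from the environment.
NUM_GPUS="${NUM_GPUS:-4}"
NUM_SAMPLES_PER_GPU="${NUM_SAMPLES_PER_GPU:-16}"
PATH_TO_TRAIN_OUTPUT="${PATH_TO_TRAIN_OUTPUT:-output/train}"

# Hypothetical helper: print the command, and execute it only when DRY_RUN
# is set to something other than 1 (the default is a dry run).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" != "1" ]; then
    "$@"
  fi
}

# Same arguments as the training command above.
run python -m torch.distributed.launch \
  --nproc_per_node="$NUM_GPUS" \
  --master_port=4396 train.py \
  save_dir "$PATH_TO_TRAIN_OUTPUT" \
  samples_per_gpu "$NUM_SAMPLES_PER_GPU"
```

Running the script as-is only prints the resolved command; setting `DRY_RUN=0` launches training with the same arguments.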

Continue training

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU \
  model_path $PATH_TO_MODEL

Inference

python inference.py \
  model_path $PATH_TO_MODEL \
  save_dir $PATH_TO_TEST_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Evaluate

python evaluate.py \
  --gt_caption data/id2captions_test.json \
  --pd_caption $PATH_TO_TEST_OUTPUT/caption_results.json \
  --save_dir $PATH_TO_TEST_OUTPUT
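Evaluation reads two files: the ground-truth captions and the `caption_results.json` written to the inference output directory. As a hedged sketch, the check below confirms both files exist before calling `evaluate.py`. The `count_missing` helper and the `output/test` default are assumptions introduced for illustration.

```shell
#!/bin/sh
set -eu

# Illustrative default (assumption); point this at your inference output.
PATH_TO_TEST_OUTPUT="${PATH_TO_TEST_OUTPUT:-output/test}"

# Hypothetical helper: report each missing file on stderr and print
# the number of missing files on stdout.
count_missing() {
  n=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      n=$((n + 1))
    fi
  done
  echo "$n"
}

missing=$(count_missing \
  data/id2captions_test.json \
  "$PATH_TO_TEST_OUTPUT/caption_results.json")

if [ "$missing" -eq 0 ]; then
  # Same arguments as the evaluation command above.
  python evaluate.py \
    --gt_caption data/id2captions_test.json \
    --pd_caption "$PATH_TO_TEST_OUTPUT/caption_results.json" \
    --save_dir "$PATH_TO_TEST_OUTPUT"
else
  echo "skipping evaluation: $missing input file(s) not found" >&2
fi
```

If either file is missing, the script names it and skips evaluation instead of failing inside `evaluate.py`.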

Cite

Please consider citing our paper in your publications if the project helps your research.

@article{deng2020length,
  title={Length-Controllable Image Captioning},
  author={Deng, Chaorui and Ding, Ning and Tan, Mingkui and Wu, Qi},
  journal={arXiv preprint arXiv:2007.09580},
  year={2020}
}

Owner

bearcatt
