Bi-directional Image and Text Generation

UMT-BITG (image & text generator)

Unifying Multimodal Transformer for Bi-directional Image and Text Generation,
Yupan Huang, Bei Liu, Yutong Lu, in ACM MM 2021 (Industrial Track).

UMT-DBITG (diverse image & text generator)

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation,
Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu, in ACM MM 2021 (Video and Demo Track).

Posters and slides are available in the assets folder on OneDrive.

Data & Pre-trained Models

Download the preprocessed data and our pre-trained models from OneDrive. We suggest keeping our directory structure, which is consistent with the paths in config.py; you may need to modify root_path in config.py. In addition, please follow the instructions below to prepare the remaining data:

Download the grid features provided by X-LXMERT into data/grid_features, or follow the feature extraction instructions to extract these features yourself.

wget https://ai2-vision-x-lxmert.s3-us-west-2.amazonaws.com/butd_features/COCO/maskrcnn_train_grid8.h5 -P data/grid_features
wget https://ai2-vision-x-lxmert.s3-us-west-2.amazonaws.com/butd_features/COCO/maskrcnn_valid_grid8.h5 -P data/grid_features
wget https://ai2-vision-x-lxmert.s3-us-west-2.amazonaws.com/butd_features/COCO/maskrcnn_test_grid8.h5 -P data/grid_features
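
The three downloads above differ only in the split name, so they can also be scripted. The following is a minimal Python sketch and not part of the repository: the names `grid_feature_urls` and `download_grid_features` are hypothetical, and only the URLs and the data/grid_features target directory are taken from the commands above.

```python
import urllib.request
from pathlib import Path

# Base URL and file-name pattern copied from the wget commands above.
BASE_URL = "https://ai2-vision-x-lxmert.s3-us-west-2.amazonaws.com/butd_features/COCO"
SPLITS = ("train", "valid", "test")


def grid_feature_urls():
    """Return one download URL per split (hypothetical helper)."""
    return [f"{BASE_URL}/maskrcnn_{split}_grid8.h5" for split in SPLITS]


def download_grid_features(target_dir="data/grid_features"):
    """Download each feature file into target_dir, skipping existing files.

    Note: a partially downloaded file also counts as existing here,
    so delete it by hand before retrying an interrupted download.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for url in grid_feature_urls():
        destination = target / url.rsplit("/", 1)[-1]
        if destination.exists():
            print(f"skip {destination} (already present)")
            continue
        print(f"downloading {url}")
        urllib.request.urlretrieve(url, str(destination))
```

Calling `download_grid_features()` from the repository root should place the files where the wget commands would. Unlike wget, this sketch neither resumes nor retries.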

For text-to-image evaluation on the MSCOCO dataset, we need the real images to calculate the FID metric. For UMT-DBITG, we use the MSCOCO Karpathy split, which is included in the OneDrive folder (images/imgs_karpathy). For UMT-BITG, please download the MSCOCO validation set into images/coco_val2014.
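
Before training or evaluation, it may help to confirm that the directories named in this section exist. The sketch below is illustrative only: `missing_dirs` and `EXPECTED_DIRS` are hypothetical names, and it assumes that the three paths sit directly under the directory that root_path in config.py points to, which the repository's config.py may or may not do.

```python
from pathlib import Path

# Relative paths named in this section. Assumption: they live directly
# under the directory that root_path in config.py points to.
EXPECTED_DIRS = (
    "data/grid_features",    # grid features from X-LXMERT
    "images/imgs_karpathy",  # real images for UMT-DBITG evaluation
    "images/coco_val2014",   # real images for UMT-BITG evaluation
)


def missing_dirs(root_path, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root_path."""
    root = Path(root_path)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

For example, `missing_dirs(".")` returns an empty list when all three directories are present under the current directory, and otherwise lists the ones that still need to be downloaded.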

Citation

If you find our paper or code useful, please cite us:

@inproceedings{huang2021unifying,
  author    = {Yupan Huang and Bei Liu and Yutong Lu},
  title     = {Unifying Multimodal Transformer for Bi-directional Image and Text Generation},
  booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
  year      = {2021}
}

@inproceedings{huang2021diverse,
  author    = {Yupan Huang and Bei Liu and Jianlong Fu and Yutong Lu},
  title     = {A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation},
  booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
  year      = {2021}
}

Acknowledgement

Our code is based on LaBERT and X-LXMERT. Our evaluation code is from pytorch-fid and inception_score. We sincerely thank them for their contributions!

Feel free to open an issue or email me if you need help using this code. Any feedback is welcome!

A collection of models for image-text generation from ACM MM 2021.
