Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

Overview

UNITER: UNiversal Image-TExt Representation Learning

This is the official repository of UNITER (ECCV 2020). This repository currently supports finetuning UNITER on NLVR2, VQA, VCR, SNLI-VE, Image-Text Retrieval for COCO and Flickr30k, and Referring Expression Comprehensions (RefCOCO, RefCOCO+, and RefCOCO-g). Both UNITER-base and UNITER-large pre-trained checkpoints are released. UNITER-base pre-training with in-domain data is also available.

Overview of UNITER

Some code in this repo are copied/modified from opensource implementations made available by PyTorch, HuggingFace, OpenNMT, and Nvidia. The image features are extracted using BUTD.

Requirements

We provide Docker image for easier reproduction. Please install the following:

Our scripts require the user to have the docker group membership so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training hence GPUs with Tensor Cores are recommended.

Quick Start

NOTE: Please run bash scripts/download_pretrained.sh $PATH_TO_STORAGE to get our latest pretrained checkpoints. This will download both the base and large models.

We use NLVR2 as an end-to-end example for using this code base.

  1. Download processed data and pretrained models with the following command.

    bash scripts/download_nlvr2.sh $PATH_TO_STORAGE

    After downloading you should see the following folder structure:

    ├── ann
    │   ├── dev.json
    │   └── test1.json
    ├── finetune
    │   ├── nlvr-base
    │   └── nlvr-base.tar
    ├── img_db
    │   ├── nlvr2_dev
    │   ├── nlvr2_dev.tar
    │   ├── nlvr2_test
    │   ├── nlvr2_test.tar
    │   ├── nlvr2_train
    │   └── nlvr2_train.tar
    ├── pretrained
    │   └── uniter-base.pt
    └── txt_db
        ├── nlvr2_dev.db
        ├── nlvr2_dev.db.tar
        ├── nlvr2_test1.db
        ├── nlvr2_test1.db.tar
        ├── nlvr2_train.db
        └── nlvr2_train.db.tar
    
  2. Launch the Docker container for running the experiments.

    # docker image should be automatically pulled
    source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
        $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

    The launch script respects $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of built into the image so that user modification will be reflected without re-building the image. (Data folders are mounted into the container separately for flexibility on folder structures.)

  3. Run finetuning for the NLVR2 task.

    # inside the container
    python train_nlvr2.py --config config/train-nlvr2-base-1gpu.json
    
    # for more customization
    horovodrun -np $N_GPU python train_nlvr2.py --config $YOUR_CONFIG_JSON
  4. Run inference for the NLVR2 task and then evaluate.

    # inference
    python inf_nlvr2.py --txt_db /txt/nlvr2_test1.db/ --img_db /img/nlvr2_test/ \
        --train_dir /storage/nlvr-base/ --ckpt 6500 --output_dir . --fp16
    
    # evaluation
    # run this command outside docker (tested with python 3.6)
    # or copy the annotation json into mounted folder
    python scripts/eval_nlvr2.py ./results.csv $PATH_TO_STORAGE/ann/test1.json

    The above command runs inference on the model we trained. Feel free to replace --train_dir and --ckpt with your own model trained in step 3. Currently we only support single GPU inference.

  5. Customization

    # training options
    python train_nlvr2.py --help
    • command-line argument overwrites JSON config files
    • JSON config overwrites argparse default value.
    • use horovodrun to run multi-GPU training
    • --gradient_accumulation_steps emulates multi-gpu training
  6. Misc.

    # text annotation preprocessing
    bash scripts/create_txtdb.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/ann
    
    # image feature extraction (Tested on Titan-Xp; may not run on latest GPUs)
    bash scripts/extract_imgfeat.sh $PATH_TO_IMG_FOLDER $PATH_TO_IMG_NPY
    
    # image preprocessing
    bash scripts/create_imgdb.sh $PATH_TO_IMG_NPY $PATH_TO_STORAGE/img_db

    In case you would like to reproduce the whole preprocessing pipeline.

Downstream Tasks Finetuning

VQA

NOTE: train and inference should be ran inside the docker container

  1. download data
    bash scripts/download_vqa.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 4 python train_vqa.py --config config/train-vqa-base-4gpu.json \
        --output_dir $VQA_EXP
    
  3. inference
    python inf_vqa.py --txt_db /txt/vqa_test.db --img_db /img/coco_test2015 \
        --output_dir $VQA_EXP --checkpoint 6000 --pin_mem --fp16
    
    The result file will be written at $VQA_EXP/results_test/results_6000_all.json, which can be submitted to the evaluation server

VCR

NOTE: train and inference should be ran inside the docker container

  1. download data
    bash scripts/download_vcr.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 4 python train_vcr.py --config config/train-vcr-base-4gpu.json \
        --output_dir $VCR_EXP
    
  3. inference
    horovodrun -np 4 python inf_vcr.py --txt_db /txt/vcr_test.db \
        --img_db "/img/vcr_gt_test/;/img/vcr_test/" \
        --split test --output_dir $VCR_EXP --checkpoint 8000 \
        --pin_mem --fp16
    
    The result file will be written at $VCR_EXP/results_test/results_8000_all.csv, which can be submitted to VCR leaderboard for evluation.

VCR 2nd Stage Pre-training

NOTE: pretrain should be ran inside the docker container

  1. download VCR data if you haven't
    bash scripts/download_vcr.sh $PATH_TO_STORAGE
    
  2. 2nd stage pre-train
    horovodrun -np 4 python pretrain_vcr.py --config config/pretrain-vcr-base-4gpu.json \
        --output_dir $PRETRAIN_VCR_EXP
    

Visual Entailment (SNLI-VE)

NOTE: train should be ran inside the docker container

  1. download data
    bash scripts/download_ve.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 2 python train_ve.py --config config/train-ve-base-2gpu.json \
        --output_dir $VE_EXP
    

Image-Text Retrieval

download data

bash scripts/download_itm.sh $PATH_TO_STORAGE

NOTE: Image-Text Retrieval is computationally heavy, especially on COCO.

Zero-shot Image-Text Retrieval (Flickr30k)

# every image-text pair has to be ranked; please use as many GPUs as possible
horovodrun -np $NGPU python inf_itm.py \
    --txt_db /txt/itm_flickr30k_test.db --img_db /img/flickr30k \
    --checkpoint /pretrain/uniter-base.pt --model_config /src/config/uniter-base.json \
    --output_dir $ZS_ITM_RESULT --fp16 --pin_mem

Image-Text Retrieval (Flickr30k)

  • normal finetune
    horovodrun -np 8 python train_itm.py --config config/train-itm-flickr-base-8gpu.json
    
  • finetune with hard negatives
    horovodrun -np 16 python train_itm_hard_negatives.py \
        --config config/train-itm-flickr-base-16gpu-hn.jgon
    

Image-Text Retrieval (COCO)

  • finetune with hard negatives
    horovodrun -np 16 python train_itm_hard_negatives.py \
        --config config/train-itm-coco-base-16gpu-hn.json
    

Referring Expressions

  1. download data
    bash scripts/download_re.sh $PATH_TO_STORAGE
    
  2. train
    python train_re.py --config config/train-refcoco-base-1gpu.json \
        --output_dir $RE_EXP
    
  3. inference and evaluation
    source scripts/eval_refcoco.sh $RE_EXP
    
    The result files will be written under $RE_EXP/results_test/

Similarly, change corresponding configs/scripts for running RefCOCO+/RefCOCOg.

Pre-tranining

download

bash scripts/download_indomain.sh $PATH_TO_STORAGE

pre-train

horovodrun -np 8 python pretrain.py --config config/pretrain-indomain-base-8gpu.json \
    --output_dir $PRETRAIN_EXP

Unfortunately, we cannot host CC/SBU features due to their large size. Users will need to process them on their own. We will provide a smaller sample for easier reference to the expected format soon.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{chen2020uniter,
  title={Uniter: Universal image-text representation learning},
  author={Chen, Yen-Chun and Li, Linjie and Yu, Licheng and Kholy, Ahmed El and Ahmed, Faisal and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
  booktitle={ECCV},
  year={2020}
}

License

MIT

Owner
Yen-Chun Chen
Researcher @ Microsoft Cloud+AI. previously Machine Learning Scientist @ Stackline; M.S. student @ UNC Chapel Hill NLP group
Yen-Chun Chen
基于“Seq2Seq+前缀树”的知识图谱问答

KgCLUE-bert4keras 基于“Seq2Seq+前缀树”的知识图谱问答 简介 博客:https://kexue.fm/archives/8802 环境 软件:bert4keras=0.10.8 硬件:目前的结果是用一张Titan RTX(24G)跑出来的。 运行 第一次运行的时候,会给知

苏剑林(Jianlin Su) 65 Dec 12, 2022
A combination of autoregressors and autoencoders using XLNet for sentiment analysis

A combination of autoregressors and autoencoders using XLNet for sentiment analysis Abstract In this paper sentiment analysis has been performed in or

James Zaridis 2 Nov 20, 2021
The Sudachi synonym dictionary in Solar format.

solr-sudachi-synonyms The Sudachi synonym dictionary in Solar format. Summary Run a script that checks for updates to the Sudachi dictionary every hou

Karibash 3 Aug 19, 2022
A Fast Sequence Transducer Implementation with PyTorch Bindings

transducer A Fast Sequence Transducer Implementation with PyTorch Bindings. The corresponding publication is Sequence Transduction with Recurrent Neur

Awni Hannun 184 Dec 18, 2022
APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

APEACH - Korean Hate Speech Evaluation Datasets APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. Sentences of

Kevin-Yang 70 Dec 06, 2022
Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech

epub2audiobook Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech Input examples qual a pasta do seu

7 Aug 25, 2022
Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

David Drysdale 3.1k Dec 29, 2022
100+ Chinese Word Vectors 上百种预训练中文词向量

Chinese Word Vectors 中文词向量 中文 This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse),

embedding 10.4k Jan 09, 2023
CMeEE 数据集医学实体抽取

医学实体抽取_GlobalPointer_torch 介绍 思想来自于苏神 GlobalPointer,原始版本是基于keras实现的,模型结构实现参考现有 pytorch 复现代码【感谢!】,基于torch百分百复现苏神原始效果。 数据集 中文医学命名实体数据集 点这里申请,很简单,共包含九类医学

85 Dec 28, 2022
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning English | 中文 ❗ Now we provide inferencing code and pre-training models

164 Jan 02, 2023
To be a next-generation DL-based phenotype prediction from genome mutations.

Sequence -----------+-- 3D_structure -- 3D_module --+ +-- ? | |

Eric Alcaide 18 Jan 11, 2022
Python package for performing Entity and Text Matching using Deep Learning.

DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util

461 Dec 28, 2022
Understand Text Summarization and create your own summarizer in python

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent

Sreekanth M 1 Oct 18, 2022
[AAAI 21] Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

◥ Curriculum Labeling ◣ Revisiting Pseudo-Labeling for Semi-Supervised Learning Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez. In the

UVA Computer Vision 113 Dec 15, 2022
This repository is home to the Optimus data transformation plugins for various data processing needs.

Transformers Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus. To i

Open Data Platform 37 Dec 14, 2022
Correctly generate plurals, ordinals, indefinite articles; convert numbers to words

NAME inflect.py - Correctly generate plurals, singular nouns, ordinals, indefinite articles; convert numbers to words. SYNOPSIS import inflect p = in

Jason R. Coombs 762 Dec 29, 2022
AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

Microsoft 37 Nov 29, 2022
An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode.

WordleSolver An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode. How to use the program Copy this proje

Akil Selvan Rajendra Janarthanan 3 Mar 02, 2022
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Phil Wang 5k Jan 02, 2023