Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

Last update: Nov 07, 2022

Overview

NSGDC

Some of the code in this repo is copied or modified from open-source implementations made available by UNITER, PyTorch, HuggingFace, OpenNMT, and Nvidia. The image features are extracted using BUTD.

Requirements

The setup follows UNITER. We provide a Docker image for easier reproduction. Please install the following:

Our scripts require the user to be a member of the docker group so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We tested on Ubuntu 18.04 with V100 cards. We use mixed-precision training, so GPUs with Tensor Cores are recommended.
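The requirements above can be checked before launching anything. The following is a minimal sketch, not part of this repo: the helper names user_in_group and preflight are hypothetical, and the checks only look for the docker and nvidia-smi executables on PATH and for docker group membership. It does not verify driver or Docker versions.

```python
import grp
import os
import shutil


def user_in_group(group_name):
    """Return True if the current user belongs to the named Unix group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this machine
    # os.getgroups() lists supplementary groups; also compare the effective gid.
    return gid in os.getgroups() or gid == os.getegid()


def preflight():
    """Collect simple yes/no checks (hypothetical helper, Linux only)."""
    return {
        "docker_cli_found": shutil.which("docker") is not None,
        "in_docker_group": user_in_group("docker"),
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If in_docker_group is reported as missing, docker commands will likely need sudo, which the scripts here do not expect.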

Image-Text Retrieval

Download Data

bash scripts/download_itm.sh $PATH_TO_STORAGE

Launch the Docker Container

# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

In case you would like to reproduce the whole preprocessing pipeline.

The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of being built into the image, so that user modifications are reflected without rebuilding the image. (Data folders are mounted into the container separately for flexibility in folder structure.)
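To make the storage layout and GPU selection concrete, here is a small sketch that assembles the launch command as a string. The function name launch_command is hypothetical and not part of this repo. The four subfolder names are taken from the command above, and setting CUDA_VISIBLE_DEVICES before sourcing the script is an assumption about how one might restrict the visible GPUs.

```python
import posixpath
import shlex


def launch_command(storage_root, gpus=None):
    """Build the shell line that sources launch_container.sh (illustrative only)."""
    # Subfolders as they appear in the launch command of this README.
    subdirs = ["txt_db", "img_db", "finetune", "pretrained"]
    paths = [posixpath.join(storage_root, d) for d in subdirs]
    cmd = "source launch_container.sh " + " ".join(shlex.quote(p) for p in paths)
    if gpus:
        # Assumption: export the variable first so the sourced script can read it.
        ids = ",".join(str(g) for g in gpus)
        cmd = f"export CUDA_VISIBLE_DEVICES={ids} && {cmd}"
    return cmd


if __name__ == "__main__":
    print(launch_command("/data/nsgdc", gpus=[0, 1]))
```

The printed line is meant to be pasted into a bash shell; the sketch itself does not start a container.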

Image-Text Retrieval (Flickr30k)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_flickr.sh
bash run_cmds/tran_pnsgd2_base_flickr.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_flickr.sh
bash run_cmds/tran_pnsgd2_large_flickr.sh

Image-Text Retrieval (COCO)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_coco.sh
bash run_cmds/tran_pnsgd2_base_coco.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_coco.sh
bash run_cmds/tran_pnsgd2_large_coco.sh

Run Inference

bash run_cmds/inf_nsgd.sh

Results

Our models achieve the following performance.

MS-COCO

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	66.6	88.6	94.0	51.6	79.1	87.5
NSGDC-Large	67.8	89.6	94.2	53.3	80.0	88.0

Flickr30K

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	87.9	98.1	99.3	74.5	93.3	96.3
NSGDC-Large	90.6	98.8	99.1	77.3	94.3	97.3
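The retrieval columns in the tables above are recall scores at a cutoff K. As a reading aid, the following is a minimal sketch of how Recall@K is commonly computed from a query-by-candidate similarity matrix. It is not this repo's evaluation code; the function name recall_at_k and the toy scores are made up for illustration.

```python
def recall_at_k(scores, positives, ks=(1, 5, 10)):
    """Percentage of queries whose best-ranked correct candidate is in the top K.

    scores[i][j] is the similarity between query i and candidate j.
    positives[i] is the set of candidate indices that are correct for query i.
    """
    hits = {k: 0 for k in ks}
    for row, correct in zip(scores, positives):
        # Candidate indices sorted from most to least similar.
        ranking = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        best_rank = min(ranking.index(j) for j in correct)  # 0-based rank
        for k in ks:
            if best_rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(scores) for k in ks}


# Toy example: 3 queries, 3 candidates each (values are invented).
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.5],
    [0.6, 0.7, 0.1],
]
toy_positives = [{0}, {2}, {2}]
toy_result = recall_at_k(toy_scores, toy_positives, ks=(1, 2, 3))
```

For image-to-text retrieval each image is a query and may have several correct captions, which is why positives holds a set per query; for text-to-image retrieval each caption typically has a single correct image.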

Owner

Zhihao Fan
