The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

Last update: Dec 22, 2022

Overview

SGRAF

PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.

It is built on top of SCAN and Cross-modal_Retrieval_Tutorial.

We have released two versions of SGRAF: branch main for Python 2.7 and branch python3.6 for Python 3.6.

Introduction

The framework of SGRAF:

Updated results (better than those reported in the original paper):

Dataset	Module	Sentence retrieval			Image retrieval
		R@1	R@5	R@10	R@1	R@5	R@10
Flickr30K	SAF	75.6	92.7	96.9	56.5	82.0	88.4
	SGR	76.6	93.7	96.6	56.1	80.9	87.0
	SGRAF	78.4	94.6	97.5	58.2	83.0	89.1
MSCOCO1k	SAF	78.0	95.9	98.5	62.2	89.5	95.4
	SGR	77.3	96.0	98.6	62.1	89.6	95.3
	SGRAF	79.2	96.5	98.6	63.5	90.2	95.8
MSCOCO5k	SAF	55.5	83.8	91.8	40.1	69.7	80.4
	SGR	57.3	83.2	90.6	40.5	69.6	80.3
	SGRAF	58.8	84.8	92.1	41.6	70.9	81.5
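
The R@K columns report recall at K: the percentage of queries whose ground-truth match appears among the top K retrieved items. As a rough illustration only, the sketch below computes this from a square similarity matrix. It assumes a simplified one-to-one setting in which the correct candidate for query i is candidate i (the real datasets pair each image with several captions), and the function name `recall_at_k` is hypothetical rather than taken from this repository.

```python
import numpy as np


def recall_at_k(sims, ks=(1, 5, 10)):
    """Recall@K in percent for a square similarity matrix.

    Assumption: sims[i, j] is the similarity between query i and
    candidate j, and the ground-truth match of query i is candidate i.
    """
    sims = np.asarray(sims, dtype=float)
    # Score of the ground-truth candidate for every query, as a column.
    gt_scores = np.diag(sims)[:, None]
    # Rank of the ground truth = number of candidates scoring strictly higher.
    ranks = (sims > gt_scores).sum(axis=1)
    return {k: 100.0 * float(np.mean(ranks < k)) for k in ks}


if __name__ == "__main__":
    toy = [[0.9, 0.1, 0.2],
           [0.8, 0.3, 0.1],
           [0.2, 0.9, 0.5]]
    print(recall_at_k(toy, ks=(1, 2)))
```

In the toy matrix only the first query ranks its own candidate first, so R@1 is one third while R@2 is 100.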

Requirements

We recommend the following dependencies for branch main.

Python 2.7
PyTorch (>=0.4.1)
NumPy (>=1.12.1)
TensorBoard
Punkt Sentence Tokenizer:

import nltk
nltk.download('punkt')  # fetch the Punkt sentence tokenizer models non-interactively

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:

wget https://scanproject.blob.core.windows.net/scan-data/data.zip
wget https://scanproject.blob.core.windows.net/scan-data/vocab.zip
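
Both downloads are zip archives that need to be unpacked before data_path and vocab_path can point at them. A minimal sketch of that step using only the Python standard library follows; the helper name `extract_archives` and the choice of target directory are assumptions made for illustration, not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_paths, target_dir):
    """Extract each zip archive into target_dir.

    Returns the sorted top-level names found in the archives, which is a
    quick way to see which folders data_path and vocab_path should use.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    top_level = set()
    for archive in archive_paths:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            # Zip member names always use "/" as the separator.
            top_level.update(name.split("/")[0] for name in zf.namelist())
    return sorted(top_level)
```

For example, `extract_archives(["data.zip", "vocab.zip"], ".")` would unpack both files into the current directory.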

Pre-trained models and evaluation

Modify the model_path, data_path, vocab_path in the evaluation.py file. Then run evaluation.py:

python evaluation.py

Note that fold5=True is only for evaluation on MSCOCO 1K (results averaged over 5 folds), while fold5=False is for MSCOCO 5K and Flickr30K. Pre-trained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
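
The difference between the two settings can be sketched as follows, assuming (as is common for MSCOCO 1K evaluation) that the 5K test set is split into five consecutive folds of 1,000 items, a metric is computed on each fold, and the five values are averaged. The names `average_over_folds` and `metric_fn` are hypothetical and do not come from evaluation.py.

```python
def average_over_folds(items, metric_fn, fold_size=1000, num_folds=5):
    """Average a metric over consecutive, equally sized folds.

    items:     sequence of test items (for example, 5000 of them)
    metric_fn: hypothetical callable mapping one fold to a float score
    """
    scores = []
    for i in range(num_folds):
        fold = items[i * fold_size:(i + 1) * fold_size]
        scores.append(metric_fn(fold))
    # The reported number is the mean of the per-fold scores.
    return sum(scores) / len(scores)
```

With fold5=False the metric would instead be computed once over the whole test set, which is a harder retrieval problem because every query competes against all candidates.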

Training new models from scratch

Modify the data_path, vocab_path, model_name, logger_name in the opts.py file. Then run train.py:

For MSCOCO:

(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:

(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF
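
To launch several of these runs from a script, the four configurations above can be collected in a small table. The sketch below only rebuilds the argument lists shown in this section; the names `SCHEDULES` and `build_train_command` are hypothetical, and no flags beyond those listed above are assumed.

```python
# (data_name, module_name) -> (num_epochs, lr_update), copied from the commands above.
SCHEDULES = {
    ("coco_precomp", "SGR"): (20, 10),
    ("coco_precomp", "SAF"): (20, 10),
    ("f30k_precomp", "SGR"): (40, 30),
    ("f30k_precomp", "SAF"): (30, 20),
}


def build_train_command(data_name, module_name):
    """Return the train.py command for one dataset/module pair as an argument list."""
    num_epochs, lr_update = SCHEDULES[(data_name, module_name)]
    return [
        "python", "train.py",
        "--data_name", data_name,
        "--num_epochs", str(num_epochs),
        "--lr_update", str(lr_update),
        "--module_name", module_name,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the repository root; an unknown dataset/module pair raises `KeyError` rather than silently falling back to a default schedule.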

Reference

If SGRAF is useful for your research, please cite the following paper:

@inproceedings{Diao2021SGRAF,
  title={Similarity Reasoning and Filtration for Image-Text Matching},
  author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
  booktitle={AAAI},
  year={2021}
}

License

Apache License 2.0.
If you have any problems, please contact me at ([email protected]) or ([email protected]).

Owner

Ronnie_IIAU
