PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"

Last update: Dec 05, 2022

Overview

Improving Visual-Semantic Embeddings with Hard Negatives

Code for the image-caption retrieval methods from VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, F. Faghri, D. J. Fleet, J. R. Kiros, S. Fidler, Proceedings of the British Machine Vision Conference (BMVC), 2018. (BMVC Spotlight)

Dependencies

We recommend using Anaconda to install the following packages.

Python 2.7 (check out the python3 branch for Python 3)
PyTorch (>0.2) (check out the pytorch4.1 branch)
NumPy (>1.12.1)
TensorBoard
pycocotools
torchvision
matplotlib
Punkt Sentence Tokenizer:

import nltk
nltk.download()
> d punkt
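
The interactive downloader above can also be skipped by asking NLTK for the package by name. The following is a minimal sketch, assuming NLTK is installed and the machine has network access; the helper name ensure_punkt is hypothetical and not part of this repository.

```python
def ensure_punkt():
    """Download the Punkt tokenizer models only if they are not already installed."""
    # Imported inside the function so this file still loads where NLTK is absent.
    import nltk

    try:
        # Raises LookupError when the resource cannot be found locally.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        # Non-interactive equivalent of typing "d punkt" in the downloader prompt.
        nltk.download("punkt")
```

Calling ensure_punkt() once before training or evaluation should be enough, since later calls find the tokenizer locally and return without downloading.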

Download data

Download the dataset files and pre-trained models. We use splits produced by Andrej Karpathy. The precomputed image features are from here and here. To use full image encoders, download the images from their original sources here, here and here.

wget http://www.cs.toronto.edu/~faghri/vsepp/vocab.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/runs.tar

We refer to the path of the files extracted from data.tar as $DATA_PATH and the files extracted from runs.tar as $RUN_PATH. Extract vocab.tar to the ./vocab directory.
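
For readers who prefer to unpack the archives from Python rather than with the tar command, here is a minimal sketch using the standard-library tarfile module (Python 3). The helper name extract_archive and the destination paths in the comments are hypothetical, and the internal layout of the archives is not described here, so inspect the returned member names before relying on a particular directory structure.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the member names it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()  # list members so the caller can check the layout
        tar.extractall(dest)
    return names


# Example calls (destination paths are placeholders, adjust to your setup):
# extract_archive("data.tar", "/path/to/data")   # then use it as $DATA_PATH
# extract_archive("runs.tar", "/path/to/runs")   # then use it as $RUN_PATH
```

Only extract archives from sources you trust, since tar members can carry arbitrary paths.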

Update: The vocabulary was originally built using all sets (including test set captions). Please see issue #29 for details. Please consider not using test set captions if building on this project.

Evaluate pre-trained models

python -c "\
from vocab import Vocabulary
import evaluation
evaluation.evalrank('$RUN_PATH/coco_vse++/model_best.pth.tar', data_path='$DATA_PATH', split='test')"

To do cross-validation on MSCOCO, pass fold5=True with a model trained using --data_name coco.

Training new models

Run train.py:

python train.py --data_path "$DATA_PATH" --data_name coco_precomp --logger_name runs/coco_vse++ --max_violation

Arguments used to train pre-trained models:

Method	Arguments
VSE0	`--no_imgnorm`
VSE++	`--max_violation`
Order0	`--measure order --use_abs --margin .05 --learning_rate .001`
Order++	`--measure order --max_violation`
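
To illustrate what the --max_violation flag changes, the sketch below contrasts a sum-of-hinges ranking loss (all negatives contribute) with the hard-negative variant described in the paper (only the most violating negative per query contributes). It is not the repository's implementation, which operates on PyTorch tensors; the function name contrastive_loss, the plain-list score matrix, and the default margin value are assumptions made for illustration.

```python
def contrastive_loss(scores, margin=0.2, max_violation=False):
    """Ranking loss over a square similarity matrix.

    scores[i][j] is the similarity between image i and caption j;
    matching pairs are assumed to lie on the diagonal.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        # Image i as the query: every other caption in row i is a negative.
        cap_costs = [max(0.0, margin + scores[i][j] - pos)
                     for j in range(n) if j != i]
        # Caption i as the query: every other image in column i is a negative.
        img_costs = [max(0.0, margin + scores[j][i] - pos)
                     for j in range(n) if j != i]
        if max_violation:
            # Hard negatives: keep only the largest violation per query.
            total += max(cap_costs, default=0.0) + max(img_costs, default=0.0)
        else:
            # Sum of hinges: every violating negative contributes.
            total += sum(cap_costs) + sum(img_costs)
    return total
```

With max_violation=True the result can never exceed the summed version, because each query contributes a single hinge term instead of one per negative.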

Reference

If you found this code useful, please cite the following paper:

@inproceedings{faghri2018vse++,
  title={VSE++: Improving Visual-Semantic Embeddings with Hard Negatives},
  author={Faghri, Fartash and Fleet, David J and Kiros, Jamie Ryan and Fidler, Sanja},
  booktitle = {Proceedings of the British Machine Vision Conference ({BMVC})},
  url = {https://github.com/fartashf/vsepp},
  year={2018}
}

License

Apache License 2.0

Owner

Fartash Faghri
