Last update: Jan 08, 2023

Overview

Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning

This repository contains pre-trained models and some evaluation code for our paper Towards Unsupervised Dense Information Retrieval with Contrastive Learning.

We use a simple contrastive learning framework to pre-train models for information retrieval. Contriever, trained without supervision, is competitive with BM25 for R@100 on the BEIR benchmark. After fine-tuning on MSMARCO, Contriever obtains strong performance, especially for recall at 100.
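
Contrastive pre-training of this kind is typically driven by an InfoNCE loss. For each query, the loss rewards a high similarity to its positive key and a low similarity to the other keys. The NumPy sketch below shows that loss with in-batch negatives only. The function name `info_nce_loss` and the default temperature of 0.05 are illustrative assumptions, not values taken from this repository:

```python
import numpy as np


def info_nce_loss(queries: np.ndarray, keys: np.ndarray, temperature: float = 0.05) -> float:
    """InfoNCE loss with in-batch negatives (illustrative sketch).

    queries[i] and keys[i] form a positive pair; every keys[j] with j != i
    acts as a negative for queries[i]. The temperature default is an assumption.
    """
    # Dot-product similarity between every query and every key, scaled by temperature.
    logits = queries @ keys.T / temperature  # shape (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for row i sits on the diagonal; average its negative log-probability.
    return float(-np.mean(np.diag(log_probs)))
```

Here, the pairing is given simply by row index. How positive pairs are built from unlabeled text, and how additional negatives are gathered during training, is described in the paper and is not reproduced in this sketch.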

Getting Started

Pre-trained models can be loaded through the HuggingFace transformers library:

import transformers
from src.contriever import Contriever

model = Contriever.from_pretrained("facebook/contriever")
tokenizer = transformers.BertTokenizerFast.from_pretrained("facebook/contriever")

Embeddings for different sentences can be obtained by doing the following:

sentences = [
    "Where was Marie Curie born?",
    "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
    "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
]

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
embeddings = model(**inputs)
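
The call above returns one vector per sentence. According to the facebook/contriever model card on Hugging Face, this vector is obtained by averaging the final-layer token representations over non-padding tokens (mean pooling). The NumPy sketch below isolates that pooling step. `mean_pooling` is a hypothetical helper name, and toy arrays stand in for real model outputs:

```python
import numpy as np


def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions (illustrative sketch).

    token_embeddings: (batch, seq_len, hidden) per-token outputs.
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)  # zero out padding, then sum over tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # real-token count per sentence, avoids /0
    return summed / counts  # (batch, hidden)
```

When using a plain Hugging Face `AutoModel`, the same computation would apply to `outputs.last_hidden_state` and `inputs["attention_mask"]`. You would either convert them to NumPy or rewrite the function with the equivalent torch operations.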

Then similarity scores between the different sentences can be obtained with a dot product between the embeddings:

score01 = embeddings[0] @ embeddings[1]  # 1.0473
score02 = embeddings[0] @ embeddings[2]  # 1.0095
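
Because relevance is scored with a plain dot product, retrieval reduces to two steps: score every passage embedding against the query embedding, then keep the k highest scores. The brute-force NumPy sketch below does exactly that. `rank_by_dot_product` is a hypothetical helper, and the toy vectors stand in for Contriever embeddings:

```python
from typing import List

import numpy as np


def rank_by_dot_product(query: np.ndarray, passages: np.ndarray, k: int) -> List[int]:
    """Return indices of the k passages with the highest dot-product score."""
    scores = passages @ query  # one score per passage
    order = np.argsort(-scores, kind="stable")  # descending order; ties keep corpus order
    return order[:k].tolist()
```

This exhaustive scan is only practical for small collections. For large corpora, an approximate nearest-neighbor index would usually replace it.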

BEIR evaluation

Scores on the BEIR benchmark can be reproduced using eval_beir.py.

python eval_beir.py --model_name_or_path facebook/contriever-msmarco --dataset scifact
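
The overview highlights recall at 100. For each query, recall@k is the fraction of its relevant documents that appear in the top k results, and the reported value is the average over queries. The plain-Python sketch below computes that metric on toy data. It is not the code used by eval_beir.py, and BEIR's own evaluator may differ in details such as how queries without relevant documents are handled. `recall_at_k` is a hypothetical name:

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Mean over queries of |relevant ∩ top-k retrieved| / |relevant|."""
    per_query = []
    for qid, rel_docs in relevant.items():
        if not rel_docs:
            continue  # this sketch skips queries with no judged-relevant documents
        top_k = set(ranked.get(qid, [])[:k])
        per_query.append(len(top_k & rel_docs) / len(rel_docs))
    return sum(per_query) / len(per_query) if per_query else 0.0
```

For example, with `k=2`, a query that retrieves `["d1", "d2", "d3"]` and has relevant set `{"d1", "d3"}` contributes 0.5.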

Available models

Model	Description
facebook/contriever	Model pre-trained on Wikipedia and CC-net without any supervised data
facebook/contriever-msmarco	Pre-trained model fine-tuned on MS-MARCO

References

[1] G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, E. Grave. Towards Unsupervised Dense Information Retrieval with Contrastive Learning. arXiv:2112.09118, 2021.

@misc{izacard2021contriever,
      title={Towards Unsupervised Dense Information Retrieval with Contrastive Learning}, 
      author={Gautier Izacard and Mathilde Caron and Lucas Hosseini and Sebastian Riedel and Piotr Bojanowski and Armand Joulin and Edouard Grave},
      year={2021},
      eprint={2112.09118},
      archivePrefix={arXiv},
}

License

See the LICENSE file for more details.

Owner

Meta Research

Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.

This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)

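This is a mobilenet-yolov4-lite library. It replaces the YOLOv4 backbone with MobileNet and modifies the convolutional structure of PANet, greatly reducing the number of parameters.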

TorchXRayVision: A library of chest X-ray datasets and models.

Official PyTorch implementation of Splicing ViT Features for Semantic Appearance Transfer, presenting Splice

This is a facenet-pytorch library that can be used to train your own face recognition model.

Python script for performing depth completion from sparse depth and RGB images using the msg_chn_wacv20 model in TensorFlow Lite.

Dark Finix: All-in-one hacking framework with almost 100 tools

Transfer Reinforcement Learning for Differing Action Spaces via Q-Network Representations

Algorithm to texture 3D reconstructions from multi-view stereo images

Analyses of the individual electric field magnitudes with Roast.

SemiNAS: Semi-Supervised Neural Architecture Search

Exe-to-xlsm: Simple script to create a VBScript from an exe and inject it into an xlsm file

yolox_backbone is a deep-learning library and is a collection of YOLOX Backbone models.

Benchmarks for semi-supervised domain generalization.

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

A script depending on VASP output for calculating Fermi-Softness.

I will implement Fastai in each project present in this repository.

Official PyTorch Implementation of Learning Architectures for Binary Networks

Implementation of Cross Transformer for spatially-aware few-shot transfer, in PyTorch