Code for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators"

Last update: Nov 20, 2022

Overview

Query Variation Generators

This repository contains the code and annotation data for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators".

Setup

Install the requirements using

pip install -r requirements.txt

Steps to reproduce the results

First, we need to generate weak supervision, i.e. the query variations, for the desired test sets. We can do that with examples/generate_weak_supervision.py. In the paper we test on TREC-DL ('msmarco-passage/trec-dl-2019/judged') and ANTIQUE ('antique/train/split200-valid'), but any dataset available in ir_datasets (https://ir-datasets.com/index.html) can be passed here as TASK.

python ${REPO_DIR}/examples/generate_weak_supervision.py \
    --task $TASK \
    --output_dir $OUT_DIR
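To make the idea of "one variation per method for each query" concrete, here is a minimal, self-contained sketch. The three transformations (typo_swap, word_swap, drop_stopwords), the tiny stopword list and the generate_variations helper are all hypothetical simplifications written for illustration; they are not the generators implemented in disentangled_information_needs/transformations.

```python
import random

# Hypothetical, simplified transformations for illustration only; the
# repository's actual generators live in disentangled_information_needs/transformations.

def typo_swap(query: str, rng: random.Random) -> str:
    """Swap two adjacent characters inside one word (a simple typo)."""
    words = query.split()
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    if not candidates:
        return query
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # first and last characters stay in place
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def word_swap(query: str, rng: random.Random) -> str:
    """Swap the positions of two randomly chosen words."""
    words = query.split()
    if len(words) < 2:
        return query
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


# Assumed toy stopword list, not the one used in the paper.
STOPWORDS = {"a", "an", "the", "of", "is", "what", "how", "to", "in", "for", "do", "does"}


def drop_stopwords(query: str, _rng: random.Random) -> str:
    """Remove stopwords, falling back to the original query if nothing is left."""
    kept = [w for w in query.split() if w.lower() not in STOPWORDS]
    return " ".join(kept) if kept else query


GENERATORS = {
    "typo_swap": typo_swap,
    "word_swap": word_swap,
    "drop_stopwords": drop_stopwords,
}


def generate_variations(queries: dict, seed: int = 42) -> list:
    """Produce exactly one variation per method for every original query."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    rows = []
    for qid, text in queries.items():
        for method, fn in GENERATORS.items():
            rows.append({"query_id": qid, "original": text,
                         "method": method, "variation": fn(text, rng)})
    return rows


if __name__ == "__main__":
    queries = {"q1": "what is the definition of robustness",
               "q2": "how do neural rankers work"}
    for row in generate_variations(queries):
        print(row["query_id"], row["method"], "->", row["variation"])
```

Seeding a single random.Random instance keeps the generated variations reproducible across runs, which matters when the output is later annotated by hand.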

The script generates one query variation per method for each original query. We then manually annotated the generated query variations in order to keep only the valid ones for the analysis. For that we used analyze_weak_supervision.py (prepares the data for manual annotation) and analyze_auto_query_generation_labeling.py (combines the automatic labels and the manual annotations).
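The filtering step after annotation can be pictured as a join between the generated variations and the annotation labels, keeping only rows marked as valid. The sketch below is a hypothetical illustration: the CSV layout, the column names (query_id, method, variation, label) and the "valid" label are assumptions, and the real files handled by analyze_auto_query_generation_labeling.py may be structured differently.

```python
import csv
import io

# Hypothetical file contents; the real column names and labels may differ.
VARIATIONS_CSV = """query_id,method,variation
q1,typo_swap,waht is the definition of robustness
q1,word_swap,what is the robustness of definition
q2,drop_stopwords,neural rankers work
"""

ANNOTATIONS_CSV = """query_id,method,label
q1,typo_swap,valid
q1,word_swap,invalid
q2,drop_stopwords,valid
"""


def load_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def keep_valid(variations: list, annotations: list) -> list:
    """Keep only the variations whose manual label is 'valid'."""
    labels = {(a["query_id"], a["method"]): a["label"] for a in annotations}
    # Variations without any annotation are dropped, since nobody checked them.
    return [v for v in variations
            if labels.get((v["query_id"], v["method"])) == "valid"]


if __name__ == "__main__":
    valid = keep_valid(load_rows(VARIATIONS_CSV), load_rows(ANNOTATIONS_CSV))
    for row in valid:
        print(row["query_id"], row["method"], row["variation"])
```

Keying the labels on (query_id, method) mirrors the "one variation per method per query" structure, so each generated row maps to at most one annotation.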

However, to reproduce the results you can directly use the annotated query set to test the robustness of neural ranking models (RQ1):

python ${REPO_DIR}/disentangled_information_needs/evaluation/query_rewriting.py \
        --task 'irds:msmarco-passage/trec-dl-2019/judged' \
        --output_dir $OUT_DIR/ \
        --variations_file $OUT_DIR/$VARIATIONS_FILE_TREC_DL \
        --retrieval_model_name "BM25+KNRM" \
        --train_dataset "irds:msmarco-passage/train" \
        --max_iter $MAX_ITER

where $OUT_DIR/$VARIATIONS_FILE_TREC_DL is the annotated variations file. The same can be done to run rank fusion (RQ2) by replacing query_rewriting.py with rank_fusion.py.
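One way to quantify robustness is to compare a model's effectiveness on the original queries with its effectiveness on a set of variations, for example via the relative change in nDCG@10. The standalone sketch below computes nDCG@10 with linear gains and the percentage change; the function names and toy rankings are hypothetical, and query_rewriting.py may compute its metrics through a different library rather than code like this.

```python
import math


def dcg_at_k(gains: list, k: int = 10) -> float:
    """Discounted cumulative gain with linear gains: sum of rel / log2(rank + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: list, qrels: dict, k: int = 10) -> float:
    """nDCG@k for one query; qrels maps doc_id -> graded relevance."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def mean_ndcg(runs: dict, qrels_by_query: dict, k: int = 10) -> float:
    """Average nDCG@k over all judged queries (missing runs count as empty)."""
    scores = [ndcg_at_k(runs.get(qid, []), qrels, k)
              for qid, qrels in qrels_by_query.items()]
    return sum(scores) / len(scores) if scores else 0.0


def relative_change(original: float, variation: float) -> float:
    """Percentage change of the variation score with respect to the original."""
    return 0.0 if original == 0 else 100 * (variation - original) / original


if __name__ == "__main__":
    qrels = {"q1": {"d1": 3, "d2": 1, "d3": 0}}
    original_run = {"q1": ["d1", "d2", "d3"]}   # ranking for the original query
    variation_run = {"q1": ["d3", "d2", "d1"]}  # ranking for a query variation
    orig = mean_ndcg(original_run, qrels)
    var = mean_ndcg(variation_run, qrels)
    print(f"original={orig:.4f} variation={var:.4f} change={relative_change(orig, var):.1f}%")
```

A negative change indicates that the model lost effectiveness when the query was rewritten.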

The scripts evaluate_weak_supervision.sh and evaluate_rank_fusion.sh run all models and datasets for both research questions. The first generates the main table of results (Table 4 in the paper), and the second generates the tables for the rank fusion experiments (only available in the arXiv version of the paper).
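The rank fusion experiments combine the rankings retrieved for different variations of the same query into a single ranking. As an illustration, the sketch below uses Reciprocal Rank Fusion (RRF) with the constant k = 60 commonly used with that method; this is a swapped-in, well-known fusion technique chosen for the example, and we do not claim it is the exact fusion method implemented in rank_fusion.py.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several rankings (lists of doc ids, best first) with RRF.

    Each document receives sum(1 / (k + rank)) over the rankings it appears in,
    with ranks starting at 1. Ties are broken by doc id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical rankings retrieved for the original query and two variations.
    runs = [
        ["d1", "d2", "d3"],  # original query
        ["d2", "d1", "d4"],  # variation 1
        ["d2", "d3", "d1"],  # variation 2
    ]
    print(reciprocal_rank_fusion(runs))
```

Because RRF only uses ranks, it does not require the scores of the different variations to be on a comparable scale.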

Modules and Folders

scripts: Contains most of the analysis scripts, as well as the commands to run entire experiments.
examples: Contains an example of how to generate query variations.
disentangled_information_needs/evaluation: Scripts to evaluate the robustness of models to query variations and to evaluate rank fusion of query variations.
disentangled_information_needs/transformations: Methods to generate query variations.


Owner

Gustavo Penha
