Unbiased Learning to Rank Algorithms (ULTRA)

This is the Unbiased Learning to Rank Algorithms (ULTRA) toolbox, which provides a codebase for experiments and research on learning to rank with human-annotated or noisy labels. With a unified data processing pipeline, ULTRA supports multiple unbiased learning-to-rank algorithms, online learning-to-rank algorithms, and neural learning-to-rank models, as well as different methods to use and simulate noisy labels (e.g., clicks) to train and test different algorithms and ranking models. User-friendly documentation is also available.

Get Started

Create a virtual environment (optional):

pip install --user virtualenv
~/.local/bin/virtualenv -p python3 ./venv
source venv/bin/activate

Install ULTRA from the source:

git clone https://github.com/ULTR-Community/ULTRA.git
cd ULTRA
make init # Replace 'tensorflow' with 'tensorflow-gpu' in requirements.txt for GPU support

Run the toy example:

bash example/toy/offline_exp_pipeline.sh

Structure

Input Layers

ClickSimulationFeed: this input layer generates synthetic clicks on fixed ranked lists to feed the learning algorithm.
DeterministicOnlineSimulationFeed: this input layer first creates ranked lists by sorting documents according to the current ranking model, and then generates synthetic clicks on the lists to feed the learning algorithm. It can do result interleaving if required by the learning algorithm.
StochasticOnlineSimulationFeed: this input layer first creates ranked lists by sampling documents based on their scores in the current ranking model and the Plackett-Luce distribution, and then generates synthetic clicks on the lists to feed the learning algorithm. It can do result interleaving if required by the learning algorithm.
DirectLabelFeed: this input layer directly feeds the true relevance labels of each document to the learning algorithm.
MTLSimulationFeed (https://github.com/phyllist/ULTRA/blob/master/ultra/input_layer/mtl_simulation_feed.py): this input layer generates synthetic clicks and dwell times on fixed ranked lists to feed the learning algorithm.
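
To make the difference between the deterministic and stochastic online feeds concrete, here is a minimal sketch of the two ways a ranked list can be built from model scores. It is an illustration only, not ULTRA's implementation: the function names `rank_deterministic` and `sample_plackett_luce` are hypothetical, and the Plackett-Luce sampler assumes that selection probabilities are a softmax over the scores of the documents not yet placed.

```python
import math
import random


def rank_deterministic(scores):
    """Return document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def sample_plackett_luce(scores, rng=None):
    """Sample one ranking from a Plackett-Luce distribution over the scores.

    Assumption: at each step, the next document is drawn with probability
    proportional to exp(score) among the documents not yet placed.
    """
    rng = rng or random.Random()
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        # Subtract the maximum score for numerical stability.
        max_score = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - max_score) for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking


if __name__ == "__main__":
    scores = [0.1, 2.0, 1.0]
    print(rank_deterministic(scores))
    print(sample_plackett_luce(scores, random.Random(0)))
```

The deterministic ranking always returns the same list for the same scores, while the sampled ranking varies between calls, which is what gives the stochastic feed its exploration.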

Learning Algorithms

NA: this is an implementation of the naive algorithm that directly trains models with input labels (e.g., clicks).
DLA: this is an implementation of the Dual Learning Algorithm in "Unbiased Learning to Rank with Unbiased Propensity Estimation".
IPW: this is an implementation of the Inverse Propensity Weighting algorithms in "Learning to Rank with Selection Bias in Personal Search" and "Unbiased Learning-to-Rank with Biased Feedback".
REM: this is an implementation of the regression-based EM algorithm in "Position Bias Estimation for Unbiased Learning to Rank in Personal Search".
PD: this is an implementation of the pairwise debiasing algorithm in "Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm".
DBGD: this is an implementation of the Dueling Bandit Gradient Descent algorithm in "Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem".
MGD: this is an implementation of the Multileave Gradient Descent algorithm in "Multileave Gradient Descent for Fast Online Learning to Rank".
NSGD: this is an implementation of the Null Space Gradient Descent algorithm in "Efficient Exploration of Gradient Space for Online Learning to Rank".
PDGD: this is an implementation of the Pairwise Differentiable Gradient Descent algorithm in "Differentiable Unbiased Online Learning to Rank".
PAIRREGM: this is an implementation of the pairwise regression-based EM algorithm in our paper "Unbiased Pairwise Learning to Rank in Recommender Systems".
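
As a rough illustration of how inverse propensity weighting differs from the naive algorithm, the sketch below computes a softmax cross-entropy loss for a single ranked list in which each clicked document is weighted by the inverse of its examination propensity. This is a generic textbook-style formulation under stated assumptions, not the loss used inside ULTRA; the function name `ipw_softmax_loss` and its arguments are hypothetical.

```python
import math


def ipw_softmax_loss(scores, clicks, propensities):
    """Inverse-propensity-weighted softmax cross-entropy for one ranked list.

    scores:       model scores, one per document in the list
    clicks:       0/1 click labels, one per document
    propensities: assumed examination probability at each position
    Setting every propensity to 1.0 gives the unweighted (naive) loss.
    """
    # Log of the softmax normalizer, computed stably.
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    loss = 0.0
    for score, click, propensity in zip(scores, clicks, propensities):
        if click:
            # Clicks at rarely examined positions receive larger weights.
            loss += -(score - log_norm) / propensity
    return loss


if __name__ == "__main__":
    scores = [0.0, 0.0]
    print(ipw_softmax_loss(scores, [1, 0], [1.0, 0.5]))
    print(ipw_softmax_loss(scores, [0, 1], [1.0, 0.5]))
```

With equal scores, a click at the second position (propensity 0.5) contributes twice the loss of a click at the first position (propensity 1.0), which is how the weighting compensates for position bias.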

Ranking Models

Linear: this is a linear ranking model that computes ranking scores with a linear function.
DNN: this is a neural ranking model that computes ranking scores with a multi-layer perceptron network (with non-linear activation functions).
DLCM: this is an implementation of the Deep Listwise Context Model in "Learning a Deep Listwise Context Model for Ranking Refinement".
GSF: this is an implementation of the Groupwise Scoring Function in "Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks".
SetRank: this is an implementation of the SetRank model in "SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval".
BiasTowerDNN (https://github.com/phyllist/ULTRA/blob/master/ultra/ranking_model/BiasTowerDNN.py): this is an implementation of the shallow-tower-based DNN model.

Supported Evaluation Metrics

MRR: the Mean Reciprocal Rank (inherited from TF-Ranking).
ERR: the Expected Reciprocal Rank from "Expected Reciprocal Rank for Graded Relevance".
ARP: the Average Relevance Position (inherited from TF-Ranking).
NDCG: the Normalized Discounted Cumulative Gain (inherited from TF-Ranking).
DCG: the Discounted Cumulative Gain (inherited from TF-Ranking).
Precision: the Precision (inherited from TF-Ranking).
MAP: the Mean Average Precision (inherited from TF-Ranking).
Ordered_Pair_Accuracy: the percentage of correctly ordered pairs (inherited from TF-Ranking).
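
For readers unfamiliar with these metrics, the following sketch computes DCG, NDCG, and the reciprocal rank for a single ranked list of graded relevance labels. It assumes the common exponential gain (2^rel - 1) and a log2 position discount; the exact conventions used by ULTRA or TF-Ranking may differ, and the function names here are hypothetical.

```python
import math


def dcg_at_k(labels, k):
    """Discounted Cumulative Gain over the top-k labels of a ranked list."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(labels[:k])
    )


def ndcg_at_k(labels, k):
    """DCG normalized by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first relevant document, or 0.0 if there is none."""
    for rank, rel in enumerate(labels, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    ranked_labels = [0, 1]  # labels in the order produced by the ranker
    print(dcg_at_k(ranked_labels, 2))
    print(ndcg_at_k(ranked_labels, 2))
    print(reciprocal_rank(ranked_labels))
```

MRR is the mean of `reciprocal_rank` over all queries; the other list-level metrics are averaged over queries in the same way.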

Click Simulation Example

Create click models for click simulations

python ultra/utils/click_models.py pbm 0.1 1 4 1.0 example/ClickModel

* The output is a JSON file containing the click model that can be used for click simulation. More details can be found in the code.
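
To give a sense of what a position-based click model (PBM) does during simulation, here is a minimal sketch. It assumes that a click happens only when a document is both examined (which depends on its position) and perceived as relevant (which depends on its label). The parameter names `neg_click_prob`, `pos_click_prob`, `max_relevance`, and `eta`, as well as the gain formula, are assumptions made for illustration and are not claimed to match the positional arguments or internals of ultra/utils/click_models.py.

```python
import random


def click_probability(relevance, max_relevance, neg_click_prob, pos_click_prob):
    """Probability that an examined document is clicked, given its label.

    Assumption: interpolate between the click probability of an irrelevant
    document and that of a perfectly relevant one using an exponential gain.
    """
    gain = (2 ** relevance - 1) / (2 ** max_relevance - 1)
    return neg_click_prob + (pos_click_prob - neg_click_prob) * gain


def simulate_pbm_clicks(relevance_labels, exam_probs, max_relevance=4,
                        neg_click_prob=0.1, pos_click_prob=1.0, eta=1.0,
                        rng=None):
    """Simulate 0/1 clicks on one ranked list with a position-based model."""
    rng = rng or random.Random()
    clicks = []
    for position, relevance in enumerate(relevance_labels):
        # eta sharpens (eta > 1) or flattens (eta < 1) the position bias.
        examined = rng.random() < exam_probs[position] ** eta
        attracted = rng.random() < click_probability(
            relevance, max_relevance, neg_click_prob, pos_click_prob)
        clicks.append(int(examined and attracted))
    return clicks


if __name__ == "__main__":
    labels = [4, 0, 2]             # relevance labels in ranked order
    exam_probs = [1.0, 0.6, 0.4]   # hypothetical examination probabilities
    print(simulate_pbm_clicks(labels, exam_probs, rng=random.Random(0)))
```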

(Optional) Estimate examination propensity with result randomization

python ultra/utils/propensity_estimator.py example/ClickModel/pbm_0.1_1.0_4_1.0.json <DATA_DIR> example/PropensityEstimator/

* The output is a JSON file containing the estimated examination propensity (used for IPW). DATA_DIR is the directory for the prepared data created by ./libsvm_tools/prepare_exp_data_with_svmrank.py. More details can be found in the code.
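
The idea behind estimating propensity with result randomization can be sketched as follows. When documents are shuffled randomly before being shown, every position has the same expected relevance, so differences in click rate between positions can be attributed to examination bias alone. The sketch below normalizes the per-position click rate by the rate at the first position. It illustrates the general technique only, not the logic of ultra/utils/propensity_estimator.py, and the function name `estimate_propensities` is hypothetical.

```python
def estimate_propensities(click_logs):
    """Estimate relative examination propensity per position.

    click_logs: list of 0/1 click lists, each recorded on a randomly
                shuffled ranked list (position 0 is the top result).
    Returns click rates per position, normalized by the top position.
    """
    list_length = max(len(clicks) for clicks in click_logs)
    click_counts = [0] * list_length
    impressions = [0] * list_length
    for clicks in click_logs:
        for position, click in enumerate(clicks):
            click_counts[position] += click
            impressions[position] += 1
    click_rates = [c / n for c, n in zip(click_counts, impressions)]
    if click_rates[0] == 0:
        raise ValueError("no clicks at the top position; cannot normalize")
    return [rate / click_rates[0] for rate in click_rates]


if __name__ == "__main__":
    logs = [[1, 1], [1, 0], [1, 0], [1, 0]]
    print(estimate_propensities(logs))
```

The resulting values are the kind of per-position weights that an inverse-propensity-weighted learner divides by.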

Citation

If you use ULTRA in your research, please cite it using the following BibTeX entries.

@article{10.1145/3439861,
author = {Ai, Qingyao and Yang, Tao and Wang, Huazheng and Mao, Jiaxin},
title = {Unbiased Learning to Rank: Online or Offline?},
year = {2021},
issue_date = {February 2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {39},
number = {2},
issn = {1046-8188},
url = {https://doi.org/10.1145/3439861},
doi = {10.1145/3439861},
journal = {ACM Trans. Inf. Syst.},
month = feb,
articleno = {21},
numpages = {29},
keywords = {unbiased learning, online learning, Learning to rank}
}

@inproceedings{Ai:2018:ULR:3269206.3274274,
 author = {Ai, Qingyao and Mao, Jiaxin and Liu, Yiqun and Croft, W. Bruce},
 title = {Unbiased Learning to Rank: Theory and Practice},
 booktitle = {Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
 series = {CIKM '18},
 year = {2018},
 isbn = {978-1-4503-6014-2},
 location = {Torino, Italy},
 pages = {2305--2306},
 numpages = {2},
 url = {http://doi.acm.org/10.1145/3269206.3274274},
 doi = {10.1145/3269206.3274274},
 acmid = {3274274},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {click model, counterfactual learning, unbiased learning to rank, user bias},
}

Development Team

Qingyao Ai

Core Dev
ASST PROF, Univ. of Utah

Tao Yang

Core Dev
Ph.D., Univ. of Utah

Huazheng Wang

Core Dev
Ph.D., Univ. of Virginia

Jiaxin Mao

Core Dev
Postdoc, Tsinghua Univ.

Contribution

Please read the Contributing Guide before creating a pull request.

Project Organizers

Qingyao Ai
- School of Computing, University of Utah
- Homepage

License

Apache-2.0
