Bert Axioms

This repository contains the code for the paper "Diagnosing BERT with Retrieval Heuristics".

Required Data

To run this code, you first need to download the dataset from the TREC 2019 Deep Learning Track Guidelines. The paths to the downloaded files should be specified in the config file.

You also need a working installation of the Indri Toolkit for indexing and retrieval.

Parameters

There are a number of hyperparameters that need to be set (such as the Indri path, the number of candidates to retrieve, and the random seed). These can be set in a YAML config file at scripts/config-defaults.yaml. The parameters are handled by wandb, but can easily be adapted for any YAML reader (take a look at PyYAML).
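As a rough sketch of reading the config without wandb, the snippet below loads the YAML file with PyYAML and collapses wandb-style entries into a plain dictionary. It assumes the file follows the usual wandb defaults layout, where each key maps to a `value` (and optionally a `desc`) field; the helper names and the example keys are hypothetical and may not match the ones used in this repository.

```python
from typing import Any, Dict


def flatten_wandb_defaults(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse wandb-style entries ({"value": ..., "desc": ...}) into key -> value pairs.

    Entries that are not in that shape are passed through unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, entry in raw.items():
        if isinstance(entry, dict) and "value" in entry:
            flat[key] = entry["value"]
        else:
            flat[key] = entry
    return flat


def load_config(path: str = "scripts/config-defaults.yaml") -> Dict[str, Any]:
    """Read the YAML config file and return a flat dict of parameters."""
    # PyYAML is imported lazily so the flattening helper works without it installed.
    import yaml

    with open(path, encoding="utf-8") as handle:
        raw = yaml.safe_load(handle) or {}
    return flatten_wandb_defaults(raw)


if __name__ == "__main__":
    # Hypothetical keys, for illustration only; check the real config file for actual names.
    example = {
        "indri_path": {"desc": "Path to the Indri installation", "value": "/opt/indri"},
        "random_seed": {"value": 42},
    }
    print(flatten_wandb_defaults(example))
```

Calling `load_config()` from the repository root would then return the parameters as an ordinary dictionary, which can be passed around instead of the wandb config object.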

Observations

Note that, for LNC2, we use external C++ code to interact with Indri. This lets us add the duplicated documents to the index without compromising scores. This code should be compiled with Indri's Makefile.app, which should be as easy as editing Makefile.app from Indri and running make -f Makefile.app (see https://lemur.sourceforge.io/indri/ for more details).

Removing documents from the Indri index does not guarantee that the index statistics will change immediately. This can cause slight differences compared with the more "correct" approach of re-creating the index from scratch for every duplicated document.
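For readers unfamiliar with LNC2, the toy sketch below illustrates what the duplicated documents are used for. The constraint, as usually stated, says that a document concatenated with itself k times should score no lower than the original. This is not the repository's implementation, which scores documents through the Indri index and the external C++ code; the function names and the two toy scorers are hypothetical and exist only to show the check.

```python
from collections import Counter
from typing import Callable, List

Scorer = Callable[[List[str], List[str]], float]


def duplicate_document(tokens: List[str], k: int) -> List[str]:
    """Return the document concatenated with itself k times (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return tokens * k


def satisfies_lnc2(score: Scorer, query: List[str], tokens: List[str], k: int = 2) -> bool:
    """LNC2 holds when the k-times duplicated document scores no lower than the original."""
    return score(query, duplicate_document(tokens, k)) >= score(query, tokens)


def raw_tf_score(query: List[str], tokens: List[str]) -> float:
    """Toy scorer: sum of raw query-term frequencies."""
    counts = Counter(tokens)
    return float(sum(counts[term] for term in query))


def over_penalized_score(query: List[str], tokens: List[str]) -> float:
    """Toy scorer that divides by squared length, so it punishes long documents too hard."""
    counts = Counter(tokens)
    return sum(counts[term] for term in query) / (len(tokens) ** 2)


if __name__ == "__main__":
    query = ["bert"]
    document = ["bert", "ranks", "documents"]
    print(satisfies_lnc2(raw_tf_score, query, document))
    print(satisfies_lnc2(over_penalized_score, query, document))
```

The first scorer satisfies the constraint because duplicating the document doubles every term count, while the second violates it because its length penalty grows faster than the term counts do.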

Expected Results

The results from this repository may not exactly replicate those reported in the paper. This is due to a few performance improvements made after the paper was submitted. These, however, do not change the final scores and conclusions. Mostly, you may see an increase in alpha-nDCG for all methods, and an increase in QL performance across the board.

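| | `nDCG_cut` | `TFCI` | `TFCII` | `MTDC` | `LNC1` | `LNC2` | `TP` | `STMC1` | `STMC2` | `STMC3` |
|---|---|---|---|---|---|---|---|---|---|---|
| QL | 0.3633 | 0.9936 | 0.7008 | 0.8759 | 0.5021 | 1.000 | 0.3852 | 0.4855 | 0.7047 | 0.7011 |
| DistilBERT | 0.4537 | 0.6109 | 0.3945 | 0.5130 | 0.5006 | 0.0003 | 0.4105 | 0.5040 | 0.5120 | 0.5099 |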

Code for ECIR'20 paper Diagnosing BERT with Retrieval Heuristics

Owner

Arthur Câmara
