batch-bandits

Implementation of popular bandit algorithms in batch environments.

Source code for our paper "The Impact of Batch Learning in Stochastic Bandits", accepted at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021.

Overview

This repository lets you run simulations or replay logged datasets in a sequential batch manner: the agent interacts with the environment sequentially, but responses are grouped in batches and observed by the agent only at the end of each batch. Broadly speaking, sequential batch learning is a more general way of learning that covers both offline and online settings as special cases, bringing together their advantages.

Framework

Two particularly useful versions of the multi-armed bandit problem are implemented: Stochastic Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB). The key feature of the project is that both versions support a batch_size parameter: the length of a period during which the agent interacts with the environment "blindly", without observing any responses. Although the batch setting is a property of the environment, this limitation is treated from the policy's perspective. In other words, rather than an online agent working with a batch environment, a batch policy interacts with an online environment.
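The batch-policy view described above can be illustrated with a minimal sketch. It is not the repository's code: the class BatchEpsilonGreedy, the helper run, and the Bernoulli reward model are hypothetical names and assumptions chosen for illustration. The environment returns a reward at every step, but the policy buffers the feedback and refreshes its estimates only after batch_size interactions.

```python
import random


class BatchEpsilonGreedy:
    """Hypothetical epsilon-greedy policy that learns only at batch boundaries."""

    def __init__(self, n_arms, epsilon, batch_size, seed=0):
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms    # pulls already incorporated per arm
        self.values = [0.0] * n_arms  # mean reward estimates per arm
        self.buffer = []              # feedback not yet seen by the policy

    def act(self):
        # Explore with probability epsilon, otherwise exploit (ties broken at random).
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        best = max(self.values)
        return self.rng.choice([a for a, v in enumerate(self.values) if v == best])

    def observe(self, arm, reward):
        # Feedback is stored; estimates change only once the batch is full.
        self.buffer.append((arm, reward))
        if len(self.buffer) == self.batch_size:
            self._flush()

    def _flush(self):
        # Incremental mean update over every buffered (arm, reward) pair.
        for arm, reward in self.buffer:
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        self.buffer.clear()


def run(agent, arm_means, horizon, seed=1):
    """Simulate an online Bernoulli bandit (an assumed reward model)."""
    env_rng = random.Random(seed)
    total = 0.0
    for _ in range(horizon):
        arm = agent.act()
        reward = 1.0 if env_rng.random() < arm_means[arm] else 0.0
        agent.observe(arm, reward)
        total += reward
    return total
```

With batch_size=1 the sketch reduces to the usual online policy. With batch_size equal to the horizon, the policy never updates during the run, which corresponds to the offline extreme.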

The project is built upon the RL-Glue framework, which provides an interface to connect agents, environments, and experiment programs. Note that MAB/rl_glue.py and CMAB/rl_glue.py were adapted to make batch interaction possible.

Implemented algorithms

Version	Algorithm	Comment
MAB	ε-greedy	-
MAB	Thompson Sampling	-
MAB	UCB	-
CMAB	LinTS	see link (and references therein) for more details
CMAB	LinUCB	see article for theoretical description
CMAB	Offline evaluator	policy evaluation technique; see article for theoretical guarantees
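As a rough illustration of how a policy can be evaluated on logged data in a batch manner, the sketch below uses the replay idea: a logged event counts only when the evaluated policy picks the same arm that was logged. Whether the repository's offline evaluator works exactly this way is an assumption. The function replay_evaluate, the policy interface (act, update), and the toy FixedArmPolicy are hypothetical names introduced here.

```python
def replay_evaluate(policy, logged_events, batch_size):
    """Replay-style evaluation with batched policy updates.

    logged_events is an iterable of (context, logged_arm, reward) tuples.
    Returns (average reward over matched events, number of matched events).
    """
    matched_rewards = []
    batch = []
    for context, logged_arm, reward in logged_events:
        if policy.act(context) != logged_arm:
            continue  # the policy chose a different arm, so the event is discarded
        matched_rewards.append(reward)
        batch.append((context, logged_arm, reward))
        if len(batch) == batch_size:
            policy.update(batch)  # the policy sees feedback only per full batch
            batch = []
    if not matched_rewards:
        return 0.0, 0
    return sum(matched_rewards) / len(matched_rewards), len(matched_rewards)


class FixedArmPolicy:
    """Toy policy for demonstration: always plays one arm, records its updates."""

    def __init__(self, arm):
        self.arm = arm
        self.updates = []

    def act(self, context):
        return self.arm

    def update(self, batch):
        self.updates.append(list(batch))
```

For example, evaluating FixedArmPolicy(1) on a log keeps only the events whose logged arm is 1, and update is called once per batch_size kept events. Replay estimates of this kind are usually justified under the assumption that the logging policy chose arms uniformly at random.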

Owner

Danil Provodin
