Exploration-Exploitation Dilemma Solving Methods

Medium article for this repo - HERE

In this repo I implemented two techniques for tackling the exploration-exploitation tradeoff. The methods are:

Epsilon-Greedy (with different epsilons)
Thompson Sampling (also known as posterior sampling)

The reason for choosing only these two is to show the lower and upper ends of the range: epsilon-greedy is a common starting point for dealing with this tradeoff, while Thompson Sampling is often considered state of the art in this field.

ENV SPECIFICATIONS - A 10-armed testbed is simulated, the same as the one demonstrated in the Sutton-Barto book.
True reward distribution (here Action-2 is best)
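As a rough illustration of such a testbed, here is a minimal sketch. It assumes the usual Sutton-Barto setup (true action values drawn from a standard normal, rewards with unit-variance Gaussian noise); the class name `TenArmedTestbed` and its methods are hypothetical and not taken from this repo. Which arm turns out best depends on the random seed.

```python
import numpy as np


class TenArmedTestbed:
    """Stationary k-armed Gaussian bandit (hypothetical helper, assumed setup)."""

    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        # True action values q*(a), drawn once per run from N(0, 1).
        self.q_true = self.rng.normal(0.0, 1.0, size=k)
        # Index of the arm with the highest true value.
        self.best_action = int(np.argmax(self.q_true))

    def pull(self, action):
        # Reward is the arm's true value plus unit-variance Gaussian noise.
        return float(self.rng.normal(self.q_true[action], 1.0))
```

A new `TenArmedTestbed` would be created for every independent run, so that results are averaged over many different reward distributions.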

Comparison of Greedy (or Epsilon-Greedy) and TS

We used three different epsilons for testing:

epsilon = 0 => Greedy Agent
epsilon = 0.01 => exploration with 1% probability
epsilon = 0.1 => exploration with 10% probability

along with TS (Thompson Sampling).
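The epsilon settings above can be sketched as a single run of an epsilon-greedy agent. This is only an illustration under assumptions: sample-average value estimates, zero initial estimates, random tie-breaking, and unit-variance Gaussian rewards. The function name `epsilon_greedy_run` is hypothetical and may differ from the repo's code.

```python
import numpy as np


def epsilon_greedy_run(q_true, epsilon, steps, rng):
    """Run one epsilon-greedy episode; return a bool array marking optimal picks."""
    k = len(q_true)
    q_est = np.zeros(k)              # sample-average estimates (assumed initial value 0)
    counts = np.zeros(k, dtype=int)  # number of pulls per arm
    best = int(np.argmax(q_true))
    optimal = np.zeros(steps, dtype=bool)
    for t in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            a = int(rng.integers(k))
        else:
            # Exploit: pick the arm with the highest estimate, ties broken randomly.
            a = int(rng.choice(np.flatnonzero(q_est == q_est.max())))
        reward = rng.normal(q_true[a], 1.0)
        counts[a] += 1
        # Incremental update of the sample average.
        q_est[a] += (reward - q_est[a]) / counts[a]
        optimal[t] = a == best
    return optimal
```

With `epsilon=0` the exploration branch is never taken, which gives the greedy agent.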

Averaged over 2500 independent runs of 1500 timesteps each
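The averaging over independent runs might look roughly like the sketch below. The helper `percent_optimal` and the stand-in `random_policy_run` are hypothetical names used only for illustration; the uniform-random policy is not one of the agents compared here, it just keeps the example self-contained.

```python
import numpy as np


def percent_optimal(run_once, runs, steps, seed=0):
    """Average a per-step 'optimal action chosen' indicator over independent runs."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(steps)
    for _ in range(runs):
        # run_once returns a bool array of length `steps` for one fresh bandit.
        hits += run_once(rng, steps)
    return 100.0 * hits / runs


def random_policy_run(rng, steps, k=10):
    """Stand-in agent (assumption): picks arms uniformly at random."""
    q_true = rng.normal(0.0, 1.0, size=k)
    actions = rng.integers(k, size=steps)
    return actions == np.argmax(q_true)
```

For the setting described above, the call would use `runs=2500` and `steps=1500`, with `run_once` replaced by an epsilon-greedy or Thompson Sampling agent.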

Comparison

Percentage of actions selected for epsilon = 0.01 and TS

Conclusion -> epsilon = 0.01 can be considered the best of the epsilon-greedy agents: its percentage of optimal actions keeps increasing, though slowly, and reaches around 80% in later stages. Thompson Sampling, on the other hand, shows a significant improvement on these results: it explores quickly and then exploits the optimal action, with the percentage of optimal actions reaching almost 100% very early.
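To give a feel for why Thompson Sampling settles on the optimal action so quickly, here is a minimal sketch of one common variant. It assumes Gaussian rewards with known unit variance and an independent N(0, 1) prior on each arm's mean; the repo's actual prior and likelihood are not stated here, and the function name `thompson_sampling_run` is hypothetical.

```python
import numpy as np


def thompson_sampling_run(q_true, steps, rng):
    """Run one Gaussian Thompson Sampling episode; return optimal-pick indicators."""
    k = len(q_true)
    counts = np.zeros(k)       # number of pulls per arm
    reward_sums = np.zeros(k)  # sum of observed rewards per arm
    best = int(np.argmax(q_true))
    optimal = np.zeros(steps, dtype=bool)
    for t in range(steps):
        # Posterior for each arm mean under an N(0, 1) prior and unit-variance
        # rewards: N(sum / (n + 1), 1 / (n + 1)).
        post_mean = reward_sums / (counts + 1.0)
        post_std = 1.0 / np.sqrt(counts + 1.0)
        # Draw one sample per arm and act greedily on the samples.
        samples = rng.normal(post_mean, post_std)
        a = int(np.argmax(samples))
        reward = rng.normal(q_true[a], 1.0)
        counts[a] += 1
        reward_sums[a] += reward
        optimal[t] = a == best
    return optimal
```

Arms that have been pulled rarely keep a wide posterior and are still sampled occasionally, while the posterior of the best arm narrows around its true value, so exploration fades on its own without a fixed epsilon.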

In case you want to know more about TS visit this Reference.

Owner

Aman Mishra
