Official Repository for "Robust On-Policy Data Collection for Data Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Overview

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation

Source code of Robust On-Policy Data Collection for Data-Efficient Policy Evaluation (NeurIPS 2021 Workshop on OfflineRL).

The code is written in python 3, using Pytorch for the implementation of the deep networks and OpenAI gym for the experiment domains.

Requirements

To install the required codebase, it is recommended to create a conda or a virtual environment. Then, run the following command

pip install -r requirements.txt

Preparation

To conduct policy evaluation, we need to prepare a set of pretrained policies. You can skip this part if you already have the pretrained models in policy_models/ and the corresponding policy values in experiments/policy_info.py

Pretrained Policy

Train the policy models using REINFORCE in different domains by running:

python policy/reinfoce.py --exp_name {exp_name}

where {exp_name} can be MultiBandit, GridWorld, CartPole or CartPoleContinuous. The parameterized epsilon-greedy policies for MultiBandit and GridWorld can be obtained by running:

python policy/handmade_policy.py

Policy Value

Option 1: Run in sequence

For each policy model, the true policy value is estimated with $10^6$ Monte Carlo roll-outs by running:

python experiments/policy_value.py --policy_name {policy_name} --seed {seed} --n 10e6

This will print the average steps, true policy value and variance of returns. Make sure you copy these results into the file experiment/policy_info.py.

Option 2: Run in parallel

If you can use qsub or sbatch, you can also run jobs/jobs_value.py with different seeds in parallel and merge them by running experiments/merge_values.py to get $10^6$ Monte Carlo roll-outs. The policy values reported in this paper were obtained in this way.

Evaluation

Option 1: Run in sequence

The main running script for policy evaluation is experiments/evaluate.py. The following running command is an example of Monte Carlo estimation for Robust On-policy Acting with $\rho=1.0$ for the policy model_GridWorld_5000.pt with seeds from 0 to 199.

python experiments/evaluate.py --policy_name GridWorld_5000 --ros_epsilon 1.0 --collectors RobustOnPolicyActing --estimators MonteCarlo --eval_steps "7,14,29,59,118,237,475,951,1902,3805,7610,15221,30443,60886" --seeds "0,199"

To conduct policy evaluation with off-policy data, you need to add the following arguments to the above running command:

--combined_trajectories 100 --combined_ops_epsilon 0.10 

Option 2: Run in parallel

If you can use qsub or sbatch, you may only need to run the script jobs/jobs.py where all experiments in the paper are arranged. The log will be saved in log/ and the seed results will be saved in results/seeds. Note that we save the data collection cache in results/data and re-use it for different value estimations. To merge results of different seeds, run experiments/merge_results.py, and the merged results will be saved in results/.

Ploting

When the experiments are finished, all the figures in the paper are produced by running

python drawing/draw.py

Citing

If you use this repository in your work, please consider citing the paper

@inproceedings{zhong2021robust,
    title = {Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
    author = {Rujie Zhong, Josiah P. Hanna, Lukas Schäfer and Stefano V. Albrecht},
    booktitle = {NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
    year = {2021}
}
Owner
Autonomous Agents Research Group (University of Edinburgh)
Official code repositories for projects by the Autonomous Agents Research Group
Autonomous Agents Research Group (University of Edinburgh)
Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

Kento Nishi 22 Jul 07, 2022
Naszilla is a Python library for neural architecture search (NAS)

A repository to compare many popular NAS algorithms seamlessly across three popular benchmarks (NASBench 101, 201, and 301). You can implement your ow

270 Jan 03, 2023
PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambig

王皓波 147 Jan 07, 2023
The code for Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation

BiMix The code for Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation arxiv Framework: visualization results: Requiremen

stanley 18 Sep 18, 2022
Voice assistant - Voice assistant with python

🌐 Python Voice Assistant 🌵 - User's greeting 🌵 - Writing tasks to todo-list ?

PythonToday 10 Dec 26, 2022
This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

The-Emergence-of-Objectness This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

44 Oct 08, 2022
UMich 500-Level Mobile Robotics Course

MOBILE ROBOTICS: METHODS & ALGORITHMS - WINTER 2022 University of Michigan - NA 568/EECS 568/ROB 530 For slides, lecture notes, and example codes, see

393 Dec 29, 2022
Codebase for the self-supervised goal reaching benchmark introduced in the LEXA paper

LEXA Benchmark Codebase for the self-supervised goal reaching benchmark introduced in the LEXA paper (Discovering and Achieving Goals via World Models

Oleg Rybkin 36 Dec 22, 2022
Yet Another Reinforcement Learning Tutorial

This repo contains self-contained RL implementations

Sungjoon 65 Dec 10, 2022
LRBoost is a scikit-learn compatible approach to performing linear residual based stacking/boosting.

LRBoost is a sckit-learn compatible package for linear residual boosting. LRBoost combines a linear estimator and a non-linear estimator to leverage t

Andrew Patton 5 Nov 23, 2022
AntiFuzz: Impeding Fuzzing Audits of Binary Executables

AntiFuzz: Impeding Fuzzing Audits of Binary Executables Get the paper here: https://www.usenix.org/system/files/sec19-guler.pdf Usage: The python scri

Chair for Sys­tems Se­cu­ri­ty 88 Dec 21, 2022
Training RNNs as Fast as CNNs

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

ASAPP Research 2.1k Jan 01, 2023
Automatic detection and classification of Covid severity degree in LUS (lung ultrasound) scans

Final-Project Final project in the Technion, Biomedical faculty, by Mor Ventura, Dekel Brav & Omri Magen. Subproject 1: Automatic Detection of LUS Cha

Mor Ventura 1 Dec 18, 2021
Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery (ICCV 2021)

Change is Everywhere Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery by Zhuo Zheng, Ailong Ma, Liangpei Zhang and Yanfei

Zhuo Zheng 125 Dec 13, 2022
Original code for "Zero-Shot Domain Adaptation with a Physics Prior"

Zero-Shot Domain Adaptation with a Physics Prior [arXiv] [sup. material] - ICCV 2021 Oral paper, by Attila Lengyel, Sourav Garg, Michael Milford and J

Attila Lengyel 40 Dec 21, 2022
Official PyTorch Implementation of Learning Architectures for Binary Networks

Learning Architectures for Binary Networks An Pytorch Implementation of the paper Learning Architectures for Binary Networks (BNAS) (ECCV 2020) If you

Computer Vision Lab. @ GIST 25 Jun 09, 2022
Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021

Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE, ICCV 2021) Paper Introduction The conventional detectors tend to make imba

52 Nov 21, 2022
Python script that allows you to automatically setup your Growtopia server.

AutoSetup Python script that allows you to automatically setup your Growtopia server. How To Use Firstly, install all the required modules that used i

Aspire 3 Mar 06, 2022
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration. Introduction spinor-gpe is high-level,

2 Sep 20, 2022
This project demonstrates the use of neural networks and computer vision to create a classifier that interprets the Brazilian Sign Language.

LIBRAS-Image-Classifier This project demonstrates the use of neural networks and computer vision to create a classifier that interprets the Brazilian

Aryclenio Xavier Barros 26 Oct 14, 2022