Simple (but Strong) Baselines for POMDPs

Last update: Dec 29, 2022

Overview

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

Welcome to the POMDP world! This repo provides simple baselines for POMDPs, specifically recurrent model-free RL, and accompanies the following paper:

Paper: arXiv. Numeric results: Google Drive.

by Tianwei Ni, Benjamin Eysenbach and Ruslan Salakhutdinov.

Installation

First, download this repo into a local directory <local_path> (preferably on a cluster or a server). We then recommend installing all the dependencies in a virtual environment. For example, with miniconda:

conda env create -f install.yml
conda activate pomdp

The yaml file includes all the dependencies (e.g. PyTorch, PyBullet) used in our experiments (including compared methods), but there are two exceptions:

To run Cheetah-Vel in meta RL, you have to install MuJoCo with a license.
To run robust RL and generalization in RL experiments, you have to install roboschool.
- We found it hard to install roboschool from scratch, so we provide a container image roboschool.sif in google drive that contains roboschool and the other necessary libraries, adapted from the SunBlaze repo.
- To download the image and open a shell in it with singularity on a cluster (a single server should be similar):
```
# download roboschool.sif from the google drive to envs/rl-generalization/roboschool.sif
# then run singularity shell
singularity shell --nv -H <local_path>:/home envs/rl-generalization/roboschool.sif
```
- Then you can test it by running import roboschool in a python3 shell.
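A quick way to confirm which of the dependencies are visible in the active environment is sketched below. This is an illustrative helper, not part of the repo: the function name check_modules is hypothetical, and torch, pybullet and roboschool are assumed to be the import names of the packages mentioned above.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names; adjust them if your installation differs.
    for name, found in check_modules(["torch", "pybullet", "roboschool"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

The check only locates the modules without importing them, so it stays fast and does not fail when an optional package such as roboschool is absent.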

General Form to Run Our Implementation of Recurrent Model-Free RL and Compared Methods

We use the .yml files in the configs/ folder, organized by POMDP subarea. To run our implementation, run the following in <local_path>:

export PYTHONPATH=${PWD}:$PYTHONPATH
python3 policies/main.py configs/<subarea>/<env_name>/<algo_name>.yml

where algo_name specifies the algorithm name:

sac_rnn and td3_rnn correspond to our implementation of recurrent model-free RL
ppo_rnn and a2c_rnn correspond to the (Kostrikov, 2018) implementation of recurrent model-free RL
vrm corresponds to VRM, compared in "standard" POMDPs
varibad corresponds to the off-policy version of the original VariBAD, compared in meta RL
MRPO corresponds to MRPO, compared in robust RL
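The path pattern above can also be assembled programmatically, for instance when launching several runs from a script. The sketch below is hypothetical helper code, not part of the repo: build_command is an invented name, and some_subarea and some_env are placeholders, so check configs/ for the actual folder names.

```python
from pathlib import PurePosixPath


def build_command(subarea, env_name, algo_name):
    """Build the argument list for policies/main.py from the three path parts."""
    config = PurePosixPath("configs") / subarea / env_name / f"{algo_name}.yml"
    return ["python3", "policies/main.py", str(config)]


# Placeholder subarea and environment names, for illustration only.
cmd = build_command("some_subarea", "some_env", "sac_rnn")
print(" ".join(cmd))
# prints: python3 policies/main.py configs/some_subarea/some_env/sac_rnn.yml
```

The resulting list could be passed to subprocess.run from <local_path>, with PYTHONPATH set as in the export line above.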

We have merged the prior methods above into our repository (there is no need to install other repositories), so that future work can use this single repository to run a number of baselines besides ours: A2C-GRU, PPO-GRU, VRM, VariBAD, MRPO. Since our code is heavily drawn from those prior works, we encourage authors to cite those prior papers or implementations. For the compared methods, we use their open-sourced implementation with their default hyperparameters.

Specific Running Commands for Each Subarea

Please see run_commands.md for details on running our implementation of recurrent model-free RL and also all the compared methods.

A Minimal Example to Run Our Implementation

Here we provide a stand-alone example with minimal dependencies to run our implementation of recurrent model-free RL!

It only requires PyTorch and PyBullet: there is no need to install MuJoCo or roboschool, and no external configuration file is needed.

Simply open the Jupyter Notebook example.ipynb, which contains the training and evaluation procedure on a toy POMDP environment (Pendulum-V). The whole process takes less than 20 minutes to run.
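For readers new to POMDPs, the sketch below illustrates one common way a fully observed task becomes partially observed: the agent is shown only some dimensions of the state. It is not the repo's environment code. The function mask_observation, the state layout [cos(theta), sin(theta), angular velocity], and the choice of visible index are all assumptions made for illustration.

```python
from typing import List, Sequence


def mask_observation(full_state: Sequence[float], visible: Sequence[int]) -> List[float]:
    """Keep only the state dimensions listed in `visible`; the rest stay hidden."""
    return [full_state[i] for i in visible]


# Hypothetical pendulum state: [cos(theta), sin(theta), angular velocity].
full_state = [0.5, 0.866, -1.2]

# A velocity-only view hides the angle, so a single observation no longer
# identifies the state and the policy has to rely on its observation history.
velocity_only = mask_observation(full_state, visible=[2])
print(velocity_only)
```

This is why a recurrent policy is a natural fit: the hidden state of the RNN can summarize past observations that a memoryless policy would discard.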

Details of Our Implementation of Recurrent Model-Free RL: Decision Factors, Best Variants, Code Features

Please see our_details.md for more information on:

How to tune the decision factors discussed in the paper in the configuration files
How to tune the other hyperparameters that are also important to training
Where to find the core class of our recurrent model-free RL and the RAM-efficient replay buffer
Our best variants in each subarea and numeric results for all the bar charts and learning curves

Acknowledgement

Please see acknowledge.md for details.

Citation

If you find our code useful for your work, please consider citing our paper:

@article{ni2021recurrentrl,
  title={Recurrent Model-Free RL is a Strong Baseline for Many POMDPs},
  author={Ni, Tianwei and Eysenbach, Benjamin and Salakhutdinov, Ruslan},
  year={2021}
}

Contact

If you have any questions, please create an issue in this repo or contact Tianwei Ni ([email protected])
