Safe Policy Optimization with Local Features

Last update: Jun 05, 2022

Overview

Safe Policy Optimization with Local Feature (SPO-LF)

This repository contains the source code for the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations", which was presented at NeurIPS 2021.

Installation

This repository includes a requirements.txt. Besides common modules (e.g., numpy, scipy), our source code depends on the following modules.

Mandatory
- Gym-MiniGrid (https://github.com/maximecb/gym-minigrid)
- Hydra (https://github.com/facebookresearch/hydra)
- pymdptoolbox (https://github.com/sawcordwell/pymdptoolbox)
Optional
- GPy (https://github.com/SheffieldML/GPy)

We also provide a Dockerfile in this repository, which can be used to reproduce our grid-world experiment.

Simulation configuration

We manage the simulation configuration with Hydra. The configuration options are listed in config.yaml. For example, the algorithm to run is chosen from the ones we implemented:

sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}

Grid World Experiment

The source code for our grid-world experiment is in the /grid_world folder. To run a simulation, use, for example, the following commands.

cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False
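The command above passes configuration values as `key=value` overrides, with dots selecting nested keys. The following is a minimal sketch of that syntax using only the standard library; it is not how Hydra itself is implemented, and the helper names `parse_overrides` and `check_sim_type` are hypothetical. Unlike Hydra, it keeps every value as a string.

```python
from typing import Any, Dict, List

# Algorithm names listed for sim_type in this README.
SIM_TYPES = {
    "safe_glm", "unsafe_glm", "random", "oracle",
    "safe_gp_state", "safe_gp_feature", "safe_glm_stepwise",
}


def parse_overrides(args: List[str]) -> Dict[str, Any]:
    """Turn ["a.b=1", "c=x"] into {"a": {"b": "1"}, "c": "x"}.

    Hypothetical helper: values are kept as strings, with no type conversion.
    """
    config: Dict[str, Any] = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        node = config
        *parents, leaf = key.split(".")
        for name in parents:
            # Descend into (or create) the nested section, e.g. "env".
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


def check_sim_type(config: Dict[str, Any]) -> str:
    """Return config["sim_type"] if it is one of the listed algorithms."""
    sim_type = config.get("sim_type")
    if sim_type not in SIM_TYPES:
        raise ValueError(f"unknown sim_type: {sim_type!r}")
    return sim_type
```

For example, `parse_overrides(["sim_type=safe_glm", "env.reuse_env=False"])` yields a dictionary with a top-level `sim_type` entry and a nested `env` section.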

For the Monte Carlo simulations that compare our proposed method with the baselines, use the shell script run.sh.
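The contents of run.sh are not reproduced here. As a rough illustration of what a comparison sweep over the implemented algorithms could look like, the sketch below only builds the command lines (it does not launch them). The function `build_commands` and the parameter `n_runs` are hypothetical and are not taken from run.sh.

```python
import shlex
from typing import List

# Algorithm names listed for sim_type in this README.
SIM_TYPES = [
    "safe_glm", "unsafe_glm", "random", "oracle",
    "safe_gp_state", "safe_gp_feature", "safe_glm_stepwise",
]


def build_commands(sim_types: List[str], n_runs: int) -> List[List[str]]:
    """Build one main.py invocation per algorithm and repetition.

    Hypothetical helper: n_runs is an assumed repetition count.
    """
    commands = []
    for sim_type in sim_types:
        for _ in range(n_runs):
            commands.append(
                ["python", "main.py", f"sim_type={sim_type}", "env.reuse_env=False"]
            )
    return commands


if __name__ == "__main__":
    # Print the commands so they can be inspected before running them.
    for cmd in build_commands(SIM_TYPES, n_runs=2):
        print(shlex.join(cmd))
```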

We also provide a script for visualization. If you want to render how the agent behaves, use the following command.

python main.py sim_type=safe_glm env.reuse_env=True

Safety-Gym Experiment

The source code for our safety-gym experiment is in the /safety_gym_discrete folder. Our experiment is based on safety-gym. Our proposed method uses dynamic programming algorithms to solve the Bellman equation, so we modified engine.py to discretize the environment. The modified safety-gym source file is provided as /safety_gym_discrete/engine.py. To use the modified library, clone safety-gym, then replace safety-gym/safety_gym/envs/engine.py with /safety_gym_discrete/engine.py from our repo. Use the following commands to install the modified library:

cd safety-gym
pip install -e .
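The file replacement described above, which has to happen before the install step, can also be scripted. This is a minimal sketch that assumes the two directory layouts named in this README; the function `install_modified_engine` and its `backup` option are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def install_modified_engine(repo_root: str, safety_gym_root: str,
                            backup: bool = True) -> Path:
    """Copy this repo's engine.py over the one in a safety-gym clone.

    repo_root: path to this repository (assumed layout from the README).
    safety_gym_root: path to the cloned safety-gym repository.
    """
    source = Path(repo_root) / "safety_gym_discrete" / "engine.py"
    target = Path(safety_gym_root) / "safety_gym" / "envs" / "engine.py"
    if not source.is_file():
        raise FileNotFoundError(source)
    if not target.parent.is_dir():
        raise FileNotFoundError(target.parent)
    if backup and target.is_file():
        # Keep the original file next to the replacement as engine.py.orig.
        shutil.copy2(target, target.with_name(target.name + ".orig"))
    shutil.copy2(source, target)
    return target
```

Keeping a backup makes it easy to restore the unmodified engine.py later.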

Note that a MuJoCo license is needed to install Safety-Gym. To run the simulation, use the following commands.

cd safety_gym_discrete
python main.py sim_idx=0
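As background on why the environment is discretized: dynamic programming methods such as value iteration sweep over a finite set of states. The sketch below runs value iteration on a toy one-dimensional chain, not on a Safety-Gym environment, and it is not the algorithm from the paper. The chain layout, the reward of 1 for entering the last state, and the discount factor are all assumptions chosen for illustration.

```python
from typing import List


def value_iteration(n_states: int, gamma: float = 0.9,
                    tol: float = 1e-8) -> List[float]:
    """Value iteration on a deterministic chain with states 0..n_states-1.

    Actions move one step left or right (left is a no-op in state 0).
    Entering the last state gives reward 1; that state is absorbing.
    """
    goal = n_states - 1
    values = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(goal):  # the absorbing goal keeps value 0
            # Bellman optimality backup over the two successor states.
            best = max(
                (1.0 if nxt == goal else 0.0) + gamma * values[nxt]
                for nxt in (max(s - 1, 0), s + 1)
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values
```

In this toy problem the value of a state decays by a factor of `gamma` for each extra step needed to reach the goal.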

We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementations depend on safety-starter-agents. We modified run_agent.py from its source code.

To run a baseline, use the following commands.

cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo

The environments that the agent runs on are generated using generate_env.py. We provide ten 50*50 environments. If you want to generate other environments, change the world shape in safety_gym_discrete.py and run the following commands:

cd safety_gym_discrete
python generate_env.py
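To make the idea of a 50*50 discretized world concrete, the sketch below maps a continuous position to a grid cell and back to the cell center. It is only an illustration: the actual discretization in engine.py is not shown in this README, and the world extents (`low`, `high`) and the function names are assumptions.

```python
from typing import Tuple


def to_cell(x: float, y: float, low: float = -2.0, high: float = 2.0,
            size: int = 50) -> Tuple[int, int]:
    """Map a continuous (x, y) position to a (row, col) grid cell.

    low/high are assumed world extents, identical along both axes.
    """
    if not (low <= x <= high and low <= y <= high):
        raise ValueError(f"position ({x}, {y}) is outside the world")
    # Scale to [0, size] and clamp so that x == high falls in the last cell.
    col = min(int((x - low) / (high - low) * size), size - 1)
    row = min(int((y - low) / (high - low) * size), size - 1)
    return row, col


def cell_center(row: int, col: int, low: float = -2.0, high: float = 2.0,
                size: int = 50) -> Tuple[float, float]:
    """Return the continuous (x, y) center of a grid cell."""
    width = (high - low) / size
    return low + (col + 0.5) * width, low + (row + 0.5) * width
```

With these assumed extents each cell is 0.08 units wide, and mapping a cell center back through `to_cell` returns the same cell.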

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{wachi_yue_sui_neurips2021,
  Author = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle  = {Neural Information Processing Systems (NeurIPS)},
  Year = {2021}
}


Owner

Akifumi Wachi
