
RLkit

Reinforcement learning framework and algorithms implemented in PyTorch.

Implemented algorithms:

To get started, check out the example scripts in the examples directory.

What's New

Version 0.2

04/25/2019

  • Use new multiworld code that requires explicit environment registration.
  • Make installation easier by adding setup.py and using default conf.py.

04/16/2019

  • Log how many train steps were called
  • Log env_info and agent_info.

04/05/2019-04/15/2019

  • Add rendering
  • Fix SAC bug to account for future entropy (#41, #43)
  • Add online algorithm mode (#42)
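The SAC fix above concerns the entropy of future actions in the critic target. We have not checked the exact diff in #41/#43, so the sketch below is only the standard soft Bellman target from the SAC papers, y = r + γ(1 − d)(min(Q1', Q2') − α log π(a'|s')), with the clipped double-Q minimum from the twin-critic variant. All function and argument names are hypothetical, not rlkit's API.

```python
def soft_q_target(reward: float, done: bool, gamma: float, alpha: float,
                  next_q1: float, next_q2: float, next_log_prob: float) -> float:
    """Soft Bellman target for one transition (hypothetical helper, scalar version)."""
    # Clipped double-Q: use the smaller of the two target critics' estimates
    # for the next state and the next action a' sampled from the current policy.
    next_q = min(next_q1, next_q2)
    # Soft value of the next state: subtracting alpha * log pi(a'|s') adds the
    # entropy bonus of the *future* action to the target.
    next_v = next_q - alpha * next_log_prob
    # Terminal transitions do not bootstrap.
    return reward + (1.0 - float(done)) * gamma * next_v
```

Dropping the `- alpha * next_log_prob` term would give a target that ignores future entropy, which is the kind of omission the changelog entry describes.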

04/05/2019

The initial release for 0.2 has the following major changes:

  • Remove Serializable class and use default pickle scheme.
  • Remove PyTorchModule class and use native torch.nn.Module directly.
  • Switch to batch-style training rather than online training.
    • Makes code more amenable to parallelization.
    • Implementing the online version is straightforward.
  • Refactor training code to be its own object, rather than being integrated inside of RLAlgorithm.
  • Refactor sampling code to be its own object, rather than being integrated inside of RLAlgorithm.
  • Implement Skew-Fit: State-Covering Self-Supervised Reinforcement Learning, a method for performing goal-directed exploration to maximize the entropy of visited states.
  • Update soft actor-critic to more closely match TensorFlow implementation:
    • Rename TwinSAC to just SAC.
    • Only use Q networks (no separate V network).
    • Remove unnecessary policy regularization terms.
    • Use numerically stable Jacobian computation.
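For a tanh-squashed Gaussian policy, the log-probability needs the log-Jacobian term log(1 − tanh(u)²), where u is the pre-squash sample. A common numerically stable form uses the identity log(1 − tanh(u)²) = 2(log 2 − u − softplus(−2u)). We assume this is the kind of change the bullet refers to; the sketch below is plain Python with hypothetical function names, not rlkit's code.

```python
import math


def softplus(x: float) -> float:
    # log(1 + exp(x)) computed without overflow for large |x|.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def tanh_log_det_naive(u: float) -> float:
    # Direct formula: loses precision as |u| grows, and fails once tanh(u)
    # rounds to exactly +/-1.0 (log of 0).
    return math.log(1.0 - math.tanh(u) ** 2)


def tanh_log_det_stable(u: float) -> float:
    # Same quantity via log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)),
    # which stays finite for large |u|.
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))
```

For moderate u the two functions agree; on typical IEEE-754 platforms `tanh_log_det_naive(20.0)` raises `ValueError` because `math.tanh(20.0)` rounds to 1.0, while `tanh_log_det_stable(20.0)` returns about -38.61.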

Overall, the refactors are intended to make the code more modular and readable than the previous versions.
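The batch-style versus online distinction from the 0.2 release can be illustrated by the order of operations alone. The sketch below only records "collect" (one environment step) and "train" (one gradient step) events; the function and parameter names are hypothetical and rlkit's actual training loops take more arguments.

```python
from typing import List


def batch_schedule(num_epochs: int, expl_steps_per_epoch: int,
                   trains_per_epoch: int) -> List[str]:
    """Order of operations in batch-style training (hypothetical parameters)."""
    events: List[str] = []
    for _ in range(num_epochs):
        # Collect a whole block of environment steps with the current policy...
        events.extend(["collect"] * expl_steps_per_epoch)
        # ...then run a block of gradient steps on the replay buffer.
        events.extend(["train"] * trains_per_epoch)
    return events


def online_schedule(num_env_steps: int, trains_per_env_step: int) -> List[str]:
    """Order of operations in online training: each env step is followed by training."""
    events: List[str] = []
    for _ in range(num_env_steps):
        events.append("collect")
        events.extend(["train"] * trains_per_env_step)
    return events
```

With the same budget both schedules perform the same number of collect and train operations; the batch version groups them, so the collection block can be handed to parallel samplers without waiting on gradient updates.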

Version 0.1

12/04/2018

  • Add RIG implementation

12/03/2018

  • Add HER implementation
  • Add doodad support

10/16/2018

  • Upgraded to PyTorch v0.4
  • Added Twin Soft Actor-Critic implementation
  • Various small refactors (e.g., logger, evaluation code)

Installation

  1. Install and use the included Anaconda environment:
$ conda env create -f environment/[linux-cpu|linux-gpu|mac]-env.yml
$ source activate rlkit
(rlkit) $ python examples/ddpg.py

Choose the appropriate .yml file for your system. These Anaconda environments use MuJoCo 1.5 and gym 0.10.5. You'll need to get your own MuJoCo key if you want to use MuJoCo.

  2. Add this repo directory to your PYTHONPATH environment variable or simply run:
pip install -e .
  3. (Optional) Copy conf.py to conf_private.py and edit it to override defaults:
cp rlkit/launchers/conf.py rlkit/launchers/conf_private.py
  4. (Optional) If you plan on running the Skew-Fit experiments or the HER example with the Sawyer environment, then you need to install multiworld.

DISCLAIMER: the mac environment has only been tested without a GPU.

For an even more portable solution, try using the docker image provided in environment/docker. The Anaconda env should be enough, but this docker image addresses some of the rendering issues that may arise when using MuJoCo 1.5 and GPUs. The docker image supports GPUs, but it should also work without one. To use a GPU with the image, you need to have nvidia-docker installed.
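The optional installation step of copying conf.py to conf_private.py follows a common "private file overrides defaults" pattern. The sketch below shows that pattern in general terms, assuming settings are public UPPERCASE module-level names (such as LOCAL_LOG_DIR) and that the private module simply redefines the ones you want to change. The helper names are hypothetical, and rlkit's own loading mechanism may differ.

```python
import importlib
from types import ModuleType
from typing import Any, Dict


def _public_constants(module: ModuleType) -> Dict[str, Any]:
    # Treat public UPPERCASE names (e.g. LOCAL_LOG_DIR) as settings.
    return {name: getattr(module, name) for name in dir(module)
            if name.isupper() and not name.startswith("_")}


def load_settings(default_module: str, private_module: str) -> Dict[str, Any]:
    """Defaults from default_module, overridden by private_module when it exists."""
    settings = _public_constants(importlib.import_module(default_module))
    try:
        overrides = importlib.import_module(private_module)
    except ModuleNotFoundError as err:
        if err.name != private_module:
            raise  # the private file exists, but one of *its* imports is missing
        return settings  # no private file: keep the defaults
    settings.update(_public_constants(overrides))
    return settings
```

Under these assumptions, `load_settings("rlkit.launchers.conf", "rlkit.launchers.conf_private")["LOCAL_LOG_DIR"]` would return your private value if you set one and the default otherwise.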

Using a GPU

You can use a GPU by calling

import rlkit.torch.pytorch_util as ptu
ptu.set_gpu_mode(True)

before launching the scripts.

If you are using doodad (see below), simply use the use_gpu flag:

run_experiment(..., use_gpu=True)

Visualizing a policy and seeing results

During training, the results will be saved under

LOCAL_LOG_DIR/<exp_prefix>/<foldername>

  • LOCAL_LOG_DIR is the directory set by rlkit.launchers.conf.LOCAL_LOG_DIR. The default name is 'output'.
  • <exp_prefix> is the experiment prefix passed to setup_logger (or as exp_prefix to run_experiment).
  • <foldername> is auto-generated and based on exp_prefix.
  • Inside this folder, you should see a file called params.pkl. To visualize a policy, run
(rlkit) $ python scripts/run_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl

or

(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl

depending on whether or not the policy is goal-conditioned.
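Because every run lives at LOCAL_LOG_DIR/<exp_prefix>/<foldername>, the snapshot to pass to the visualization scripts can be located with a few lines of pathlib. This is a hypothetical helper, not part of rlkit; it assumes params.pkl sits directly inside each auto-generated run folder, as described above.

```python
from pathlib import Path
from typing import List


def params_path(local_log_dir: str, exp_prefix: str, foldername: str) -> Path:
    """Snapshot of one run: LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl."""
    return Path(local_log_dir) / exp_prefix / foldername / "params.pkl"


def find_params_files(local_log_dir: str, exp_prefix: str) -> List[Path]:
    """All params.pkl files one level below LOCAL_LOG_DIR/<exp_prefix>/.

    Returns an empty list if that directory does not exist.
    """
    return sorted(Path(local_log_dir, exp_prefix).glob("*/params.pkl"))
```

For example, `find_params_files("output", "my-experiment-name")` lists the candidate paths to hand to scripts/run_policy.py, one per run folder.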

If you have rllab installed, you can also visualize the results using rllab's viskit, described at the bottom of this page.

tl;dr run

python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/

to visualize all experiments with a prefix of exp_prefix. To only visualize a single run, you can do

python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/<folder name>

Alternatively, if you don't want to clone all of rllab, a repository containing only viskit can be found here. You can similarly visualize results with:

python viskit/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/

This viskit repo also has a few extra nice features, like plotting multiple Y-axis values at once, figure-splitting on multiple keys, and being able to filter hyperparameters out.
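If you only need a quick look at one logged quantity without installing viskit, the tabular log can be read with the standard csv module. This sketch assumes each run folder contains a progress.csv with a header row and one column per logged key (the usual rllab-style layout); the helper name is hypothetical and the key names depend on what your algorithm logs.

```python
import csv
from typing import Iterable, List


def read_column(csv_lines: Iterable[str], key: str) -> List[float]:
    """Return one logged column as floats, skipping rows where it is empty."""
    values: List[float] = []
    for row in csv.DictReader(csv_lines):
        cell = row.get(key)
        # Missing columns and empty cells are skipped rather than treated as 0.
        if cell not in (None, ""):
            values.append(float(cell))
    return values
```

Typical use would be `with open(path_to_progress_csv) as f: curve = read_column(f, some_key)`, after which the list can be plotted with any plotting library.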

Visualizing a goal-conditioned policy

To visualize a goal-conditioned policy, run

(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl

Launching jobs with doodad

The run_experiment function makes it easy to run Python code on Amazon Web Services (AWS) or Google Cloud Platform (GCP) by using this fork of doodad.

It's as easy as:

from rlkit.launchers.launcher_util import run_experiment

def function_to_run(variant):
    learning_rate = variant['learning_rate']
    ...

run_experiment(
    function_to_run,
    exp_prefix="my-experiment-name",
    mode='ec2',  # or 'gcp'
    variant={'learning_rate': 1e-3},
)

You will need to set up parameters in conf_private.py (see the optional conf_private.py step of Installation). This requires some knowledge of AWS and/or GCP, which is beyond the scope of this README. To learn more about doodad, go to the repository, which is based on this original repository.
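Hyperparameter sweeps fit naturally on top of the run_experiment call above: expand a grid of values into one variant dict per combination and launch one job per dict. The helper below is a hypothetical stdlib sketch, not an rlkit utility; the "seed" key in the usage comment is likewise only an illustration.

```python
import itertools
from typing import Any, Dict, List, Sequence


def expand_grid(base: Dict[str, Any],
                grid: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """One variant dict per combination of the values in `grid`, layered over `base`."""
    keys = list(grid)
    variants = []
    for values in itertools.product(*(grid[key] for key in keys)):
        variant = dict(base)  # shallow copy so each variant is its own dict
        variant.update(zip(keys, values))
        variants.append(variant)
    return variants


# Hypothetical usage with the run_experiment call shown above:
# for variant in expand_grid({"learning_rate": 1e-3},
#                            {"learning_rate": [1e-3, 3e-4], "seed": [0, 1, 2]}):
#     run_experiment(function_to_run, exp_prefix="my-experiment-name",
#                    mode="ec2", variant=variant)
```

Keeping every setting in the variant dict means each job's configuration is fully described by the dict passed to function_to_run.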

Requests for pull-requests

  • Implement policy-gradient algorithms.
  • Implement model-based algorithms.

Legacy Code (v0.1.2)

For Temporal Difference Models (TDMs) and the original implementation of Reinforcement Learning with Imagined Goals (RIG), run git checkout tags/v0.1.2.

References

The algorithms are based on the following papers:

Offline Meta-Reinforcement Learning with Online Self-Supervision. Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine. arXiv preprint, 2021.

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. Vitchyr H. Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine. ICML, 2020.

Visual Reinforcement Learning with Imagined Goals. Ashvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine. NeurIPS, 2018.

Temporal Difference Models: Model-Free Deep RL for Model-Based Control. Vitchyr Pong*, Shixiang Gu*, Murtaza Dalal, Sergey Levine. ICLR, 2018.

Hindsight Experience Replay. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. NeurIPS, 2017.

Deep Reinforcement Learning with Double Q-learning. Hado van Hasselt, Arthur Guez, David Silver. AAAI, 2016.

Human-level control through deep reinforcement learning. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Nature, 2015.

Soft Actor-Critic Algorithms and Applications. Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. ICML, 2018.

Addressing Function Approximation Error in Actor-Critic Methods. Scott Fujimoto, Herke van Hoof, David Meger. ICML, 2018.

Credits

This repository was initially developed primarily by Vitchyr Pong, until July 2021, at which point it was transferred to the RAIL Berkeley organization and is primarily maintained by Ashvin Nair. Other major collaborators and contributions:

A lot of the coding infrastructure is based on rllab. The serialization and logger code are basically a carbon copy of the rllab versions.

The Dockerfile is based on the OpenAI mujoco-py Dockerfile.

The SMAC code builds off of the PEARL code, which built off of an older RLKit version.

Owner
Robotic AI & Learning Lab Berkeley