Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains.

Related tags

Deep Learningsac
Overview

This repository is no longer maintained. Please use our new Softlearning package instead.

Soft Actor-Critic

Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor presented at ICML 2018.

This implementation uses Tensorflow. For a PyTorch implementation of soft actor-critic, take a look at rlkit by Vitchyr Pong.

See the DIAYN documentation for using SAC for learning diverse skills.

Getting Started

Soft Actor-Critic can be run either locally or through Docker.

Prerequisites

You will need to have Docker and Docker Compose installed unless you want to run the environment locally.

Most of the models require a Mujoco license.

Docker installation

If you want to run the Mujoco environments, the docker environment needs to know where to find your Mujoco license key (mjkey.txt). You can either copy your key into /.mujoco/mjkey.txt , or you can specify the path to the key in your environment variables:

export MUJOCO_LICENSE_PATH=
   
    /mjkey.txt

   

Once that's done, you can run the Docker container with

docker-compose up

Docker compose creates a Docker container named soft-actor-critic and automatically sets the needed environment variables and volumes.

You can access the container with the typical Docker exec-command, i.e.

docker exec -it soft-actor-critic bash

See examples section for examples of how to train and simulate the agents.

To clean up the setup:

docker-compose down

Local installation

To get the environment installed correctly, you will first need to clone rllab, and have its path added to your PYTHONPATH environment variable.

  1. Clone rllab
cd 
   
    
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}

   
  1. Download and copy mujoco files to rllab path: If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the .dylib files instead of .so files.
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir 
   
    /rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so 
    
     /rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 
     
      /rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp

     
    
   
  1. Copy your Mujoco license key (mjkey.txt) to rllab path:
cp 
   
    /mjkey.txt 
    
     /rllab/vendor/mujoco

    
   
  1. Clone sac
cd 
   
    
git clone https://github.com/haarnoja/sac.git
cd sac

   
  1. Create and activate conda environment
cd sac
conda env create -f environment.yml
source activate sac

The environment should be ready to run. See examples section for examples of how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:

source deactivate
conda remove --name sac --all

Examples

Training and simulating an agent

  1. To train the agent
python ./examples/mujoco_all_sac.py --env=swimmer --log_dir="/root/sac/data/swimmer-experiment"
  1. To simulate the agent (NOTE: This step currently fails with the Docker installation, due to missing display.)
python ./scripts/sim_policy.py /root/sac/data/swimmer-experiment/itr_
   
    .pkl

   

mujoco_all_sac.py contains several different environments and there are more example scripts available in the /examples folder. For more information about the agents and configurations, run the scripts with --help flag. For example:

python ./examples/mujoco_all_sac.py --help
usage: mujoco_all_sac.py [-h]
                         [--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
                         [--exp_name EXP_NAME] [--mode MODE]
                         [--log_dir LOG_DIR]

mujoco_all_sac.py contains several different environments and there are more example scripts available in the /examples folder. For more information about the agents and configurations, run the scripts with --help flag. For example:

python ./examples/mujoco_all_sac.py --help
usage: mujoco_all_sac.py [-h]
                         [--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
                         [--exp_name EXP_NAME] [--mode MODE]
                         [--log_dir LOG_DIR]

Benchmark Results

Benchmark results for some of the OpenAI Gym v2 environments can be found here.

Credits

The soft actor-critic algorithm was developed by Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen who helped testing, documenting, and polishing the code and streamlining the installation process. The work was supported by Berkeley Deep Drive.

Reference

@article{haarnoja2017soft,
  title={Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor},
  author={Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey},
  booktitle={Deep Reinforcement Learning Symposium},
  year={2017}
}
Owner
Tuomas Haarnoja
Tuomas Haarnoja
Posterior predictive distributions quantify uncertainties ignored by point estimates.

Posterior predictive distributions quantify uncertainties ignored by point estimates.

DeepMind 177 Dec 06, 2022
Sarus implementation of classical ML models. The models are implemented using the Keras API of tensorflow 2. Vizualization are implemented and can be seen in tensorboard.

Sarus published models Sarus implementation of classical ML models. The models are implemented using the Keras API of tensorflow 2. Vizualization are

Sarus Technologies 39 Aug 19, 2022
TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.

TensorFlow-Image-Models Introduction Usage Models Profiling License Introduction TensorfFlow-Image-Models (tfimm) is a collection of image models with

Martins Bruveris 227 Dec 20, 2022
Jiminy Cricket Environment (NeurIPS 2021)

Jiminy Cricket This is the repository for "What Would Jiminy Cricket Do? Towards Agents That Behave Morally" by Dan Hendrycks*, Mantas Mazeika*, Andy

Dan Hendrycks 15 Aug 29, 2022
A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Awesome Pretrained StyleGAN2 A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution. Note the readme is a

Justin 1.1k Dec 24, 2022
Official implementation of the article "Unsupervised JPEG Domain Adaptation For Practical Digital Forensics"

Unsupervised JPEG Domain Adaptation for Practical Digital Image Forensics @WIFS2021 (Montpellier, France) Rony Abecidan, Vincent Itier, Jeremie Boulan

Rony Abecidan 6 Jan 06, 2023
This repository contains PyTorch code for Robust Vision Transformers.

This repository contains PyTorch code for Robust Vision Transformers.

117 Dec 07, 2022
Using modified BiSeNet for face parsing in PyTorch

face-parsing.PyTorch Contents Training Demo References Training Prepare training data: -- download CelebAMask-HQ dataset -- change file path in the pr

zll 1.6k Jan 08, 2023
Gradient representations in ReLU networks as similarity functions

Gradient representations in ReLU networks as similarity functions by Dániel Rácz and Bálint Daróczy. This repo contains the python code related to our

1 Oct 08, 2021
METER: Multimodal End-to-end TransformER

METER Code and pre-trained models will be publicized soon. Citation @article{dou2021meter, title={An Empirical Study of Training End-to-End Vision-a

Zi-Yi Dou 257 Jan 06, 2023
GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Model This repository is the official PyTorch implementation of GraphRNN, a graph gene

Jiaxuan 568 Dec 29, 2022
学习 python3 以来写的一些垃圾玩具……

和东哥做兄弟 Author: chiupam 版权 未经本人同意,仓库内所有资源文件,禁止任何公众号、自媒体、开发者进行任何形式的转载、发布、搬运。 声明 这不是一个开源项目,只是把 GitHub 当作一个代码的存储空间,本项目不接受任何开源要求。 仅用于学习研究,禁止用于商业用途,不能保证其合法性

Chiupam 67 Mar 26, 2022
A micro-game "flappy bird".

1-o-flappy A micro-game "flappy bird". Gameplays The game will be installed at /usr/bin . The name of it is "1-o-flappy". You can type "1-o-flappy" to

1 Nov 06, 2021
General-purpose program synthesiser

DeepSynth General-purpose program synthesiser. This is the repository for the code of the paper "Scaling Neural Program Synthesis with Distribution-ba

Nathanaël Fijalkow 24 Oct 23, 2022
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
Framework for training options with different attention mechanism and using them to solve downstream tasks.

Using Attention in HRL Framework for training options with different attention mechanism and using them to solve downstream tasks. Requirements GPU re

5 Nov 03, 2022
This repo includes the CUB-GHA (Gaze-based Human Attention) dataset and code of the paper "Human Attention in Fine-grained Classification".

HA-in-Fine-Grained-Classification This repo includes the CUB-GHA (Gaze-based Human Attention) dataset and code of the paper "Human Attention in Fine-g

16 Oct 29, 2022
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training

ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst

HPC-AI Tech 7.9k Jan 08, 2023
RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering

RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering Authors: Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou and

Salesforce 72 Dec 05, 2022
A python script to dump all the challenges locally of a CTFd-based Capture the Flag.

A python script to dump all the challenges locally of a CTFd-based Capture the Flag. Features Connects and logins to a remote CTFd instance. Dumps all

Podalirius 77 Dec 07, 2022