Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Overview

Softlearning

Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is fairly thin and primarily optimized for our own development purposes. It utilizes the tf.keras modules for most of the model classes (e.g. policies and value functions). We use Ray for the experiment orchestration. Ray Tune and Autoscaler implement several neat features that enable us to seamlessly run the same experiment scripts that we use for local prototyping to launch large-scale experiments on any chosen cloud service (e.g. GCP or AWS), and intelligently parallelize and distribute training for effective resource allocation.

This implementation uses Tensorflow. For a PyTorch implementation of soft actor-critic, take a look at rlkit.

Getting Started

Prerequisites

The environment can be run either locally using conda or inside a docker container. For conda installation, you need to have Conda installed. For docker installation you will need to have Docker and Docker Compose installed. Also, most of our environments currently require a MuJoCo license.

Conda Installation

  1. Download and install MuJoCo 1.50 and 2.00 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default location (~/.mujoco/mjpro150 and ~/.mujoco/mujoco200_{platform}). Unfortunately, gym and dm_control expect different paths for MuJoCo 2.00 installation, which is why you will need to have it installed both in ~/.mujoco/mujoco200_{platform} and ~/.mujoco/mujoco200. The easiest way is to create a symlink from ~/.mujoco/mujoco200_{plaftorm} -> ~/.mujoco/mujoco200 with: ln -s ~/.mujoco/mujoco200_{platform} ~/.mujoco/mujoco200.

  2. Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt:

  3. Clone softlearning

git clone https://github.com/rail-berkeley/softlearning.git ${SOFTLEARNING_PATH}
  1. Create and activate conda environment, install softlearning to enable command line interface.
cd ${SOFTLEARNING_PATH}
conda env create -f environment.yml
conda activate softlearning
pip install -e ${SOFTLEARNING_PATH}

The environment should be ready to run. See examples section for examples of how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:

conda deactivate
conda remove --name softlearning --all

Docker Installation

docker-compose

To build the image and run the container:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.cpu.yml \
        up \
        -d \
        --force-recreate

You can access the container with the typical Docker exec-command, i.e.

docker exec -it softlearning bash

See examples section for examples of how to train and simulate the agents.

Finally, to clean up the docker setup:

docker-compose \
    -f ./docker/docker-compose.dev.cpu.yml \
    down \
    --rmi all \
    --volumes

Examples

Training and simulating an agent

  1. To train the agent
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000  # Save the checkpoint to resume training later
  1. To simulate the resulting policy: First, find the absolute path that the checkpoint is saved to. By default (i.e. without specifying the log-dir argument to the previous script), the data is saved under ~/ray_results/<universe>/<domain>/<task>/<datatimestamp>-<exp-name>/<trial-id>/<checkpoint-id>. For example: ~/ray_results/gym/HalfCheetah/v3/2018-12-12T16-48-37-my-sac-experiment-1-0/mujoco-runner_0_seed=7585_2018-12-12_16-48-37xuadh9vd/checkpoint_1000/. The next command assumes that this path is found from ${SAC_CHECKPOINT_DIR} environment variable.
python -m examples.development.simulate_policy \
    ${SAC_CHECKPOINT_DIR} \
    --max-path-length 1000 \
    --num-rollouts 1 \
    --render-kwargs '{"mode": "human"}'

examples.development.main contains several different environments and there are more example scripts available in the /examples folder. For more information about the agents and configurations, run the scripts with --help flag: python ./examples/development/main.py --help

optional arguments:
  -h, --help            show this help message and exit
  --universe {robosuite,dm_control,gym}
  --domain DOMAIN
  --task TASK
  --checkpoint-replay-pool CHECKPOINT_REPLAY_POOL
                        Whether a checkpoint should also saved the replay
                        pool. If set, takes precedence over
                        variant['run_params']['checkpoint_replay_pool']. Note
                        that the replay pool is saved (and constructed) piece
                        by piece so that each experience is saved only once.
  --algorithm ALGORITHM
  --policy {gaussian}
  --exp-name EXP_NAME
  --mode MODE
  --run-eagerly RUN_EAGERLY
                        Whether to run tensorflow in eager mode.
  --local-dir LOCAL_DIR
                        Destination local folder to save training results.
  --confirm-remote [CONFIRM_REMOTE]
                        Whether or not to query yes/no on remote run.
  --video-save-frequency VIDEO_SAVE_FREQUENCY
                        Save frequency for videos.
  --cpus CPUS           Cpus to allocate to ray process. Passed to `ray.init`.
  --gpus GPUS           Gpus to allocate to ray process. Passed to `ray.init`.
  --resources RESOURCES
                        Resources to allocate to ray process. Passed to
                        `ray.init`.
  --include-webui INCLUDE_WEBUI
                        Boolean flag indicating whether to start theweb UI,
                        which is a Jupyter notebook. Passed to `ray.init`.
  --temp-dir TEMP_DIR   If provided, it will specify the root temporary
                        directory for the Ray process. Passed to `ray.init`.
  --resources-per-trial RESOURCES_PER_TRIAL
                        Resources to allocate for each trial. Passed to
                        `tune.run`.
  --trial-cpus TRIAL_CPUS
                        CPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for CPUs. Passed to
                        `tune.run`.
  --trial-gpus TRIAL_GPUS
                        GPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for GPUs. Passed to
                        `tune.run`.
  --trial-extra-cpus TRIAL_EXTRA_CPUS
                        Extra CPUs to reserve in case the trials need to
                        launch additional Ray actors that use CPUs.
  --trial-extra-gpus TRIAL_EXTRA_GPUS
                        Extra GPUs to reserve in case the trials need to
                        launch additional Ray actors that use GPUs.
  --num-samples NUM_SAMPLES
                        Number of times to repeat each trial. Passed to
                        `tune.run`.
  --upload-dir UPLOAD_DIR
                        Optional URI to sync training results to (e.g.
                        s3://<bucket> or gs://<bucket>). Passed to `tune.run`.
  --trial-name-template TRIAL_NAME_TEMPLATE
                        Optional string template for trial name. For example:
                        '{trial.trial_id}-seed={trial.config[run_params][seed]
                        }' Passed to `tune.run`.
  --checkpoint-frequency CHECKPOINT_FREQUENCY
                        How many training iterations between checkpoints. A
                        value of 0 (default) disables checkpointing. If set,
                        takes precedence over
                        variant['run_params']['checkpoint_frequency']. Passed
                        to `tune.run`.
  --checkpoint-at-end CHECKPOINT_AT_END
                        Whether to checkpoint at the end of the experiment. If
                        set, takes precedence over
                        variant['run_params']['checkpoint_at_end']. Passed to
                        `tune.run`.
  --max-failures MAX_FAILURES
                        Try to recover a trial from its last checkpoint at
                        least this many times. Only applies if checkpointing
                        is enabled. Passed to `tune.run`.
  --restore RESTORE     Path to checkpoint. Only makes sense to set if running
                        1 trial. Defaults to None. Passed to `tune.run`.
  --server-port SERVER_PORT
                        Port number for launching TuneServer. Passed to
                        `tune.run`.

Resume training from a saved checkpoint

This feature is currently broken!

In order to resume training from previous checkpoint, run the original example main-script, with an additional --restore flag. For example, the previous example can be resumed as follows:

softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000 \
    --restore ${SAC_CHECKPOINT_PATH}

References

The algorithms are based on the following papers:

Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja*, Aurick Zhou*, Kristian Hartikainen*, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. arXiv preprint, 2018.
paper | videos

Latent Space Policies for Hierarchical Reinforcement Learning.
Tuomas Haarnoja*, Kristian Hartikainen*, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.
paper | videos

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.
paper | videos

Composable Deep Reinforcement Learning for Robotic Manipulation.
Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine. International Conference on Robotics and Automation (ICRA), 2018.
paper | videos

Reinforcement Learning with Deep Energy-Based Policies.
Tuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, Sergey Levine. International Conference on Machine Learning (ICML), 2017.
paper | videos

If Softlearning helps you in your academic research, you are encouraged to cite our paper. Here is an example bibtex:

@techreport{haarnoja2018sacapps,
  title={Soft Actor-Critic Algorithms and Applications},
  author={Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},
  journal={arXiv preprint arXiv:1812.05905},
  year={2018}
}
A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

MADGRAD Optimization Method A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization pip install madgrad Try it out! A best

Meta Research 774 Dec 31, 2022
SegNet-Basic with Keras

SegNet-Basic: What is Segnet? Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-wise Image Segmentation Segnet = (Encoder + Decoder)

Yad Konrad 81 Jun 30, 2022
Jax/Flax implementation of Variational-DiffWave.

jax-variational-diffwave Jax/Flax implementation of Variational-DiffWave. (Zhifeng Kong et al., 2020, Diederik P. Kingma et al., 2021.) DiffWave with

YoungJoong Kim 37 Dec 16, 2022
Benchmarks for the Optimal Power Flow Problem

Power Grid Lib - Optimal Power Flow This benchmark library is curated and maintained by the IEEE PES Task Force on Benchmarks for Validation of Emergi

A Library of IEEE PES Power Grid Benchmarks 207 Dec 08, 2022
A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners A PyTorch re-implementation of Mask Autoencoder trai

Tianyu Hua 23 Dec 13, 2022
This repository contains the implementation of Deep Detail Enhancment for Any Garment proposed in Eurographics 2021

Deep-Detail-Enhancement-for-Any-Garment Introduction This repository contains the implementation of Deep Detail Enhancment for Any Garment proposed in

40 Dec 13, 2022
A graph-to-sequence model for one-step retrosynthesis and reaction outcome prediction.

Graph2SMILES A graph-to-sequence model for one-step retrosynthesis and reaction outcome prediction. 1. Environmental setup System requirements Ubuntu:

29 Nov 18, 2022
This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks

NNProject - DeepMask This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks. Th

189 Nov 16, 2022
LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations

LIMEcraft LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations The LIMEcraft algorithm is an explanatory method based on

MI^2 DataLab 4 Aug 01, 2022
Curved Projection Reformation

Description Assuming that we already know the image of the centerline, we want the lumen to be displayed on a plane, which requires curved projection

夜听残荷 5 Sep 11, 2022
The mini-MusicNet dataset

mini-MusicNet A music-domain dataset for multi-label classification Music transcription is sequence-to-sequence prediction problem: given an audio per

John Thickstun 4 Nov 09, 2022
Code for Blind Image Decomposition (BID) and Blind Image Decomposition network (BIDeN).

arXiv, porject page, paper Blind Image Decomposition (BID) Blind Image Decomposition is a novel task. The task requires separating a superimposed imag

64 Dec 20, 2022
Code for Greedy Gradient Ensemble for Visual Question Answering (ICCV 2021, Oral)

Greedy Gradient Ensemble for De-biased VQA Code release for "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). GGE can

21 Jun 29, 2022
Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?" Install // Datasets // Experiments // Models // License // Reference Full video Offi

Toyota Research Institute - Machine Learning 364 Dec 27, 2022
BABEL: Bodies, Action and Behavior with English Labels [CVPR 2021]

BABEL is a large dataset with language labels describing the actions being performed in mocap sequences. BABEL labels about 43 hours of mocap sequences from AMASS [1] with action labels.

113 Dec 28, 2022
Testbed of AI Systems Quality Management

qunomon Description A testbed for testing and managing AI system qualities. Demo Sorry. Not deployment public server at alpha version. Requirement Ins

AIST AIRC 15 Nov 27, 2021
Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

538 Jan 09, 2023
PyTorch implementation of Deformable Convolution

PyTorch implementation of Deformable Convolution !!!Warning: There is some issues in this implementation and this repo is not maintained any more, ple

Wei Ouyang 893 Dec 18, 2022
Overview of architecture and implementation of TEDS-Net, as described in MICCAI 2021: "TEDS-Net: Enforcing Diffeomorphisms in Spatial Transformers to Guarantee TopologyPreservation in Segmentations"

TEDS-Net Overview of architecture and implementation of TEDS-Net, as described in MICCAI 2021: "TEDS-Net: Enforcing Diffeomorphisms in Spatial Transfo

Madeleine K Wyburd 14 Jan 04, 2023
[CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

TorchSemiSeg [CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision by Xiaokang Chen1, Yuhui Yuan2, Gang Zeng1, Jingdong Wang

Chen XiaoKang 387 Jan 08, 2023