Addressing Function Approximation Error in Actor-Critic Methods

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If you use our code or data please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI gym. Networks are trained using PyTorch 1.2 and Python 3.7.

Usage

The paper results can be reproduced by running:

./run_experiments.sh

Experiments on single environments can be run by calling:

python main.py --env HalfCheetah-v2

Hyper-parameters can be modified with different arguments to main.py. We include an implementation of DDPG (DDPG.py), which is not used in the paper, for easy comparison of hyper-parameters with TD3. This is not the implementation of "Our DDPG" as used in the paper (see OurDDPG.py).

Algorithms which TD3 compares against (PPO, TRPO, ACKTR, DDPG) can be found in the OpenAI baselines repository.

Results

The code is no longer exactly representative of the code used in the paper: minor adjustments to hyper-parameters, etc., have been made to improve performance. The learning curves are still the original results found in the paper.

The learning curves from the paper are stored under /learning_curves. Each learning curve is formatted as a NumPy array of 201 evaluations, with shape (201,), where each evaluation is the average total reward from running the policy for 10 episodes with no exploration. The first evaluation is of the randomly initialized policy network (unused in the paper). Evaluations are performed every 5000 time steps, over a total of 1 million time steps.
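The evaluation protocol described above (average total reward over 10 episodes, no exploration noise) can be sketched roughly as follows. This is an illustration only, not the code from main.py: the function name evaluate_policy and the toy CountdownEnv class are hypothetical, and the sketch assumes the classic Gym interface in which reset() returns an observation and step() returns a 4-tuple (newer Gym releases differ).

```python
def evaluate_policy(select_action, env, episodes=10):
    """Average total (undiscounted) reward over `episodes` rollouts, no exploration."""
    total_reward = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = select_action(state)  # deterministic action, no noise added
            state, reward, done, _ = env.step(action)
            total_reward += reward
    return total_reward / episodes


class CountdownEnv:
    """Hypothetical toy environment: fixed-length episodes, reward equals the action."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}


# A constant policy earns 1.0 per step for 5 steps in every episode.
avg = evaluate_policy(lambda state: 1.0, CountdownEnv(horizon=5), episodes=10)
print(avg)  # 5.0
```

In a real run, select_action would wrap the trained actor network and env would be a separate evaluation copy of the MuJoCo task, so that evaluation rollouts do not disturb the training environment.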

Numerical results can be found in the paper, or from the learning curves. Video of the learned agent can be found here.
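To read numerical results off a learning curve, each of the 201 entries has to be matched to the time step at which it was evaluated. A minimal sketch, assuming the curves are stored as .npy files loadable with numpy.load (the file name below is a placeholder, and the helper names are hypothetical):

```python
def evaluation_timesteps(num_evals=201, eval_freq=5000):
    """Time step of each evaluation; index 0 is the randomly initialized policy."""
    return [i * eval_freq for i in range(num_evals)]


def load_curve(path):
    """Load one learning curve and check it has the documented shape (201,)."""
    import numpy as np  # imported lazily so the helper above works without NumPy

    curve = np.load(path)
    if curve.shape != (201,):
        raise ValueError(f"unexpected shape {curve.shape}")
    return curve


steps = evaluation_timesteps()
print(steps[0], steps[1], steps[-1])  # 0 5000 1000000

# Placeholder path; check /learning_curves for the actual file names.
# curve = load_curve("learning_curves/<file>.npy")
# trained = curve[1:]  # drop the untrained-policy evaluation, as the paper does
```

Averaging curves for the same environment across seeds then yields a mean learning curve comparable to those plotted in the paper.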

Bibtex

@inproceedings{fujimoto2018addressing,
  title={Addressing Function Approximation Error in Actor-Critic Methods},
  author={Fujimoto, Scott and Hoof, Herke and Meger, David},
  booktitle={International Conference on Machine Learning},
  pages={1582--1591},
  year={2018}
}

Author's PyTorch implementation of TD3 for OpenAI gym tasks

Owner

Scott Fujimoto
