An efficient framework for reinforcement learning.

Overview

rl: An efficient framework for reinforcement learning

Python

Requirements

name version
Python >=3.7
numpy >=1.19
torch >=1.7
tensorboard >=2.5
tensorboardX >=2.4
gym >=0.18.3

Make sure your Python environment is activated before installing following requirements.
pip install -U gym tensorboard tensorboardx

Introduction

Quick Start

CartPole-v0:
python demo.py
Enter the following commands in terminal to start training Pendulum-v0:
python demo.py --env_name Pendulum-v0 --target_reward -250.0
Use Recurrent Neural Network:
python demo.py --env_name Pendulum-v0 --target_reward -250.0 --use_rnn --log_dir Pendulum-v0_RNN
Open a new terminal:
tensorboard --logdir=result
Then you can access the training information by visiting http://localhost:6006/ in browser.

Structure

Proximal Policy Optimization

PPO is an on-policy and model-free reinforcement learning algorithm.

Components

  • Generalized Advantage Estimation (GAE)
  • Gate Recurrent Unit (GRU)

Hyperparameters

hyperparameter note value
env_num number of parallel processes 16
chunk_len BPTT for GRU 10
eps clipping parameter 0.2
gamma discount factor 0.99
gae_lambda trade-off between TD and MC 0.95
entropy_coef coefficient of entropy 0.05
ppo_epoch data usage 5
adv_norm normalized advantage 1 (True)
max_norm gradient clipping (L2) 20.0
weight_decay weight decay (L2) 1e-6
lr_actor learning rate of actor network 1e-3
lr_critic learning rate of critic network 1e-3

Test Environment

A simple test environment for verifying the effectiveness of this algorithm (of course, the algorithm can also be implemented by yourself).
Simple logic with less code.

Mechanism

The environment chooses one number randomly in every step, and returns the one-hot matrix.
If the action taken matches the number chosen in the last 3 steps, you will get a complete reward of 1.

>>> from env.test_env import TestEnv
>>> env = TestEnv()
>>> env.seed(0)
>>> env.reset()
array([1., 0., 0.], dtype=float32)
>>> env.step(9 * 0 + 3 * 0 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 1 + 3 * 0 + 1 * 0)
(array([1., 0., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 0.0, False, {'str': 'Completely wrong.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 0., 1.], dtype=float32), 0.6666666666666666, False, {'str': 'Partially correct.'})
>>> env.step(9 * 2 + 3 * 0 + 1 * 0)
(array([1., 0., 0.], dtype=float32), 0.3333333333333333, False, {'str': 'Partially correct.'})
>>> env.step(9 * 0 + 3 * 2 + 1 * 1)
(array([0., 0., 1.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>>

Convergence Reward

  • General RL algorithms will achieve an average reward of 55.5.
  • Because of the state memory unit, RNN based RL algorithms can reach the goal of 100.0.

2021, ICCD Lab, Dalian University of Technology. Author: Jingcheng Jiang.

SymPy-powered, Wolfram|Alpha-like answer engine totally in your browser, without backend computation

SymPy Beta SymPy Beta is a fork of SymPy Gamma. The purpose of this project is to run a SymPy-powered, Wolfram|Alpha-like answer engine totally in you

Liumeo 25 Dec 21, 2022
Fast SHAP value computation for interpreting tree-based models

FastTreeSHAP FastTreeSHAP package is built based on the paper Fast TreeSHAP: Accelerating SHAP Value Computation for Trees published in NeurIPS 2021 X

LinkedIn 369 Jan 04, 2023
Paddle pit - Rethinking Spatial Dimensions of Vision Transformers

基于Paddle实现PiT ——Rethinking Spatial Dimensions of Vision Transformers,arxiv 官方原版代

Hongtao Wen 4 Jan 15, 2022
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 36 Oct 31, 2022
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
BTC-Generator - BTC Generator With Python

Что такое BTC-Generator? Это генератор чеков всеми любимого @BTC_BANKER_BOT Для

DoomGod 3 Aug 24, 2022
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations)

Graph Neural Networks with Learnable Structural and Positional Representations Source code for the paper "Graph Neural Networks with Learnable Structu

Vijay Prakash Dwivedi 180 Dec 22, 2022
A repository with exploration into using transformers to predict DNA ↔ transcription factor binding

Transcription Factor binding predictions with Attention and Transformers A repository with exploration into using transformers to predict DNA ↔ transc

Phil Wang 62 Dec 20, 2022
Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals"

The Temporal Robustness of Stochastic Signals Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals" Case stud

0 Oct 28, 2021
Meta Language-Specific Layers in Multilingual Language Models

Meta Language-Specific Layers in Multilingual Language Models This repo contains the source codes for our paper On Negative Interference in Multilingu

Zirui Wang 20 Feb 13, 2022
Pre-Trained Image Processing Transformer (IPT)

Pre-Trained Image Processing Transformer (IPT) By Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Cha

HUAWEI Noah's Ark Lab 332 Dec 18, 2022
I tried to apply the CAM algorithm to YOLOv4 and it worked.

YOLOV4:You Only Look Once目标检测模型在pytorch当中的实现 2021年2月7日更新: 加入letterbox_image的选项,关闭letterbox_image后网络的map得到大幅度提升。 目录 性能情况 Performance 实现的内容 Achievement

55 Dec 05, 2022
Application of the L2HMC algorithm to simulations in lattice QCD.

l2hmc-qcd 📊 Slides Recent talk on Training Topological Samplers for Lattice Gauge Theory from the Machine Learning for High Energy Physics, on and of

Sam Foreman 37 Dec 14, 2022
A Novel Plug-in Module for Fine-grained Visual Classification

Pytorch implementation for A Novel Plug-in Module for Fine-Grained Visual Classification. fine-grained visual classification task.

ChouPoYung 109 Dec 20, 2022
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

Ajinkya Kulkarni 43 Nov 27, 2022
MLP-Like Vision Permutator for Visual Recognition (PyTorch)

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition (arxiv) This is a Pytorch implementation of our paper. We present Vision

Qibin (Andrew) Hou 162 Nov 28, 2022
RSC-Net: 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos

RSC-Net: 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos Implementation for "3D Human Pose, Shape and Texture from Low-Resoluti

XiangyuXu 42 Nov 10, 2022
A crossplatform menu bar application using mpv as DLNA Media Renderer.

Macast Chinese README A menu bar application using mpv as DLNA Media Renderer. Install MacOS || Windows || Debian Download link: Macast release latest

4.4k Jan 01, 2023
When are Iterative GPs Numerically Accurate?

When are Iterative GPs Numerically Accurate? This is a code repository for the paper "When are Iterative GPs Numerically Accurate?" by Wesley Maddox,

Wesley Maddox 1 Jan 06, 2022
Diffusion Probabilistic Models for 3D Point Cloud Generation (CVPR 2021)

Diffusion Probabilistic Models for 3D Point Cloud Generation [Paper] [Code] The official code repository for our CVPR 2021 paper "Diffusion Probabilis

Shitong Luo 323 Jan 05, 2023