The Unsupervised Reinforcement Learning Benchmark (URLB)

Last update: Dec 26, 2022

Overview

URLB provides a set of leading algorithms for unsupervised reinforcement learning, in which agents first pre-train without access to extrinsic rewards and are then fine-tuned on downstream tasks.

Requirements

We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is then to create a conda environment by running

conda env create -f conda_env.yml

After the installation finishes, you can activate your environment with

conda activate urlb

Implemented Agents

Agent	Command	Implementation Author(s)	Paper
ICM	`agent=icm`	Denis	paper
ProtoRL	`agent=proto`	Denis	paper
DIAYN	`agent=diayn`	Misha	paper
APT(ICM)	`agent=icm_apt`	Hao, Kimin	paper
APT(Ind)	`agent=ind_apt`	Hao, Kimin	paper
APS	`agent=aps`	Hao, Kimin	paper
SMM	`agent=smm`	Albert	paper
RND	`agent=rnd`	Kevin	paper
Disagreement	`agent=disagreement`	Catherine	paper

Available Domains

We support the following domains.

Domain	Tasks
`walker`	`stand`, `walk`, `run`, `flip`
`quadruped`	`walk`, `run`, `stand`, `jump`
`jaco`	`reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right`
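
The table above can be written down as a small lookup, which helps catch typos in a task name before a run is launched. This is only a sketch: it assumes that full task names take the form `<domain>_<task>`, as in the `walker_run` task used in the fine-tuning example, and the names `DOMAIN_TASKS` and `full_task_names` are illustrative rather than part of the URLB code.

```python
# Illustrative lookup built from the domain table; not part of the URLB code base.
DOMAIN_TASKS = {
    "walker": ["stand", "walk", "run", "flip"],
    "quadruped": ["walk", "run", "stand", "jump"],
    "jaco": [
        "reach_top_left",
        "reach_top_right",
        "reach_bottom_left",
        "reach_bottom_right",
    ],
}


def full_task_names(domain):
    """Return task identifiers for a domain, assuming the `<domain>_<task>` form."""
    if domain not in DOMAIN_TASKS:
        raise ValueError(f"unknown domain: {domain!r}")
    return [f"{domain}_{task}" for task in DOMAIN_TASKS[domain]]


if __name__ == "__main__":
    for name in DOMAIN_TASKS:
        print(name, full_task_names(name))
```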

Domain observation mode

Each domain supports two observation modes: states and pixels.

Mode	Command
states	`obs_type=states`
pixels	`obs_type=pixels`

Instructions

Pre-training

To run pre-training, use the pretrain.py script:

python pretrain.py agent=icm domain=walker

or, if you want to train a skill-based agent, like DIAYN, run:

python pretrain.py agent=diayn domain=walker

The script saves agent snapshots after 100k, 500k, 1M, and 2M frames of training. The snapshots are stored under the following directory:

./pretrained_models/<obs_type>/<domain>/<agent>/

For example:

./pretrained_models/states/walker/icm/

Fine-tuning

Once you have pre-trained your method, you can use the saved snapshots to initialize the DDPG agent and fine-tune it on a downstream task. For example, if you have pre-trained ICM, you can fine-tune it on walker_run by running the following command:

python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states

This will load a snapshot stored in ./pretrained_models/states/walker/icm/snapshot_1000000.pt, initialize DDPG with it (both the actor and critic), and start training on walker_run using the extrinsic reward of the task.
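
To check that a snapshot exists before launching fine-tuning, the path can be rebuilt from the command-line values. The helper below is a hypothetical sketch, not code from the repository: it assumes the domain is the part of the task name before the first underscore (`walker_run` gives `walker`) and that snapshot files are named `snapshot_<snapshot_ts>.pt`, as in the example above.

```python
from pathlib import Path


def snapshot_path(obs_type, task, agent, snapshot_ts, root="./pretrained_models"):
    """Build the expected snapshot path for a fine-tuning run.

    Assumption: the domain is the text before the first underscore in the
    task name, so "walker_run" maps to the "walker" domain.
    """
    domain = task.split("_", 1)[0]
    return Path(root) / obs_type / domain / agent / f"snapshot_{snapshot_ts}.pt"


if __name__ == "__main__":
    path = snapshot_path("states", "walker_run", "icm", 1000000)
    # Report whether the snapshot is present without raising an error.
    print(path, "(found)" if path.exists() else "(missing)")
```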

For methods that use skills, also pass the agent and set the reward_free flag to false:

python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
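
The two command variants can be generated by one small helper, which is convenient when scripting several fine-tuning runs. This is a sketch under stated assumptions: `finetune_command` and `SKILL_AGENTS` are hypothetical names, and the set of skill-based agents is a guess. The README names DIAYN as skill-based and uses SMM in the example above; APS is included as an assumption.

```python
import shlex

# Assumption: agents that use skills. DIAYN and SMM come from the README text;
# APS is included here as a guess and may need adjusting.
SKILL_AGENTS = {"diayn", "smm", "aps"}


def finetune_command(agent, task, snapshot_ts, obs_type="states"):
    """Return the fine-tuning command as a list of arguments."""
    args = [
        "python",
        "finetune.py",
        f"pretrained_agent={agent}",
        f"task={task}",
        f"snapshot_ts={snapshot_ts}",
        f"obs_type={obs_type}",
    ]
    if agent in SKILL_AGENTS:
        # Skill-based methods also need the agent and reward_free=false.
        args += [f"agent={agent}", "reward_free=false"]
    return args


if __name__ == "__main__":
    print(shlex.join(finetune_command("icm", "walker_run", 1000000)))
    print(shlex.join(finetune_command("smm", "walker_run", 1000000)))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without shell quoting concerns.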

Monitoring

Logs are stored in the exp_local folder. To launch TensorBoard, run:

tensorboard --logdir exp_local

The console output is also available in the following form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

A training entry decodes as:

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
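
For quick analysis outside TensorBoard, a console line in this format can be split into fields. The parser below is a minimal sketch, not part of URLB: the function name `parse_log_line` is made up, and the choice of which fields are integers or floats is inferred from the sample line above.

```python
# Field types inferred from the sample console line; treat them as assumptions.
INT_KEYS = {"F", "S", "E", "L"}
FLOAT_KEYS = {"R", "FPS"}


def parse_log_line(line):
    """Split a console log line into its tag and a dict of fields."""
    parts = [p.strip() for p in line.strip().strip("|").split("|")]
    tag, fields = parts[0], {}
    for part in parts[1:]:
        # Split on the first colon only, so "T: 0:00:42" keeps its full value.
        key, _, value = part.partition(":")
        key, value = key.strip(), value.strip()
        if key in INT_KEYS:
            fields[key] = int(value)
        elif key in FLOAT_KEYS:
            fields[key] = float(value)
        else:
            fields[key] = value  # e.g. T is kept as an "H:MM:SS" string
    return tag, fields


if __name__ == "__main__":
    sample = ("| train | F: 6000 | S: 3000 | E: 6 | L: 1000 "
              "| R: 5.5177 | FPS: 96.7586 | T: 0:00:42")
    print(parse_log_line(sample))
```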
