RoboDesk A Multi-Task Reinforcement Learning Benchmark

Related tags

Deep Learningrobodesk
Overview

RoboDesk

PyPI

A Multi-Task Reinforcement Learning Benchmark

Robodesk Banner

If you find this open source release useful, please reference in your paper:

@misc{kannan2021robodesk,
  author = {Harini Kannan and Danijar Hafner and Chelsea Finn and Dumitru Erhan},
  title = {RoboDesk: A Multi-Task Reinforcement Learning Benchmark},
  year = {2021},
  howpublished = {\url{https://github.com/google-research/robodesk}},
}

Highlights

  • Diversity: RoboDesk includes 9 diverse tasks that test for a variety of different behaviors within the same environment, making it useful for evaluating transfer, multi-task learning, and global exploration.
  • Complexity: The high-dimensional image inputs contain objects of different shapes and colors, whose initial positions are randomized to avoid naive memorization and require learning algorithms to generalize.
  • Robustness: We carefully designed and tested RoboDesk to ensure fast and stable physics simulation. This avoids objects from intersecting, getting stuck, or quickly flying away, a common problem with some existing environments.
  • Lightweight: RoboDesk comes as a self-contained Python package with few dependencies. The source code is clean and pragmatic, making it a useful blueprint for creating new MuJoCo environments.

Training Agents

Installation: pip3 install -U robodesk

The environment follows the OpenAI Gym interface:

import robodesk

env = robodesk.RoboDesk(seed=0)
obs = env.reset()
assert obs.shape == (64, 64, 3)

done = False
while not done:
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)

Tasks

Robodesk Tasks

The behaviors above were learned using the Dreamer agent. These policies have been learned from scratch and only from pixels, not proprioceptive states.

Task Description
open_slide Push the sliding door all the way to the right, navigating around the other objects.
open_drawer Pull the dark brown drawer all the way open.
push_green Push the green button to turn the green light on.
stack_blocks Stack the upright blue block on top of the flat green block.
upright_block_off_table Push the blue upright block off the table.
flat_block_in_bin Push the green flat block into the blue bin.
flat_block_in_shelf Push the green flat block into the shelf, navigating around the other blocks.
lift_upright_block Grasp the blue upright block and lift it above the table.
lift_ball Grasp the magenta ball and lift it above the table.

Environment Details

Constructor

robodesk.RoboDesk(task='open_slide', reward='dense', action_repeat=1, episode_length=500, image_size=64)
Parameter Description
task Available tasks are open_slide, open_drawer, push_green, stack, upright_block_off_table, flat_block_in_bin, flat_block_in_shelf, lift_upright_block, lift_ball.
reward Available reward types are dense, sparse, success. Success gives only the first sparse reward during the episode, useful for computing success rates during evaluation.
action_repeat Reduces the control frequency by applying each action multiple times. This is faster than using an environment wrapper because only the needed images are rendered.
episode_length Time limit for the episode, can be None.
image_size Size of the image observations in pixels, used for both height and width.

Reward

All rewards are bound between 0 and 1. There are three types of rewards available:

  • Dense rewards are based on Euclidean distances between the objects and their target positions and can include additional terms, for example to encourage the arm to reach the object. These are the easiest rewards for learning.
  • Sparse rewards are either 0 or 1 based on whether the target object is in the target area or not, according to a fixed threshold. Learning from sparse rewards is more challenging.
  • Success rewards are equivalent to the sparse rewards, except that only the first reward is given during each episode. As a result, an episode return of 0 means failure and 1 means sucess at the task. This should only be used during evaluation.

Termination

Episodes end after 500 time steps by default. There are no early terminations.

Observation Space

Each observation is a dictionary that contains the current image, as well as additional information. For the standard benchmark, only the image should be used for learning. The observation dictionary contains the following keys:

Key Space
image Box(0, 255, (64, 64, 3), np.uint8)
qpos_robot Box(-np.inf, np.inf, (9,), np.float32)
qvel_robot Box(-np.inf, np.inf, (9,), np.float32)
qpos_objects Box(-np.inf, np.inf, (26,), np.float32)
qvel_objects Box(-np.inf, np.inf, (26,), np.float32)
end_effector Box(-np.inf, np.inf, (3,), np.float32)

Action Space

RoboDesk uses end effector control with a simple bounded action space:

Box(-1, 1, (5,), np.float32)

Acknowledgements

We thank Ben Eysenbach and Debidatta Dwibedi for their helpful feedback.

Our benchmark builds upon previously open-sourced work. We build upon the desk XMLs first introduced in [1], the Franka XMLs open-sourced in [2], and the Franka meshes open-sourced in [3].

Questions

Please open an issue on Github.

Disclaimer: This is not an official Google product.

Owner
Google Research
Google Research
MRQy is a quality assurance and checking tool for quantitative assessment of magnetic resonance imaging (MRI) data.

Front-end View Backend View Table of Contents Description Prerequisites Running Basic Information Measurements User Interface Feedback and usage Descr

Center for Computational Imaging and Personalized Diagnostics 58 Dec 02, 2022
A nutritional label for food for thought.

Lexiscore As a first effort in tackling the theme of information overload in content consumption, I've been working on the lexiscore: a nutritional la

Paul Bricman 34 Nov 08, 2022
Synthetic Humans for Action Recognition, IJCV 2021

SURREACT: Synthetic Humans for Action Recognition from Unseen Viewpoints Gül Varol, Ivan Laptev and Cordelia Schmid, Andrew Zisserman, Synthetic Human

Gul Varol 59 Dec 14, 2022
A benchmark framework for Tensorflow

TensorFlow benchmarks This repository contains various TensorFlow benchmarks. Currently, it consists of two projects: PerfZero: A benchmark framework

1.1k Dec 30, 2022
Author's PyTorch implementation of TD3 for OpenAI gym tasks

Addressing Function Approximation Error in Actor-Critic Methods PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If y

Scott Fujimoto 1.3k Dec 25, 2022
implementation for paper "ShelfNet for fast semantic segmentation"

ShelfNet-lightweight for paper (ShelfNet for fast semantic segmentation) This repo contains implementation of ShelfNet-lightweight models for real-tim

Juntang Zhuang 252 Sep 16, 2022
Image Captioning using CNN and Transformers

Image-Captioning Keras/Tensorflow Image Captioning application using CNN and Transformer as encoder/decoder. In particulary, the architecture consists

24 Dec 28, 2022
Off-policy continuous control in PyTorch, with RDPG, RTD3 & RSAC

arXiv technical report soon available. we are updating the readme to be as comprehensive as possible Please ask any questions in Issues, thanks. Intro

Zhihan 31 Dec 30, 2022
Tightness-aware Evaluation Protocol for Scene Text Detection

TIoU-metric Release on 27/03/2019. This repository is built on the ICDAR 2015 evaluation code. If you propose a better metric and require further eval

Yuliang Liu 206 Nov 18, 2022
Koç University deep learning framework.

Knet Knet (pronounced "kay-net") is the Koç University deep learning framework implemented in Julia by Deniz Yuret and collaborators. It supports GPU

1.4k Dec 31, 2022
Cl datasets - PyTorch image dataloaders and utility functions to load datasets for supervised continual learning

Continual learning datasets Introduction This repository contains PyTorch image

berjaoui 5 Aug 28, 2022
3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Normal Face Photos

3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Normal Face Photos This repository contains the source code and dataset for the pa

54 Oct 09, 2022
Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

CReST in Tensorflow 2 Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Ki

Google Research 75 Nov 01, 2022
Network Pruning That Matters: A Case Study on Retraining Variants (ICLR 2021)

Network Pruning That Matters: A Case Study on Retraining Variants (ICLR 2021)

Duong H. Le 18 Jun 13, 2022
Lightweight, Python library for fast and reproducible experimentation :microscope:

Steppy What is Steppy? Steppy is a lightweight, open-source, Python 3 library for fast and reproducible experimentation. Steppy lets data scientist fo

minerva.ml 134 Jul 10, 2022
Source code of generalized shuffled linear regression

Generalized-Shuffled-Linear-Regression Code for the ICCV 2021 paper: Generalized Shuffled Linear Regression. Authors: Feiran Li, Kent Fujiwara, Fumio

FEI 7 Oct 26, 2022
PyTorch implementation of Off-policy Learning in Two-stage Recommender Systems

Off-Policy-2-Stage This repo provides a PyTorch implementation of the MovieLens experiments for the following paper: Off-policy Learning in Two-stage

Jiaqi Ma 25 Dec 12, 2022
Anomaly detection in multi-agent trajectories: Code for training, evaluation and the OpenAI highway simulation.

Anomaly Detection in Multi-Agent Trajectories for Automated Driving This is the official project page including the paper, code, simulation, baseline

12 Dec 02, 2022
Example repository for custom C++/CUDA operators for TorchScript

Custom TorchScript Operators Example This repository contains examples for writing, compiling and using custom TorchScript operators. See here for the

106 Dec 14, 2022