CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Overview

CALVIN

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks Oier Mees, Lukas Hermann, Erick Rosete, Wolfram Burgard

We present CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark to learn long-horizon language-conditioned tasks. Our aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, and specified only via human language. CALVIN tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets and supports flexible specification of sensor suites.

💻 Quick Start

To begin, clone this repository locally

git clone --recurse-submodules https://github.com/mees/calvin.git
$ export CALVIN_ROOT=$(pwd)/calvin

Install requirements:

$ cd $CALVIN_ROOT
$ virtualenv -p $(which python3) --system-site-packages calvin_env # or use conda
$ source calvin_env/bin/activate
$ sh install.sh

Download dataset (choose which split you want to download with the argument D, ABC or ABCD):

$ cd $CALVIN_ROOT/dataset
$ sh download_data.sh D | ABC | ABCD

🏋️‍♂️ Train Baseline Agent

Train baseline models:

$ cd $CALVIN_ROOT/calvin_models/calvin_agent
$ python training.py

You want to scale your training to a multi-gpu setup? Just specify the number of GPUs and DDP will automatically be used for training thanks to Pytorch Lightning. To train on all available GPUs:

$ python training.py trainer.gpus=-1

If you have access to a Slurm cluster, we also provide trainings scripts here.

You can use Hydra's flexible overriding system for changing hyperparameters. For example, to train a model with rgb images from both static camera and the gripper camera:

$ python training.py datamodule/observation_space=lang_rgb_static_gripper model/perceptual_encoder=gripper_cam

To train a model with RGB-D from both cameras:

$ python training.py datamodule/observation_space=lang_rgbd_both model/perceptual_encoder=RGBD_both

To train a model with rgb images from the static camera and visual tactile observations:

$ python training.py datamodule/observation_space=lang_rgb_static_tactile model/perceptual_encoder=static_RGB_tactile

To see all available hyperparameters:

$ python training.py --help

To resume a training, just override the hydra working directory :

$ python training.py hydra.run.dir=runs/my_dir

🖼️ Sensory Observations

CALVIN supports a range of sensors commonly utilized for visuomotor control:

  1. Static camera RGB images - with shape 200x200x3.
  2. Static camera Depth maps - with shape 200x200x1.
  3. Gripper camera RGB images - with shape 200x200x3.
  4. Gripper camera Depth maps - with shape 200x200x1.
  5. Tactile image - with shape 120x160x2x3.
  6. Proprioceptive state - EE position (3), EE orientation in euler angles (3), gripper width (1), joint positions (7), gripper action (1).

🕹️ Action Space

In CALVIN, the agent must perform closed-loop continuous control to follow unconstrained language instructions characterizing complex robot manipulation tasks, sending continuous actions to the robot at 30hz. In order to give researchers and practitioners the freedom to experiment with different action spaces, CALVIN supports the following actions spaces:

  1. Absolute cartesian pose - EE position (3), EE orientation in euler angles (3), gripper action (1).
  2. Relative cartesian displacement - EE position (3), EE orientation in euler angles (3), gripper action (1).
  3. Joint action - Joint positions (7), gripper action (1).

💪 Evaluation: The Calvin Challenge

Long-horizon Multi-task Language Control (LH-MTLC)

The aim of the CALVIN benchmark is to evaluate the learning of long-horizon language-conditioned continuous control policies. In this setting, a single agent must solve complex manipulation tasks by understanding a series of unconstrained language expressions in a row, e.g., “open the drawer. . . pick up the blue block. . . now push the block into the drawer. . . now open the sliding door”. We provide an evaluation protocol with evaluation modes of varying difficulty by choosing different combinations of sensor suites and amounts of training environments. To avoid a biased initial position, the robot is reset to a neutral position before every multi-step sequence.

To evaluate a trained calvin baseline agent, run the following command:

$ cd $CALVIN_ROOT/calvin_models/calvin_agent
$ python evaluation/evaluate_policy.py --dataset_path <PATH/TO/DATASET> --train_folder <PATH/TO/TRAINING/FOLDER>

Optional arguments:

  • --checkpoint <PATH/TO/CHECKPOINT>: by default, the evaluation loads the last checkpoint in the training log directory. You can instead specify the path to another checkpoint by adding this to the evaluation command.
  • --debug: print debug information and visualize environment.

If you want to evaluate your own model architecture on the CALVIN challenge, you can implement the CustomModel class in evaluate_policy.py as an interface to your agent. You need to implement the following methods:

  • __init__(): gets called once at the beginning of the evaluation.
  • reset(): gets called at the beginning of each evaluation sequence.
  • step(obs, goal): gets called every step and returns the predicted action.

Then evaluate the model by running:

$ python evaluation/evaluate_policy.py --dataset_path <PATH/TO/DATASET> --custom_model

You are also free to use your own language model instead of using the precomputed language embeddings provided by CALVIN. For this, implement CustomLangEmbeddings in evaluate_policy.py and add --custom_lang_embeddings to the evaluation command.

Multi-task Language Control (MTLC)

Alternatively, you can evaluate the policy on single tasks and without resetting the robot to a neutral position. Note that this evaluation is currently only available for our baseline agent.

$ python evaluation/evaluate_policy_singlestep.py --dataset_path <PATH/TO/DATASET> --train_folder <PATH/TO/TRAINING/FOLDER> [--checkpoint <PATH/TO/CHECKPOINT>] [--debug]

Pre-trained Model

Download the MCIL model checkpoint trained on the static camera rgb images on environment D.

$ wget http://calvin.cs.uni-freiburg.de/model_weights/D_D_static_rgb_baseline.zip
$ unzip D_D_static_rgb_baseline.zip

💬 Relabeling Raw Language Annotations

You want to try learning language conditioned policies in CALVIN with a new awesome language model?

We provide an example script to relabel the annotations with different language model provided in SBert, such as the larger MPNet (paraphrase-mpnet-base-v2) or its corresponding multilingual model (paraphrase-multilingual-mpnet-base-v2). The supported options are "mini", "mpnet" and "multi". If you want to try different SBert models, just change the model name here.

cd $CALVIN_ROOT/calvin_models/calvin_agent
python utils/relabel_with_new_lang_model.py +path=$CALVIN_ROOT/dataset/task_D_D/ +name_folder=new_lang_model_folder model.nlp_model=mpnet

If you additionally want to sample different language annotations for each sequence (from the same task annotations) in the training split run the same command with the parameter reannotate=true.

📈 SOTA Models

Open-source models that outperform the MCIL baselines from CALVIN:

Contact Oier to add your model here.

Reinforcement Learning with CALVIN

Are you interested in trying reinforcement learning agents for the different manipulation tasks in the CALVIN environment? We provide a google colab to showcase how to leverage the CALVIN task indicators to learn RL agents with a sparse reward.

Citation

If you find the dataset or code useful, please cite:

@article{calvin21,
author = {Oier Mees and Lukas Hermann and Erick Rosete-Beas and Wolfram Burgard},
title = {CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks},
journal={arXiv preprint arXiv:2112.03227},
year = 2021,
}

License

MIT License

Owner
Oier Mees
PhD Student at the University of Freiburg, Germany. Researcher in Machine Learning and Robotics.
Oier Mees
[TNNLS 2021] The official code for the paper "Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement"

CSDNet-CSDGAN this is the code for the paper "Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement" Environment Preparing pyt

Jiaao Zhang 17 Nov 05, 2022
Scripts and misc. stuff related to the PortSwigger Web Academy

PortSwigger Web Academy Notes Mostly scripts to automate the exploits. Going in the order of the recomended learning path - starting with SQLi. Commun

pageinsec 17 Dec 30, 2022
Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection"

Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection". LRPDenseNet.py

Pedro Ricardo Ariel Salvador Bassi 2 Sep 21, 2022
Data and analysis code for an MS on SK VOC genomes phenotyping/neutralisation assays

Description Summary of phylogenomic methods and analyses used in "Immunogenicity of convalescent and vaccinated sera against clinical isolates of ance

Finlay Maguire 1 Jan 06, 2022
Beancount-mercury - Beancount importer for Mercury Startup Checking

beancount-mercury beancount-mercury provides an Importer for converting CSV expo

Michael Lynch 4 Oct 31, 2022
🎃 Core identification module of AI powerful point reading system platform.

ppReader-Kernel Intro Core identification module of AI powerful point reading system platform. Usage 硬件: Windows10、GPU:nvdia GTX 1060 、普通RBG相机 软件: con

CrashKing 1 Jan 11, 2022
Earth Vision Foundation

EVer - A Library for Earth Vision Researcher EVer is a Pytorch-based Python library to simplify the training and inference of the deep learning model.

Zhuo Zheng 34 Nov 26, 2022
A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis

A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis This is the pytorch implementation for our MICCAI 2021 paper. A Mul

Jiarong Ye 7 Apr 04, 2022
Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

Flickr-Faces-HQ Dataset (FFHQ) Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative

NVIDIA Research Projects 2.9k Dec 28, 2022
E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

11 Nov 08, 2022
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch Evolve to be more comprehensive, effective and efficient for fa

Zhao Jian 3.1k Jan 04, 2023
CNN designed for pansharpening

PROGRESSIVE BAND-SEPARATED CONVOLUTIONAL NEURAL NETWORK FOR MULTISPECTRAL PANSHARPENING This repository contains main code for the paper PROGRESSIVE B

SerendipitysX 3 Dec 29, 2021
Decentralized Reinforcment Learning: Global Decision-Making via Local Economic Transactions (ICML 2020)

Decentralized Reinforcement Learning This is the code complementing the paper Decentralized Reinforcment Learning: Global Decision-Making via Local Ec

40 Oct 30, 2022
Entity-Based Knowledge Conflicts in Question Answering.

Entity-Based Knowledge Conflicts in Question Answering Run Instructions | Paper | Citation | License This repository provides the Substitution Framewo

Apple 35 Oct 19, 2022
Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction

Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction Requirements The code has been tested running under Python 3.7.4, with the foll

zshicode 84 Jan 01, 2023
Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

A Theoretical Analysis of the Repetition Problem in Text Generation This repository share the code for the paper "A Theoretical Analysis of the Repeti

Zihao Fu 37 Nov 21, 2022
Continual Learning of Electronic Health Records (EHR).

Continual Learning of Longitudinal Health Records Repo for reproducing the experiments in Continual Learning of Longitudinal Health Records (2021). Re

Jacob 7 Oct 21, 2022
An implementation of paper `Real-time Convolutional Neural Networks for Emotion and Gender Classification` with PaddlePaddle.

简介 通过PaddlePaddle框架复现了论文 Real-time Convolutional Neural Networks for Emotion and Gender Classification 中提出的两个模型,分别是SimpleCNN和MiniXception。利用 imdb_crop

8 Mar 11, 2022
STEAL - Learning Semantic Boundaries from Noisy Annotations (CVPR 2019)

STEAL This is the official inference code for: Devil Is in the Edges: Learning Semantic Boundaries from Noisy Annotations David Acuna, Amlan Kar, Sanj

469 Dec 26, 2022
Multi Task RL Baselines

MTRL Multi Task RL Algorithms Contents Introduction Setup Usage Documentation Contributing to MTRL Community Acknowledgements Introduction M

Facebook Research 171 Jan 09, 2023