Official implementation of the Implicit Behavioral Cloning (IBC) algorithm

Related tags

Deep Learningibc
Overview

Implicit Behavioral Cloning

This codebase contains the official implementation of the Implicit Behavioral Cloning (IBC) algorithm from our paper:

Implicit Behavioral Cloning (website link) (arXiv link)
Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson
Conference on Robot Learning (CoRL) 2021

Abstract

We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.

Prerequisites

The code for this project uses python 3.7+ and the following pip packages:

python3 -m pip install --upgrade pip
pip install \
  absl-py==0.12.0 \
  gin-config==0.4.0 \
  matplotlib==3.4.3 \
  mediapy==1.0.3 \
  opencv-python==4.5.3.56 \
  pybullet==3.1.6 \
  scipy==1.7.1 \
  tensorflow==2.6.0 \
  tensorflow-probability==0.13.0 \
  tf-agents-nightly==0.10.0.dev20210930 \
  tqdm==4.62.2

(Optional): For Mujoco support, see docs/mujoco_setup.md. Recommended to skip it unless you specifically want to run the Adroit and Kitchen environments.

Quickstart: from 0 to a trained IBC policy in 10 minutes.

Step 1: Install listed Python packages above in Prerequisites.

Step 2: Run unit tests (should take less than a minute), and do this from the directory just above the top-level ibc directory:

./ibc/run_tests.sh

Step 3: Check that Tensorflow has GPU access:

python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"

If the above prints False, see the following requirements, notably CUDA 11.2 and cuDNN 8.1.0: https://www.tensorflow.org/install/gpu#software_requirements.

Step 4: Let's do an example Block Pushing task, so first let's download oracle data (or see Tasks for how to generate it):

cd ibc/data
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip block_push_states_location.zip && rm block_push_states_location.zip
cd ../..

Step 5: Set PYTHONPATH to include the directory just above top-level ibc, so if you've been following the commands above it is:

export PYTHONPATH=$PYTHONPATH:${PWD}

Step 6: On that example Block Pushing task, we'll next do a training + evaluation with Implicit BC:

./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh

Some notes:

  • On an example single-GPU machine (GTX 2080 Ti), the above trains at about 18 steps/sec, and should get to high success rates in 5,000 or 10,000 steps (roughly 5-10 minutes of training).
  • The mlp_ebm.gin is just one config, with is meant to be reasonably fast to train, with only 20 evals at each interval, and is not suitable for all tasks. See Tasks for more configs.
  • Due to the --video flag above, you can watch a video of the learned policy in action at: /tmp/ibc_logs/mlp_ebm/ibc_dfo/... navigate to the videos/ttl=7d subfolder, and by default there should be one example .mp4 video saved every time you do an evaluation interval.

(Optional) Step 7: For the pybullet-based tasks, we also have real-time interactive visualization set up through a visualization server, so in one terminal:

cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 -m pybullet_utils.runServer

And in a different terminal run the oracle a few times with the --shared_memory flag:

cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 ibc/data/policy_eval.py -- \
  --alsologtostderr \
  --shared_memory \
  --num_episodes=3 \
  --policy=oracle_push \
  --task=PUSH

You're done with Quickstart! See below for more Tasks, and also see docs/codebase_overview.md and docs/workflow.md for additional info.

Tasks

Task: Particle

In this task, the goal is for the agent (black dot) to first go to the green dot, then the blue dot.

Example IBC policy Example MSE policy

Get Data

We can either generate data from scratch, for example for 2D (takes 15 seconds):

./ibc/ibc/configs/particle/collect_data.sh

Or just download all the data for all different dimensions:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/particle.zip
unzip particle.zip && rm particle.zip
cd ../..

Train and Evaluate

Let's start with some small networks, on just the 2D version since it's easiest to visualize, and compare MSE and IBC. Here's a small-network (256x2) IBC-with-Langevin config, where 2 is the argument for the environment dimensionality.

./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 2

And here's an idenitcally sized network (256x2) but with MSE config:

./ibc/ibc/configs/particle/run_mlp_mse.sh 2

For the above configurations, we suggest comparing the rollout videos, which you can find at /tmp/ibc_logs/...corresponding_directory../videos/. At the top of this section is shown a comparison at 10,000 training steps for the two different above configs.

And here are the best configs respectfully for IBC (with langevin) and MSE, in this case run on the 16-dimensional environment:

./ibc/ibc/configs/particle/run_mlp_ebm_langevin_best.sh 16
./ibc/ibc/configs/particle/run_mlp_mse_best.sh 16

Note: the _best config is kind of slow for Langevin to train, but even just ./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 16 (smaller network) seems to solve the 16-D environment pretty well, and is much faster to train.

Task: Block Pushing (from state observations)

Get Data

We can either generate data from scratch (~2 minutes for 2,000 episodes: 200 each across 10 replicas):

./ibc/ibc/configs/pushing_states/collect_data.sh

Or we can download data from the web:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip 'block_push_states_location.zip' && rm block_push_states_location.zip
cd ../..

Train and Evaluate

Here's reasonably fast-to-train config for IBC with DFO:

./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh

Or here's a config for IBC with Langevin:

./ibc/ibc/configs/pushing_states/run_mlp_ebm_langevin.sh

Or here's a comparable, reasonably fast-to-train config for MSE:

./ibc/ibc/configs/pushing_states/run_mlp_mse.sh

Or to run the best configs respectfully for IBC, MSE, and MDN (some of these might be slower to train than the above):

./ibc/ibc/configs/pushing_states/run_mlp_ebm_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mse_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mdn_best.sh

Task: Block Pushing (from image observations)

Get Data

Download data from the web:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_visual_location.zip
unzip 'block_push_visual_location.zip' && rm block_push_visual_location.zip
cd ../..

Train and Evaluate

Here is an IBC with Langevin configuration which should actually converge faster than the IBC-with-DFO that we reported in the paper:

./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_langevin.sh

And here are the best configs respectfully for IBC (with DFO), MSE, and MDN:

./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_best.sh
./ibc/ibc/configs/pushing_pixels/run_pixel_mse_best.sh
./ibc/ibc/configs/pushing_pixels/run_pixel_mdn_best.sh

Task: D4RL Adroit and Kitchen

Get Data

The D4RL human demonstration training data used for the paper submission can be downloaded using the commands below. This data has been processed into a .tfrecord format from the original D4RL data format:

cd ibc/data && mkdir -p d4rl_trajectories && cd d4rl_trajectories
wget https://storage.googleapis.com/brain-reach-public/ibc_data/door-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/hammer-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-complete-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-mixed-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-partial-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/pen-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/relocate-human-v0.zip
unzip '*.zip' && rm *.zip
cd ../../..

Run Train Eval:

Here are the best configs respectfully for IBC (with Langevin), and MSE: On a 2080 Ti GPU test, this IBC config trains at only 1.7 steps/sec, but it is about 10x faster on TPUv3.

./ibc/ibc/configs/d4rl/run_mlp_ebm_langevin_best.sh pen-human-v0
./ibc/ibc/configs/d4rl/run_mlp_mse_best.sh pen-human-v0

The above commands will run on the pen-human-v0 environment, but you can swap this arg for whichever of the provided Adroit/Kitchen environments.

Here also is an MDN config you can try. The network size is tiny but if you increase it heavily then it seems to get NaNs during training. In general MDNs can be finicky. A solution should be possible though.

./ibc/ibc/configs/d4rl/run_mlp_mdn.sh pen-human-v0

Summary for Reproducing Results

For the tasks that we've been able to open-source, results from the paper should be reproducible by using the linked data and command-line args below.

Task Figure/Table in paper Data Train + Eval commands
Coordinate regression Figure 4 See colab See colab
D4RL Adroit + Kitchen Table 2 Link Link
N-D particle Figure 6 Link Link
Simulated pushing, single target, states Table 3 Link Link
Simulated pushing, single target, pixels Table 3 Link Link

Citation

If you found our paper/code useful in your research, please consider citing:

@article{florence2021implicit,
    title={Implicit Behavioral Cloning},
    author={Florence, Pete and Lynch, Corey and Zeng, Andy and Ramirez, Oscar and Wahid, Ayzaan and Downs, Laura and Wong, Adrian and Lee, Johnny and Mordatch, Igor and Tompson, Jonathan},
    journal={Conference on Robot Learning (CoRL)},
    month = {November},
    year={2021}
}
Owner
Google Research
Google Research
Visualizer for neural network, deep learning, and machine learning models

Netron is a viewer for neural network, deep learning and machine learning models. Netron supports ONNX (.onnx, .pb, .pbtxt), Keras (.h5, .keras), Tens

Lutz Roeder 21k Jan 06, 2023
PyTorch implementation of NeurIPS 2021 paper: "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration"

PyTorch implementation of NeurIPS 2021 paper: "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration"

76 Jan 03, 2023
Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Ibai Gorordo 42 Oct 07, 2022
Code repository for Semantic Terrain Classification for Off-Road Autonomous Driving

BEVNet Datasets Datasets should be put inside data/. For example, data/semantic_kitti_4class_100x100. Training BEVNet-S Example: cd experiments bash t

(Brian) JoonHo Lee 24 Dec 12, 2022
This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in the Wild"

Visual Attributes in the Wild (VAW) This repository provides data for the VAW dataset as described in the CVPR 2021 Paper: Learning to Predict Visual

Adobe Research 36 Dec 30, 2022
Localizing Visual Sounds the Hard Way

Localizing-Visual-Sounds-the-Hard-Way Code and Dataset for "Localizing Visual Sounds the Hard Way". The repo contains code and our pre-trained model.

Honglie Chen 58 Dec 07, 2022
Code for our SIGCOMM'21 paper "Network Planning with Deep Reinforcement Learning".

0. Introduction This repository contains the source code for our SIGCOMM'21 paper "Network Planning with Deep Reinforcement Learning". Notes The netwo

NetX Group 68 Nov 24, 2022
Official PyTorch Implementation of Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition, ICCV 2021

Official PyTorch Implementation of Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition, ICCV 2021

26 Dec 07, 2022
Capsule endoscopy detection DACON challenge

capsule_endoscopy_detection (DACON Challenge) Overview Yolov5, Yolor, mmdetection기반의 모델을 사용 (총 11개 모델 앙상블) 모든 모델은 학습 시 Pretrained Weight을 yolov5, yolo

MAILAB 11 Nov 25, 2022
Human Detection - Pedestrian Detection using OpenCV Python

Pedestrian Detection using OpenCV Python Follow us on Instagram for Machine Lear

Hrishikesh Dutta 1 Jan 23, 2022
Torchreid: Deep learning person re-identification in PyTorch.

Torchreid Torchreid is a library for deep-learning person re-identification, written in PyTorch. It features: multi-GPU training support both image- a

Kaiyang 3.7k Jan 05, 2023
Optimized primitives for collective multi-GPU communication

NCCL Optimized primitives for inter-GPU communication. Introduction NCCL (pronounced "Nickel") is a stand-alone library of standard communication rout

NVIDIA Corporation 2k Jan 09, 2023
Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19)

Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19) Tianyu Wang*, Xin Yang*, Ke Xu, Shaozhe Chen, Qiang Zhang, Ry

Steve Wong 177 Dec 01, 2022
SeqAttack: a framework for adversarial attacks on token classification models

A framework for adversarial attacks against token classification models

Walter 23 Nov 25, 2022
MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data

MarcoPolo is a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering Overview MarcoPolo

Chanwoo Kim 13 Dec 18, 2022
Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019

Class-Balanced Loss Based on Effective Number of Samples Tensorflow code for the paper: Class-Balanced Loss Based on Effective Number of Samples Yin C

Yin Cui 546 Jan 08, 2023
Lepard: Learning Partial point cloud matching in Rigid and Deformable scenes

Lepard: Learning Partial point cloud matching in Rigid and Deformable scenes [Paper] Method overview 4DMatch Benchmark 4DMatch is a benchmark for matc

103 Jan 06, 2023
Behind the Curtain: Learning Occluded Shapes for 3D Object Detection

Behind the Curtain: Learning Occluded Shapes for 3D Object Detection Acknowledgement We implement our model, BtcDet, based on [OpenPcdet 0.3.0]. Insta

Qiangeng Xu 163 Dec 19, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Jan 05, 2023
This is a repository of our model for weakly-supervised video dense anticipation.

Introduction This is a repository of our model for weakly-supervised video dense anticipation. More results on GTEA, Epic-Kitchens etc. will come soon

2 Apr 09, 2022