(NeurIPS '21 Spotlight) IQ-Learn: Inverse Q-Learning for Imitation

Related tags

Deep LearningIQ-Learn
Overview

Inverse Q-Learning (IQ-Learn)

Official code base for IQ-Learn: Inverse soft-Q Learning for Imitation, NeurIPS '21 Spotlight

IQ-Learn is an easy-to-use algorithm that's a drop-in replacement to methods like Behavior Cloning and GAIL, to boost your imitation learning pipelines!
Update: IQ-Learn was recently used to create the best AI agent for playing Minecraft. Placing #1 in NeurIPS MineRL Basalt Challenge using only human demos (Overall Leaderboard Rank #2)

[Project Page]

We introduce Inverse Q-Learning (IQ-Learn), a state-of-the-art novel framework for Imitation Learning (IL), that directly learns soft-Q functions from expert data. IQ-Learn enables non-adverserial imitation learning, working on both offline and online IL settings. It is performant even with very sparse expert data, and scales to complex image-based environments, surpassing prior methods by more than 3x. It is very simple to implement requiring ~15 lines of code on top of existing RL methods.

Inverse Q-Learning is theoretically equivalent to Inverse Reinforcement learning, i.e. learning rewards from expert data. However, it is much more powerful in practice. It admits very simple non-adverserial training and works on complete offline IL settings (without any access to the environment), greatly exceeding Behavior Cloning.

IQ-Learn is the successor to Adversarial Imitation Learning methods like GAIL (coming from the same lab).
It extends the theoretical framework for Inverse RL to non-adverserial and scalable learning, for the first-time showing guaranteed convergence.

Citation

@inproceedings{garg2021iqlearn,
title={IQ-Learn: Inverse soft-Q Learning for Imitation},
author={Divyansh Garg and Shuvam Chakraborty and Chris Cundy and Jiaming Song and Stefano Ermon},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021},
url={https://openreview.net/forum?id=Aeo-xqtb5p}
}

Key Advantages

Drop-in replacement to Behavior Cloning
Non-adverserial online IL (Successor to GAIL & AIRL)
Simple to implement
Performant with very sparse data (single expert demo)
Scales to Complex Image Envs (SOTA on Atari and playing Minecraft)
Recover rewards from envs

Usage

To install and use IQ-Learn check the instructions provided in the iq_learn folder.

Imitation

Reaching human-level performance on Atari with pure imitation:

Rewards

Recovering environment rewards on GridWorld:

Grid

Questions

Please feel free to email us if you have any questions.

Div Garg ([email protected])

Owner
Divyansh Garg
Making robots intelligent
Divyansh Garg
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information This repository contains code, model, dataset for ChineseBERT at ACL2021. Ch

413 Dec 01, 2022
Graph Attention Networks

GAT Graph Attention Networks (Veličković et al., ICLR 2018): https://arxiv.org/abs/1710.10903 GAT layer t-SNE + Attention coefficients on Cora Overvie

Petar Veličković 2.6k Jan 05, 2023
1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection tool

yuxzho 94 Dec 25, 2022
AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation

AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation AniGAN: Style-Guided Generative Adversarial Networks for U

Bing Li 81 Dec 14, 2022
Let's create a tool to convert Thailand budget from PDF to CSV.

thailand-budget-pdf2csv Let's create a tool to convert Thailand Government Budgeting from PDF to CSV! รวมพลัง Dev แปลงงบ จาก PDF สู่ Machine-readable

Kao.Geek 88 Dec 19, 2022
FB-tCNN for SSVEP Recognition

FB-tCNN for SSVEP Recognition Here are the codes of the tCNN and FB-tCNN in the paper "Filter Bank Convolutional Neural Network for Short Time-Window

Wenlong Ding 12 Dec 14, 2022
Learning To Have An Ear For Face Super-Resolution

Learning To Have An Ear For Face Super-Resolution [Project Page] This repository contains demo code of our CVPR2020 paper. Training and evaluation on

50 Nov 16, 2022
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

Semi Hand-Object Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time (CVPR 2021).

96 Dec 27, 2022
Pytorch implementation of MaskGIT: Masked Generative Image Transformer

Pytorch implementation of MaskGIT: Masked Generative Image Transformer

Dominic Rampas 247 Dec 16, 2022
Semi-Supervised Graph Prototypical Networks for Hyperspectral Image Classification, IGARSS, 2021.

Semi-Supervised Graph Prototypical Networks for Hyperspectral Image Classification, IGARSS, 2021. Bobo Xi, Jiaojiao Li, Yunsong Li and Qian Du. Code f

Bobo Xi 7 Nov 03, 2022
Interpolation-based reduced-order models

Interpolation-reduced-order-models Interpolation-based reduced-order models High-fidelity computational fluid dynamics (CFD) solutions are time consum

Donovan Blais 1 Jan 10, 2022
Perturb-and-max-product: Sampling and learning in discrete energy-based models

Perturb-and-max-product: Sampling and learning in discrete energy-based models This repo contains code for reproducing the results in the paper Pertur

Vicarious 2 Mar 14, 2022
A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

A PyTorch Implementation of GGNN This is a PyTorch implementation of the Gated Graph Sequence Neural Networks (GGNN) as described in the paper Gated G

Ching-Yao Chuang 427 Dec 13, 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Temporally Efficient Vision Transformer for Video Instance Segmentation Temporally Efficient Vision Transformer for Video Instance Segmentation (CVPR

Hust Visual Learning Team 203 Dec 31, 2022
Tgbox-bench - Simple TGBOX upload speed benchmark

TGBOX Benchmark This script will benchmark upload speed to TGBOX storage. Build

Non 1 Jan 09, 2022
Multi-Horizon-Forecasting-for-Limit-Order-Books

Multi-Horizon-Forecasting-for-Limit-Order-Books This jupyter notebook is used to demonstrate our work, Multi-Horizon Forecasting for Limit Order Books

Zihao Zhang 116 Dec 23, 2022
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.

This is a curated list of tutorials, projects, libraries, videos, papers, books and anything related to the incredible PyTorch. Feel free to make a pu

Ritchie Ng 9.2k Jan 02, 2023
Walk with fastai

Shield: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Walk with fastai What is this p

Walk with fastai 124 Dec 10, 2022
Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Jonas Köhler 893 Dec 28, 2022
This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC

LEAP Lab 2 Sep 15, 2022