Pretraining Representations For Data-Efficient Reinforcement Learning

Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, Devon Hjelm, Philip Bachman & Aaron Courville

This repo provides code for implementing SGI.

📦 Install -- Install relevant dependencies and the project
🔧 Usage -- Commands to run different experiments from the paper

Install

To install the requirements, follow these steps:

# PyTorch
export LANG=C.UTF-8
# Install requirements
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

# Finally, install the project
pip install --user -e .

Usage:

The default branch for the latest and stable changes is release.

To run SGI:

Download the DQN replay dataset from https://research.google/tools/datasets/dqn-replay/
- Or substitute your own pre-training data! The codebase expects a series of .gz files, one each for observations, actions and terminals.
To pretrain with SGI:

python -m scripts.run public=True model_folder=./ offline.runner.save_every=2500 \
    env.game=pong seed=1 offline_model_save={your model name} \
    offline.runner.epochs=10 offline.runner.dataloader.games=[Pong] \
    offline.runner.no_eval=1 \
    +offline.algo.goal_weight=1 \
    +offline.algo.inverse_model_weight=1 \
    +offline.algo.spr_weight=1 \
    +offline.algo.target_update_tau=0.01 \
    +offline.agent.model_kwargs.momentum_tau=0.01 \
    do_online=False \
    algo.batch_size=256 \
    +offline.agent.model_kwargs.noisy_nets_std=0 \
    offline.runner.dataloader.dataset_on_disk=True \
    offline.runner.dataloader.samples=1000000 \
    offline.runner.dataloader.checkpoints='{your checkpoints}' \
    offline.runner.dataloader.num_workers=2 \
    offline.runner.dataloader.data_path={your data dir} \
    offline.runner.dataloader.tmp_data_path=./

To fine-tune with SGI:

python -m scripts.run public=True env.game=pong seed=1 num_logs=10  \
    model_load={your_model_name} model_folder=./ \
    algo.encoder_lr=0.000001 algo.q_l1_lr=0.00003 algo.clip_grad_norm=-1 algo.clip_model_grad_norm=-1

When reporting scores, we average across 10 fine-tuning seeds.

./scripts/experiments contains a number of example configurations, including for SGI-M, SGI-M/L and SGI-W, for both pre-training and fine-tuning. Each of these scripts can be launched by providing a game and seed, e.g., ./scripts/experiments/sgim_pretrain.sh pong 1. These scripts are provided primarily to illustrate the hyperparameters used for different experiments; you will likely need to modify the arguments in these scripts to point to your data and model directories.

Data for SGI-R and SGI-E is not included due to its size, but can be re-generated locally. Contact us for details.

What does each file do?

.
├── scripts
│   ├── run.py                # The main runner script to launch jobs.
│   ├── config.yaml           # The hydra configuration file, listing hyperparameters and options.
|   └── experiments           # Configurations for various experiments done by SGI.
|   
├── src                     
│   ├── agent.py              # Implements the Agent API for action selection 
│   ├── algos.py              # Distributional RL loss and optimization
│   ├── models.py             # Forward passes, network initialization.
│   ├── networks.py           # Network architecture and forward passes.
│   ├── offline_dataset.py    # Dataloader for offline data.
│   ├── gcrl.py               # Utils for SGI's goal-conditioned RL objective.
│   ├── rlpyt_atari_env.py    # Slightly modified Atari env from rlpyt
│   ├── rlpyt_utils.py        # Utility methods that we use to extend rlpyt's functionality
│   └── utils.py              # Command line arguments and helper functions 
│
└── requirements.txt          # Dependencies

Pretraining Representations For Data-Efficient Reinforcement Learning

Related tags

Overview

Pretraining Representations For Data-Efficient Reinforcement Learning

Install

Usage:

What does each file do?

Owner

Mila

High-Resolution Image Synthesis with Latent Diffusion Models

PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

PyTorch implementation of Pay Attention to MLPs

Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals"

Keras Image Embeddings using Contrastive Loss

L-Verse: Bidirectional Generation Between Image and Text

A PyTorch-Based Framework for Deep Learning in Computer Vision

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

DeepLabv3+：Encoder-Decoder with Atrous Separable Convolution语义分割模型在tensorflow2当中的实现

Invariant Causal Prediction for Block MDPs

a baseline to practice

SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolutional Networks

一套完整的微博舆情分析流程代码，包括微博爬虫、LDA主题分析和情感分析。

Framework web SnakeServer.

pytorch implementation for Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network arXiv:1609.04802

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

Face Recognize System on camera AI OAK1

A chemical analysis of lipophilicities & molecule drawings including ML

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"