Avalanche RL: an End-to-End Library for Continual Reinforcement Learning

Overview

Avalanche RL: an End-to-End Library for Continual Reinforcement Learning

Avalanche Website | Getting Started | Examples | Tutorial | API Doc | Paper | Twitter

unit test syntax checking PEP8 checking docstring coverage Coverage Status

Avalanche RL is a fork of ContinualAI's Pytorch-based framework Avalanche with the goal of extending its capabilities to Continual Reinforcement Learning (CRL), bootstrapping from the work done on Super/Unsupervised Continual Learning.

It should support all environments sharing the gym.Env interface, handle stream of experiences, provide strategies for RL algorithms and enable fast prototyping through an extremely flexible and customizable API.

The core structure and design principles of Avalanche are to remain untouched to easen the learning curve for all continual learning practitioners, so we still work with the same modules you can find in avl:

  • Benchmarks for managing data and stream of data.
  • Training for model training making use of extensible strategies.
  • Evaluation to evaluate the agent on consistent metrics.
  • Extras for general utils and building blocks.
  • Models contains commonly used model architectures.
  • Logging for logging metrics during training/evaluation.

Head over to Avalanche Website to learn more if these concepts sound unfamiliar to you!

Features


Features added so far in this fork can be summarized and grouped by module.

Benchmarks

RLScenario introduces a Benchmark for RL which augments each experience with an 'Environment' (defined through OpenAI gym.Env interface) effectively implementing a "stream of environments" with which the agent can interact to generate data and learn from that interaction during each experience. This concept models the way experiences in the supervised CL context are translated to CRL, moving away from the concept of Dataset toward a dynamic interaction through which data is generated.

RL Benchmark Generators allow to build these streams of experiences seamlessly, supporting:

  • Any sequence of gym.Env environments through gym_benchmark_generator, which returns a RLScenario from a list of environments ids (e.g. ["CartPole-v1", "MountainCar-v0", ..]) with access to a train and test stream just like in Avalanche. It also supports sampling a random number of environments if you wanna get wild with your experiments.
  • Atari 2600 games through atari_benchmark_generator, taking care of common Wrappers (e.g. frame stacking) for these environments to get you started even more quickly.
  • Habitat, more on this later.

Training

RLBaseStrategy is the super-class of all RL algorithms, augmenting BaseStrategy with RL specific callbacks while still making use of all major features such as plugins, logging and callbacks. Inspired by the amazing stable-baselines-3, it supports both on and off-policy algorithms under a common API defined as a 'rollouts phase' (data gathering) followed by an 'update phase', whose specifics are implemented by subclasses (RL algorithms).

Algorithms are added to the framework by subclassing RLBaseStrategy and implementing specific callbacks. You can check out this implementation of A2C in under 50 lines of actual code including the update step and the action sampling mechanism. Currently only A2C and DQN+DoubleDQN algorithms have been implemented, including various other "utils" such as Replay Buffer.

Training with multiple agent is supported through VectorizedEnv, leveraging Ray for parallel and potentially distributed execution of multiple environment interactions.

Evaluation

New metrics have been added to keep track of rewards, episodes length and any kind of scalar value (such as Epsilon Greedy 'eps') during experiments. Metrics are kept track of using a moving averaged window, useful for smoothing out fluctuations and recording standard deviation and max values reached.

Extras

Several common environment Wrappers are also kept here as we encourage the use of this pattern to suit environments output to your needs. We also provide common gym control environments which have been "parametrized" so you can tweak values such as force and gravity to help out in testing new ideas in a fast and reliable way on well known testbeds. These environments are available by pre-pending a C to the env id as in CCartPole-v1 as they're registered on first import.

Models

In this module you can find an implementation of both MLPs and CNNs for deep-q learning and actor-critic approaches, adapted from popular papers such as "Human-level Control Through Deep Reinforcement Learning" and "Overcoming catastrophic forgetting in neural networks" to learn directly from pixels or states.

Logging

A Tqdm-based interactive logger has been added to ease readability as well as sensible default loggers for RL algorithms.

Quick Example


import torch
from torch.optim import Adam
from avalanche.benchmarks.generators.rl_benchmark_generators import gym_benchmark_generator

from avalanche.models.actor_critic import ActorCriticMLP
from avalanche.training.strategies.reinforcement_learning import A2CStrategy

# Config
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Model
model = ActorCriticMLP(num_inputs=4, num_actions=2, actor_hidden_sizes=1024, critic_hidden_sizes=1024)

# CRL Benchmark Creation
scenario = gym_benchmark_generator(['CartPole-v1'], n_experiences=1, n_parallel_envs=1, 
    eval_envs=['CartPole-v1'])

# Prepare for training & testing
optimizer = Adam(model.parameters(), lr=1e-4)

# Reinforcement Learning strategy
strategy = A2CStrategy(model, optimizer, per_experience_steps=10000, max_steps_per_rollout=5, 
    device=device, eval_every=1000, eval_episodes=10)

# train and test loop
results = []
for experience in scenario.train_stream:
    strategy.train(experience)
    results.append(strategy.eval(scenario.test_stream))

Compare it with vanilla Avalanche snippet!

Check out more examples here (advanced ones coming soon) or in unit tests. We also got a small-scale reproduction of the original EWC paper (Deepmind) experiments.

Installation


As this fork is still under development, the advised way to install it is to simply clone this repo git clone https://github.com/NickLucche/avalanche.git and then just follow avalanche guide to install as developer. Spoiler, just run conda env update --file environment-dev.yml to update your current environment with avalanche-rl dependencies. Currently, the only added dependency is ray.

Disclaimer

This fork is under strict development so expect changes on the main branch on a fairly regular basis. As Avalanche itself it's still in its early Alpha versions, it's only fair to say that Avalanche RL is in super-duper pre-Alpha.

We believe there's lots of room for improvements and tweaking but at the same time there's much that can be offered to the growing community of continual learning practitioners approaching reinforcement learning by allowing to perform experiments under a common framework with a well-defined structure.

Owner
ContinualAI
A non-profit research organization and open community on Continual Learning for AI.
ContinualAI
Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling

Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling Code for the paper: Greg Ver Steeg and Aram Galstyan. "Hamiltonian Dynamics with N

Greg Ver Steeg 25 Mar 14, 2022
CMSC320 - Introduction to Data Science - Fall 2021

CMSC320 - Introduction to Data Science - Fall 2021 Instructors: Elias Jonatan Gonzalez and José Manuel Calderón Trilla Lectures: MW 3:30-4:45 & 5:00-6

Introduction to Data Science 6 Sep 12, 2022
For IBM Quantum Challenge 2021 (May 20 - 26)

IBM Quantum Challenge 2021 Introduction Commemorating the 40-year anniversary of the Physics of Computation conference, and 5-year anniversary of IBM

Qiskit Community 140 Jan 01, 2023
A MatConvNet-based implementation of the Fully-Convolutional Networks for image segmentation

MatConvNet implementation of the FCN models for semantic segmentation This package contains an implementation of the FCN models (training and evaluati

VLFeat.org 175 Feb 18, 2022
The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

FMFCC-A This project is the description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts. The FMFCC-A dataset is shared through BaiduCl

18 Dec 24, 2022
A tutorial on DataFrames.jl prepared for JuliaCon2021

JuliaCon2021 DataFrames.jl Tutorial This is a tutorial on DataFrames.jl prepared for JuliaCon2021. A video recording of the tutorial is available here

Bogumił Kamiński 106 Jan 09, 2023
UltraGCN: An Ultra Simplification of Graph Convolutional Networks for Recommendation

UltraGCN This is our Pytorch implementation for our CIKM 2021 paper: Kelong Mao, Jieming Zhu, Xi Xiao, Biao Lu, Zhaowei Wang, Xiuqiang He. UltraGCN: A

XUEPAI 93 Jan 03, 2023
The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

SGRAF PyTorch implementation for AAAI2021 paper of “Similarity Reasoning and Filtration for Image-Text Matching”. It is built on top of the SCAN and C

Ronnie_IIAU 149 Dec 22, 2022
Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

512x512 flowers after 12 hours of training, 1 gpu 256x256 flowers after 12 hours of training, 1 gpu Pizza 'Lightweight' GAN Implementation of 'lightwe

Phil Wang 1.5k Jan 02, 2023
Repository for publicly available deep learning models developed in Rosetta community

trRosetta2 This package contains deep learning models and related scripts used by Baker group in CASP14. Installation Linux/Mac clone the package git

81 Dec 29, 2022
Official implementation of the paper Label-Efficient Semantic Segmentation with Diffusion Models

Label-Efficient Semantic Segmentation with Diffusion Models Official implementation of the paper Label-Efficient Semantic Segmentation with Diffusion

Yandex Research 355 Jan 06, 2023
🚗 INGI Dakar 2K21 - Be the first one on the finish line ! 🚗

🚗 INGI Dakar 2K21 - Be the first one on the finish line ! 🚗 This year's first semester Club Info challenge will put you at the head of a car racing

ClubINFO INGI (UCLouvain) 6 Dec 10, 2021
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022
Some methods for comparing network representations in deep learning and neuroscience.

Generalized Shape Metrics on Neural Representations In neuroscience and in deep learning, quantifying the (dis)similarity of neural representations ac

Alex Williams 45 Dec 27, 2022
Deep Markov Factor Analysis (NeurIPS2021)

Deep Markov Factor Analysis (DMFA) Codes and experiments for deep Markov factor analysis (DMFA) model accepted for publication at NeurIPS2021: A. Farn

Sarah Ostadabbas 2 Dec 16, 2022
A tensorflow implementation of an HMM layer

tensorflow_hmm Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms. See Keras example for an example of how to use

Zach Dwiel 283 Oct 19, 2022
Official code repository for the EMNLP 2021 paper

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization PyTorch code for the EMNLP 2021 paper "Integrating Visuospatia

Adyasha Maharana 23 Dec 19, 2022
An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C) This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

33 Jun 27, 2021
Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.

Vector AI is a framework designed to make the process of building production grade vector based applications as quickly and easily as possible. Create

Vector AI 267 Dec 23, 2022