Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch

Overview

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

This is a PyTorch implementation of the original Lua code release.

Overview

This codebase implements two approaches to learning discrete communication protocols for playing collaborative games: Reinforced Inter-Agent Learning (RIAL), in which agents learn a factorized deep Q-learning policy across game actions and messages, and Differentiable Inter-Agent Learning (DIAL), in which the message vectors are directly learned by backpropagating errors through a noisy communication channel during training, and discretized to binary vectors during test time. While RIAL and DIAL share the same individual network architecture, one would expect learning to be more efficient under DIAL, which directly backpropagates downstream errors during training, a fact that is verified in comparing the performance of the two approaches.

Execution

$ virtualenv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
$ python main.py -c config/switch_3_dial.json

Results for switch game

DIAL vs. RIAL reward curves

This chart was generated by plotting an exponentially-weighted average across 20 trials for each curve.

More info

More generally, main.py takes multiple arguments:

Arg Short Description Required?
--config_path -c path to JSON configuration file
--results_path -r path to directory in which to save results per trial (as csv) -
--ntrials -n number of trials to run -
--start_index -s start-index used as suffix in result filenames -
--verbose -v prints results per training epoch to stdout if set -
Configuration

JSON configuration files passed to main.py should consist of the following key-value pairs:

Key Description Type
game name of the game, e.g. "switch" string
game_nagents number of agents int
game_action_space number of valid game actions int
game_comm_limited true if only some agents can communicate at each step bool
game_comm_bits number of bits per message int
game_comm_sigma standard deviation of Gaussian noise applied by DRU float
game_comm_hard true if use hard discretization, soft approximation otherwise bool
nsteps maximum number of game steps int
gamma reward discount factor for Q-learning float
model_dial true if agents should use DIAL bool
model_comm_narrow true if DRU should use sigmoid for regularization, softmax otherwise bool
model_target true if learning should use a target Q-network bool
model_bn true if learning should use batch normalization bool
model_know_share true if agents should share parameters bool
model_action_aware true if each agent should know their last action bool
model_rnn_size dimension of rnn hidden state int
bs batch size of episodes, run in parallel per epoch int
learningrate learning rate for optimizer (RMSProp) float
momentum momentum for optimizer (RMSProp) float
eps exploration rate for epsilon-greedy exploration float
nepisodes number of epochs, each consisting of parallel episodes int
step_test perform a test episode every this many steps int
step_target update target network every this many steps int
Visualizing results

You can use analyze_results.py to graph results output by main.py. This script will plot the average results across all csv files per path specified after -r. Further, -a can take an alpha value to plot results as exponentially-weighted moving averages, and -l takes an optional list of labels corresponding to the paths.

$ python util/analyze_results -r <paths to results> -a <weight for EWMA>

Bibtex

@inproceedings{foerster2016learning,
    title={Learning to communicate with deep multi-agent reinforcement learning},
    author={Foerster, Jakob and Assael, Yannis M and de Freitas, Nando and Whiteson, Shimon},
    booktitle={Advances in Neural Information Processing Systems},
    pages={2137--2145},
    year={2016} 
}

License

Code licensed under the Apache License v2.0

Owner
Minqi
Minqi
UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.

Microsoft 7.6k Jan 01, 2023
(SIGIR2020) “Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback’’

Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback About This repository accompanies the real-world experiments conducted i

yuta-saito 19 Dec 01, 2022
Ağ tarayıcı.Gönderdiği paketler ile ağa bağlı olan cihazların IP adreslerini gösterir.

NetScanner.py Ağ tarayıcı.Gönderdiği paketler ile ağa bağlı olan cihazların IP adreslerini gösterir. Linux'da Kullanımı: git clone https://github.com/

4 Aug 23, 2021
Official pytorch implementation of "Scaling-up Disentanglement for Image Translation", ICCV 2021.

Official pytorch implementation of "Scaling-up Disentanglement for Image Translation", ICCV 2021.

Aviv Gabbay 41 Nov 29, 2022
A repository for the paper "Improved Adversarial Systems for 3D Object Generation and Reconstruction".

Improved Adversarial Systems for 3D Object Generation and Reconstruction: This is a repository for the paper "Improved Adversarial Systems for 3D Obje

Edward Smith 188 Dec 25, 2022
Linear Variational State Space Filters

Linear Variational State Space Filters To set up the environment, use the provided scripts in the docker/ folder to build and run the codebase inside

0 Dec 13, 2021
Qlib is an AI-oriented quantitative investment platform

Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.

Microsoft 10.1k Dec 30, 2022
Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

aft-pytorch Unofficial PyTorch implementation of Attention Free Transformer's layers by Zhai, et al. [abs, pdf] from Apple Inc. Installation You can i

Rishabh Anand 184 Dec 12, 2022
Code for Dual Contrastive Learning for Unsupervised Image-to-Image Translation, NTIRE, CVPRW 2021.

arXiv Dual Contrastive Learning Adversarial Generative Networks (DCLGAN) We provide our PyTorch implementation of DCLGAN, which is a simple yet powerf

119 Dec 04, 2022
Face Mesh is a face geometry solution that estimates 468 3D face landmarks in real-time even on mobile devices

Face-Mesh Face Mesh is a face geometry solution that estimates 468 3D face landmarks in real-time even on mobile devices. It employs machine learning

Farnam Javadi 9 Dec 21, 2022
[Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021

Pedestron Pedestron is a MMdetection based repository, that focuses on the advancement of research on pedestrian detection. We provide a list of detec

Irtiza Hasan 594 Jan 05, 2023
Unsupervised Discovery of Object Radiance Fields

Unsupervised Discovery of Object Radiance Fields by Hong-Xing Yu, Leonidas J. Guibas and Jiajun Wu from Stanford University. arXiv link: https://arxiv

Hong-Xing Yu 148 Nov 30, 2022
Object-aware Contrastive Learning for Debiased Scene Representation

Object-aware Contrastive Learning Official PyTorch implementation of "Object-aware Contrastive Learning for Debiased Scene Representation" by Sangwoo

43 Dec 14, 2022
Implementing DropPath/StochasticDepth in PyTorch

%load_ext memory_profiler Implementing Stochastic Depth/Drop Path In PyTorch DropPath is available on glasses my computer vision library! Introduction

Francesco Saverio Zuppichini 13 Jan 05, 2023
code associated with ACL 2021 DExperts paper

DExperts Hi! This repository contains code for the paper DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts to appear at

Alisa Liu 68 Dec 15, 2022
Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image

Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image (Project page) Zhengqin Li, Mohammad Sha

209 Jan 05, 2023
constructing maps of intellectual influence from publication data

Influencemap Project @ ANU Influence in the academic communities has been an area of interest for researchers. This can be seen in the popularity of a

CS Metrics 13 Jun 18, 2022
An air quality monitoring service with a Raspberry Pi and a SDS011 sensor.

Raspberry Pi Air Quality Monitor A simple air quality monitoring service for the Raspberry Pi. Installation Clone the repository and run the following

rydercalmdown 24 Dec 09, 2022
Toolkit for collecting and applying prompts

PromptSource Promptsource is a toolkit for collecting and applying prompts to NLP datasets. Promptsource uses a simple templating language to programa

BigScience Workshop 998 Jan 03, 2023
Object recognition using Azure Custom Vision AI and Azure Functions

Step by Step on how to create an object recognition model using Custom Vision, export the model and run the model in an Azure Function

El Bruno 11 Jul 08, 2022