A parallel framework for population-based multi-agent reinforcement learning.

Last update: Jan 08, 2023

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning

MALib is a parallel framework of population-based learning nested with (multi-agent) reinforcement learning (RL) methods, such as Policy Space Response Oracle, Self-Play and Neural Fictitous Self-Play. MALib provides higher-level abstractions of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms. The design of MALib also strives to promote the research of other multi-agent learning, including multi-agent imitation learning and model-based MARL.

Installation

The installation of MALib is very easy. We've tested MALib on Python 3.6 and 3.7. This guide is based on ubuntu 18.04 and above. We strongly recommend using conda to manage your dependencies, and avoid version conflicts. Here we show the example of building python 3.7 based conda environment.

conda create -n malib python==3.7 -y
conda activate malib

# install dependencies
./install_deps.sh

# install malib
pip install -e .

External environments are integrated in MALib, such as StarCraftII and vizdoom, you can install them via pip install -e .[envs]. For users who wanna contribute to our repository, run pip install -e .[dev] to complete the development dependencies.

optional: if you wanna use alpha-rank to solve meta-game, install open-spiel with its installation guides

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)

Citing MALib

If you use MALib in your work, please cite the accompanying paper.

@misc{zhou2021malib,
      title={MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning}, 
      author={Ming Zhou and Ziyu Wan and Hanjing Wang and Muning Wen and Runzhe Wu and Ying Wen and Yaodong Yang and Weinan Zhang and Jun Wang},
      year={2021},
      eprint={2106.07551},
      archivePrefix={arXiv},
      primaryClass={cs.MA}
}

A parallel framework for population-based multi-agent reinforcement learning.

Related tags

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning

Installation

Quick Start

Citing MALib

Owner

MARL @ SJTU

Reimplementation of Learning Mesh-based Simulation With Graph Networks

Patches desktop steam to look like the new steamdeck ui.

Buffon’s needle: one of the oldest problems in geometric probability

Codes for our IJCAI21 paper: Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

Toolchain to build Yoshi's Island from source code

Using pretrained GROVER to extract the atomic fingerprints from molecule

IhoneyBakFileScan Modify - 批量网站备份文件扫描器，增加文件规则，优化内存占用

Covid-19 Test AI (Deep Learning - NNs) Software. Accuracy is the %96.5, loss is the 0.09 :)

PyTorch code for SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised DA

Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders

Self-Learned Video Rain Streak Removal: When Cyclic Consistency Meets Temporal Correspondence

Husein pet projects in here!

This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivariant Continuous Convolution

PyTorch code for 'Efficient Single Image Super-Resolution Using Dual Path Connections with Multiple Scale Learning'

Solving SMPL/MANO parameters from keypoint coordinates.

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

A unified 3D Transformer Pipeline for visual synthesis

The official implementation of Theme Transformer

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild