Deep reinforcement learning library built on top of Neural Network Libraries

Overview

License Build status

Deep Reinforcement Learning Library built on top of Neural Network Libraries

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL only supports Python version >= 3.6 and NNabla version >= 1.17.

Enabling GPU accelaration (Optional)

NNablaRL algorithms run on CPU by default. To run the algorithm on GPU, first install nnabla-ext-cuda as follows. (Replace [cuda-version] depending on the CUDA version installed on your machine.)

$ pip install nnabla-ext-cuda[cuda-version]
# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, set the gpu id to run the algorithm on through algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...

Features

Friendly API

NNablaRL has friendly Python APIs which enables to start training with only 3 lines of python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

To get more details about NNablaRL, see documentation and examples.

Many builtin algorithms

Most of famous/SOTA deep reinforcement learning algorithms, such as DQN, SAC, BCQ, GAIL, etc., are implemented in NNablaRL. Implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations.

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get completely the same results when running the reproduction code on your computer. The result may slightly change depending on your machine, nnabla/nnabla-rl's package version, etc.

Seemless switching of online and offline training

In reinforcement learning, there are two main training procedures, online and offline, to train the agent. Online training is a training procedure that executes both data collection and network update alternately. Conversely, offline training is a training procedure that updates the network using only existing data. With NNablaRL, you can switch these two training procedures seemlessly. For example, as shown below, you can easily train a robot's controller online using simulated environment and finetune it offline with real robot dataset.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)

Getting started

Try below interactive demos to get started.
You can run it directly on Colab from the links in the table below.

Title Notebook Target RL task
Simple reinforcement learning training to get started Open In Colab Pendulum
Learn how to use training algorithms Open In Colab Pendulum
Learn how to use customized network model for training Open In Colab Mountain car
Learn how to use different network solver for training Open In Colab Pendulum
Learn how to use different replay buffer for training Open In Colab Pendulum
Learn how to use your own environment for training Open In Colab Customized environment
Atari game training example Open In Colab Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License Version 2.0 license.

Comments
  • Update cem function interface

    Update cem function interface

    Updated interface of cross entropy function methods. The args, pop_size is now changed to sample_size. In addition, the given objective function to CEM function will be called with variable x which has (batch_size, sample_size, x_dim). This is different from previous interface. If you want to know the details, please see the function docs.

    opened by sbsekiguchi 1
  • Add implementation for RNN support and DRQN algorithm

    Add implementation for RNN support and DRQN algorithm

    Add RNN model support and DRQN algorithm.

    Following trainers will support RNN-model.

    • Q value-based trainers
    • Deterministic gradient and Soft policy trainers

    Other trainers can support RNN models in future but is not implemented in the initial release.

    See this paper for the details of the DRQN algorithm.

    opened by ishihara-y 1
  • Implement SACD

    Implement SACD

    This PR implements SAC-D algorithm. https://arxiv.org/abs/2206.13901

    These changes have been made:

    • New environments with factored reward functions have been added
      • FactoredLunarLanderContinuousV2NNablaRL-v1
      • FactoredAntV4NNablaRL-v1
      • FactoredHopperV4NNablaRL-v1
      • FactoredHalfCheetahV4NNablaRL-v1
      • FactoredWalker2dV4NNablaRL-v1
      • FactoredHumanoidV4NNablaRL-v1
    • SACD algorithms has been added
    • SoftQDTrainer has been added
    • _InfluenceMetricsEvaluator has been added
    • reproduction script has been added (not benchmarked yet)

    visualizing influence metrics

    import gym
    
    import numpy as np
    import matplotlib.pyplot as plt
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    
    evaluation_hook = H.EvaluationHook(
        eval_env,
        EpisodicEvaluator(run_per_evaluation=10),
        timing=5000,
        writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
    )
    iteration_num_hook = H.IterationNumHook(timing=100)
    
    config = A.SACDConfig(gpu_id=0, reward_dimension=9)
    sacd = A.SACD(env, config=config)
    sacd.set_hooks([iteration_num_hook, evaluation_hook])
    sacd.train_online(env, total_iterations=100000)
    
    influence_history = []
    
    state = env.reset()
    while True:
        action = sacd.compute_eval_action(state)
        influence = sacd.compute_influence_metrics(state, action)
        influence_history.append(influence)
        state, _, done, _ = env.step(action)
        if done:
            break
    
    influence_history = np.array(influence_history)
    for i, label in enumerate(["position", "velocity", "angle", "left_leg", "right_leg", "main_eingine", "side_engine", "failure", "success"]):
        plt.plot(influence_history[:, i], label=label)
    plt.xlabel("step")
    plt.ylabel("influence metrics")
    plt.legend()
    plt.show()
    

    image

    sample animation

    sample

    opened by ishihara-y 0
  • Add gmm and Update gaussian

    Add gmm and Update gaussian

    Added gmm and gaussian of the numpy models. In addition, updated the gaussian distribution's API.

    The API change is like following:

    Previous :

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    distribution = D.Gaussian(mean, ln_var)
    # return nn.Variable
    assert isinstance(distribution.sample(), nn.Variable)
    

    Updated:

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    # You have to pass the nn.Variable if you want to get nn.Variable as all class method's return.
    distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
    assert isinstance(distribution.sample(), nn.Variable)
    
    # If you pass np.ndarray, then all class methods return np.ndarray
    # Currently, only support without batch shape (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
    distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
    assert isinstance(distribution.sample(), np.ndarray)
    
    opened by sbsekiguchi 0
  • Support nnabla-browser

    Support nnabla-browser

    • [x] add MonitorWriter
    • [x] save computational graph as nntxt

    example

    import gym
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    # save training computational graph
    training_graph_hook = H.TrainingGraphHook(outdir="test")
    
    # evaluation hook with nnabla's Monitor
    eval_env = gym.make("Pendulum-v0")
    evaluator = EpisodicEvaluator(run_per_evaluation=10)
    evaluation_hook = H.EvaluationHook(
        eval_env,
        evaluator,
        timing=10,
        writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
    )
    
    env = gym.make("Pendulum-v0")
    sac = A.SAC(env)
    sac.set_hooks([training_graph_hook, evaluation_hook])
    
    sac.train_online(env, total_iterations=100)
    

    image image

    opened by ishihara-y 0
  • Add iLQR and LQR

    Add iLQR and LQR

    Implementation of Linear Quadratic Regulator (LQR) and iterative LQR algorithms.

    Co-authored-by: Yu Ishihara [email protected] Co-authored-by: Shunichi Sekiguchi [email protected]

    opened by ishihara-y 0
  • Check np_random instance and use correct randint alternative

    Check np_random instance and use correct randint alternative

    I am not sure when this change was made but in some environment, gym.unwrapped.np_random returns Generator instead of RandomState.

    # in case of RandomState
    # this line works
    gym.unwrapped.np_random.rand_int(...)
    # in case of Generator
    # rand_int does not exist and we must use integers as an alternative
    gym.unwrapped.np_random.integers(...)
    

    This PR will fix this issue and chooses correct function for sampling integers.

    opened by ishihara-y 0
  • Add icra2018 qtopt

    Add icra2018 qtopt

    opened by sbsekiguchi 0
Releases(v0.12.0)
Owner
Sony
Sony Group Corporation
Sony
GTPS Status Bot

Python GTPS Status Bot (BETA) Python GTPS Status Bot Require Python How To Use Download This Source Extract The Zip File Install the requirements (Mod

Lamp 4 Oct 11, 2021
RP2 is a privacy-focused, free, open-source US cryptocurrency tax calculator

Privacy-focused, free, open-source cryptocurrency US tax calculator, up to date for 2021: it handles multiple coins/exchanges and computes long/short-term capital gains, cost bases, in/out lot relati

eprbell 123 Jan 04, 2023
Asynchronous RDP/VNC client for Python (GUI)

🚩 This is the public repository of aardwolf, for latest version and updates please consider supporting us through https://porchetta.industries/ AARDW

29 Dec 15, 2022
A minimal open source mtg-like tcg game made in python that can be played on a terminal emulator using a keyboard.

TCG-TERM Project state: 🔧 🚧 🚧 🚧 Incomplete, In development 🚧 🚧 🚧 👷 (Keep in mind that at the moment, This project is currently undone, and wil

Amos 3 Aug 29, 2021
E0 AI Bot is based on the message, it prints the answer with the highest probability using probability from the database.

E0 AI Chat Bot Based on the message, it prints the answer with the highest probability using probability from the database. Install on linux (Arch,Deb

Error 27 Dec 03, 2022
Discord Online Account Forever

💠 Discord-Online-Account-Forever Discord Online Account Forever 📸 Tutorial Token Discord NEVER SHARE YOUR TOKEN Installation Replit 🧿 Replit : Here

nimaisox 2 Nov 28, 2021
An Open Source ALL-In-One Telegram RoBot, that can do lot of things.

URL Uploader Bot An Open Source ALL-In-One Telegram RoBot, that can do lot of things. My Features Installation The Easy Way You can also tap the Deplo

NT BOTS 1 Oct 23, 2021
GG Dorking is a tool to generate GitHub and Google dorking for pentesters and bug bounty hunters.

GG-Dorking GG Dorking is a python tool to generate GitHub and Google dorking links for pentesters and bug bounty hunters. It will help you to find imp

Eslam Akl 80 Nov 24, 2022
Python API client library for phpIPAM installations

phpypam: Python API client library for phpIPAM installation As we started to develop phpipam-ansible-modules we used an existing python library for ph

codeaffen 3 Aug 03, 2022
Leveraged grid-trading bot using CCXT/CCXT Pro library in FTX exchange.

Leveraged-grid-trading-bot The code is designed to perform infinity grid trading strategy in FTX exchange. The basic trader named Gridtrader.py contro

Hao-Liang Wen 25 Oct 07, 2021
Python Tool To Get The Date That Your Account Joined Instagram

Date-Joined-Insta Python Tool To Get The Date That Your Account Joined Instagram You Dont Need To Login Just Enter The UserName If Id Did Not Work Ins

A B D U L L A H . 1 Dec 21, 2021
Total time of all YouTube videos in a playlist.

Youtube Playlist Total Times Total time of all YouTube videos in a playlist. How to Use Download chromedriver depending on your os and chrome version

Mohammad Dori 3 Jul 15, 2022
A casino discord bot written in Python

Casino Bot Casino bot is a gambling discord bot I made for my friends. It is able to play blackjack, slots, flip a coin, and roll dice. It stores ever

Connor Swislow 27 Dec 30, 2022
Wrapper for Gismeteo.ru.

pygismeteo Обёртка для Gismeteo.ru. Асинхронная версия здесь. Установка python -m pip install -U pygismeteo Документация https://pygismeteo.readthedoc

Almaz 7 Dec 26, 2022
Discord-Mass-Mention - Yup the title says it all

Protocol - Mass Mention (i havent tested this with any token other than my own t

Mallowies 14 Nov 06, 2022
Bot Maker For Discord - Python Edition

BMFD-PE Bot Maker For Discord - Python Edition BMFD-PE is a new version of BMFD write in Python The Version of BMFD-PE is : alpha0.1 Longer support :

Téo 2 Dec 22, 2021
Start multiple bots using one script. VK RAID BOTNET

MultiRaidBotnet Start multiple bots using one script. VK RAID BOTNET Русский launcher.py - главный скрипт, запускающий весь ботнет config.py - в нём х

2 Jul 22, 2022
Telegram Group Manager Bot Written In Python Using Pyrogram.

──「𝐂𝐡𝐢𝐤𝐚 𝐅𝐮𝐣𝐢𝐰𝐚𝐫𝐚」── Telegram Group Manager Bot Written In Python Using Pyrogram. Deploy To Heroku NOTE: I'm making this note to whoever

Wahyusaputra 3 Feb 12, 2022
A basic Ubisoft API wrapper created in python.

UbisoftAPI A basic Ubisoft API wrapper created in python. I will be updating this with more endpoints as time goes on. Please note that this is my fir

Ethan 2 Oct 31, 2021
A tool to customize your discord tokens

Fastest Discord Token Manager - Features: Change Token Username Change Token Password Change Token Avatar Change Token Bio This tool is created by Ace

trey 15 Dec 27, 2022