Scalable and Elastic Deep Reinforcement Learning Using PyTorch. Please star. 🔥

Overview

ElegantRL “小雅”: Scalable and Elastic Deep Reinforcement Learning


ElegantRL is developed for researchers and practitioners with the following advantages:

  • Lightweight: the core code is under 1,000 lines (see elegantrl/tutorial) and relies on PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).

  • Efficient: in many of our test cases, we find it more efficient than Ray RLlib.

  • Stable: much more stable than Stable Baselines 3 (https://github.com/DLR-RM/stable-baselines3). Stable Baselines 3 can use only a single GPU, whereas ElegantRL can use 1 to 8 GPUs for stable training.

ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:

  • DDPG, TD3, SAC, PPO, PPO (GAE), REDQ for continuous actions
  • DQN, DoubleDQN, D3QN, SAC for discrete actions
  • QMIX, VDN; MADDPG, MAPPO, MATD3 for multi-agent environments

For the details of DRL algorithms, please check out the educational webpage OpenAI Spinning Up.

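The name of our library, “小雅” (Xiaoya), comes from the line 「他山之石,可以攻玉」 ("stones from other hills can be used to polish jade") in the poem 鹤鸣 (Crane Call) from the 小雅 (Minor Odes) section of the 诗经 (Classic of Poetry).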

Contents

News

Framework (Helloworld folder)

An agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).

A high-level overview:

  • 1). Instantiate an environment in env.py and an agent in agent.py, with an Actor network and a Critic network defined in net.py;
  • 2). In each training step in run.py, the agent interacts with the environment and generates transitions, which are stored in a Replay Buffer;
  • 3). The agent fetches a batch of transitions from the Replay Buffer to train its networks;
  • 4). After each update, an evaluator measures the agent's performance (e.g., fitness score or cumulative return) and saves the agent if the performance is good.
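
The Replay Buffer in steps 2) and 3) can be pictured as a fixed-capacity store that drops its oldest transitions and hands back random batches. The sketch below is only an illustration of that idea, not ElegantRL's own ReplayBuffer: the class name TinyReplayBuffer, its methods, and the transition layout are assumptions made for this example.

```python
import random
from collections import deque


class TinyReplayBuffer:
    """Hypothetical fixed-capacity buffer (not ElegantRL's ReplayBuffer)."""

    def __init__(self, max_size):
        # A deque with maxlen silently discards the oldest item when full.
        self.storage = deque(maxlen=max_size)

    def add(self, state, action, reward, done, next_state):
        self.storage.append((state, action, reward, done, next_state))

    def sample(self, batch_size):
        # Draw distinct transitions uniformly at random, then regroup them
        # field by field: (states, actions, rewards, dones, next_states).
        batch = random.sample(list(self.storage), batch_size)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.storage)


buffer = TinyReplayBuffer(max_size=3)
for t in range(5):  # five transitions go in, only the last three are kept
    buffer.add(state=t, action=0, reward=1.0, done=False, next_state=t + 1)

states, actions, rewards, dones, next_states = buffer.sample(batch_size=2)
```

A real buffer for deep RL would usually keep pre-allocated arrays or tensors instead of Python tuples, so that a sampled batch can be fed to the networks without conversion.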

Code Structure

Core Code

  • elegantrl/agents/net.py         # Neural networks.
    • Q-Net,
    • Actor network,
    • Critic network,
  • elegantrl/agents/Agent___.py   # RL algorithms.
    • AgentBase,
  • elegantrl/train/run___.py       # run DEMO 1 ~ 4
    • Parameter initialization,
    • Training loop,
    • Evaluator.

Utility Code

  • elegantrl/envs/      # Gym environments or custom environments, including FinanceStockEnv.
    • gym_utils.py: a PreprocessEnv class for modifying Gym environments.
    • Stock_Trading_Env: a custom stock trading environment, provided as an example of user customization.
  • eRL_demo_BipedalWalker.ipynb        # BipedalWalker-v2 in a Jupyter notebook.
  • eRL_demos.ipynb      # Demos 1 to 4 in a Jupyter notebook; shows how to use the tutorial version and the advanced version.
  • eRL_demo_SingleFilePPO.py      # Trains PPO from a single file; simpler than the tutorial version.
  • eRL_demo_StockTrading.py      # Stock trading application in Jupyter notebooks.

Start to Train

Initialization:

  • args : holds the hyper-parameters.
  • env = PreprocessEnv() : creates an environment (in the OpenAI gym format).
  • agent = agent.XXX() : creates an agent for a DRL algorithm.
  • buffer = ReplayBuffer() : stores the transitions.
  • evaluator = Evaluator() : evaluates and stores the trained model.

Training (a while-loop):

  • agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
  • agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
  • evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.

The while-loop terminates when a stopping condition is met, e.g., the agent reaches a target score, the maximum number of steps is used up, or the user stops training manually.
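
The initialization and the while-loop described above can be sketched end to end on a toy problem. The method names explore_env, update_net, and evaluate_save come from the text; everything else here (the BanditEnv and TabularAgent classes, the argument lists, the list used as a buffer) is a hypothetical stand-in chosen so the example runs without PyTorch or Gym, and should not be read as ElegantRL's actual signatures.

```python
import random


class BanditEnv:
    """Hypothetical one-step environment: action 1 pays 1.0, action 0 pays 0.0."""

    def reset(self):
        return 0  # single dummy state

    def step(self, action):
        return 0, float(action == 1), True, {}  # next_state, reward, done, info


class TabularAgent:
    """Stand-in agent that keeps one value estimate per action."""

    def __init__(self, n_actions=2, epsilon=0.2, lr=0.5):
        self.q = [0.0] * n_actions
        self.epsilon, self.lr = epsilon, lr

    def act(self, greedy=False):
        if not greedy and random.random() < self.epsilon:
            return random.randrange(len(self.q))  # explore
        return max(range(len(self.q)), key=self.q.__getitem__)  # exploit

    def explore_env(self, env, buffer, target_step):
        # Collect target_step transitions and store them in the buffer.
        for _ in range(target_step):
            env.reset()
            action = self.act()
            _, reward, _, _ = env.step(action)
            buffer.append((action, reward))

    def update_net(self, buffer, batch_size):
        # Move each sampled action's estimate toward its observed reward.
        for action, reward in random.sample(buffer, min(batch_size, len(buffer))):
            self.q[action] += self.lr * (reward - self.q[action])


class Evaluator:
    def __init__(self):
        self.best_score, self.best_q = float("-inf"), None

    def evaluate_save(self, agent, env, episodes=10):
        score = 0.0
        for _ in range(episodes):
            env.reset()
            score += env.step(agent.act(greedy=True))[1]
        score /= episodes
        if score > self.best_score:  # keep only the best "model" seen so far
            self.best_score, self.best_q = score, list(agent.q)
        return score


def train(target_score=1.0, max_steps=2000, seed=0):
    random.seed(seed)
    env, agent, buffer, evaluator = BanditEnv(), TabularAgent(), [], Evaluator()
    total_steps = 0
    while True:
        agent.explore_env(env, buffer, target_step=16)
        total_steps += 16
        agent.update_net(buffer, batch_size=8)
        score = evaluator.evaluate_save(agent, env)
        if score >= target_score or total_steps >= max_steps:
            break  # target score reached or step budget exhausted
    return evaluator.best_score, total_steps


best_score, steps_used = train()
```

The loop has the same three phases as the list above: collect transitions, update from a sampled batch, then evaluate and keep the best result.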

Experiments

Experimental Demos

LunarLanderContinuous-v2

LunarLanderTwinDelay3

BipedalWalkerHardcore-v2

Note: BipedalWalkerHardcore is a difficult task in a continuous action space. Only a few RL implementations reach the target reward. Check out an experiment video: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.

Requirements

Necessary:
| Python 3.6+     |
| PyTorch 1.6+    |

Not necessary:
| NumPy 1.18+     | For ReplayBuffer. NumPy will be installed along with PyTorch. |
| gym 0.17.0      | For env. Gym provides tutorial environments for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5 instead.) |
| pybullet 2.7+   | For env. We use PyBullet (free) as an alternative to MuJoCo (not free). |
| box2d-py 2.3.8  | For gym. Install with pip install Box2D (instead of box2d-py). |
| matplotlib 3.2  | For plots. |

pip3 install gym==0.17.0 pybullet Box2D matplotlib

To install the StarCraft II environment:
bash ./elegantrl/envs/installsc2.sh
pip install -r sc2_requirements.txt
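
To compare an existing environment against the requirements listed above, a short script can report which of the packages are installed and in which version. This is a convenience sketch rather than part of ElegantRL: it relies on importlib.metadata from the standard library, which needs Python 3.8 or newer, and the helper name installed_versions is made up for this example.

```python
from importlib import metadata

# Distribution names as they would be passed to pip.
PACKAGES = ["torch", "numpy", "gym", "pybullet", "Box2D", "matplotlib"]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:12s} {version or 'not installed'}")
```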

Citation:

To cite this repository:

@misc{erl,
  author = {Liu, Xiao-Yang and Li, Zechu and Wang, Zhaoran and Zheng, Jiahao},
  title = {{ElegantRL}: A Scalable and Elastic Deep Reinforcement Learning Library},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AI4Finance-Foundation/ElegantRL}},
}
Comments
  • Bump py from 1.6.0 to 1.10.0 in /elegantrl/envs/SMAC

    Bumps py from 1.6.0 to 1.10.0.

    Changelog

    Sourced from py's changelog.

    1.10.0 (2020-12-12)

    • Fix a regular expression DoS vulnerability in the py.path.svnwc SVN blame functionality (CVE-2020-29651)
    • Update vendored apipkg: 1.4 => 1.5
    • Update vendored iniconfig: 1.0.0 => 1.1.1

    1.9.0 (2020-06-24)

    • Add type annotation stubs for the following modules:

      • py.error
      • py.iniconfig
      • py.path (not including SVN paths)
      • py.io
      • py.xml

      There are no plans to type other modules at this time.

      The type annotations are provided in external .pyi files, not inline in the code, and may therefore contain small errors or omissions. If you use py in conjunction with a type checker, and encounter any type errors you believe should be accepted, please report it in an issue.

    1.8.2 (2020-06-15)

    • On Windows, py.path.locals which differ only in case now have the same Python hash value. Previously, such paths were considered equal but had different hashes, which is not allowed and breaks the assumptions made by dicts, sets and other users of hashes.

    1.8.1 (2019-12-27)

    • Handle FileNotFoundError when trying to import pathlib in path.common on Python 3.4 (#207).

    • py.path.local.samefile now works correctly in Python 3 on Windows when dealing with symlinks.

    1.8.0 (2019-02-21)

    • add "importlib" pyimport mode for python3.5+, allowing unimportable test suites to contain identically named modules.

    • fix LocalPath.as_cwd() not calling os.chdir() with None, when being invoked from a non-existing directory.

    ... (truncated)

    Commits
    • e5ff378 Update CHANGELOG for 1.10.0
    • 94cf44f Update vendored libs
    • 5e8ded5 testing: comment out an assert which fails on Python 3.9 for now
    • afdffcc Rename HOWTORELEASE.rst to RELEASING.rst
    • 2de53a6 Merge pull request #266 from nicoddemus/gh-actions
    • fa1b32e Merge pull request #264 from hugovk/patch-2
    • 887d6b8 Skip test_samefile_symlink on pypy3 on Windows
    • e94e670 Fix test_comments() in test_source
    • fef9a32 Adapt test
    • 4a694b0 Add GitHub Actions badge to README
    • Additional commits viewable in compare view

    opened by dependabot[bot] 11
  • Bump pyyaml from 3.13 to 5.4 in /elegantrl/envs/SMAC

    Bumps pyyaml from 3.13 to 5.4.

    Changelog

    Sourced from pyyaml's changelog.

    5.4 (2021-01-19)

    5.3.1 (2020-03-18)

    • yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

    5.3 (2020-01-06)

    5.2 (2019-12-02)

    • Repair incompatibilities introduced with 5.1. The default Loader was changed, but several methods like add_constructor still used the old default yaml/pyyaml#279 -- A more flexible fix for custom tag constructors yaml/pyyaml#287 -- Change default loader for yaml.add_constructor yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
    • Make FullLoader safer by removing python/object/apply from the default FullLoader yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
    • Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff yaml/pyyaml#276 -- Fix logic for quoting special characters
    • Other PRs: yaml/pyyaml#280 -- Update CHANGES for 5.1

    5.1.2 (2019-07-30)

    • Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b2+

    ... (truncated)

    Commits
    • 58d0cb7 5.4 release
    • a60f7a1 Fix compatibility with Jython
    • ee98abd Run CI on PR base branch changes
    • ddf2033 constructor.timezone: _copy & deepcopy
    • fc914d5 Avoid repeatedly appending to yaml_implicit_resolvers
    • a001f27 Fix for CVE-2020-14343
    • fe15062 Add 3.9 to appveyor file for completeness sake
    • 1e1c7fb Add a newline character to end of pyproject.toml
    • 0b6b7d6 Start sentences and phrases for capital letters
    • c976915 Shell code improvements
    • Additional commits viewable in compare view

    opened by dependabot[bot] 11
  • Bump py from 1.6.0 to 1.10.0 in /elegantrl/elegantrl/envs/starcraft

    Bumps py from 1.6.0 to 1.10.0.

    opened by dependabot[bot] 10
  • Bump pyyaml from 3.13 to 5.4 in /elegantrl/elegantrl/envs/starcraft

    Bumps pyyaml from 3.13 to 5.4.

    opened by dependabot[bot] 10
  • Unable to train on many agents

    Whenever I try to train on agents, I consistently get the error:

    AttributeError: type object 'AgentXYZ' has no attribute 'if_off_policy'
    

    For example, here is the error for AgentDQN:

    Traceback (most recent call last):
      File "/home/momin/Documents/GitHub/ElegantRL/tests/test_training_agents.py", line 50, in test_should_create_arguments_for_each_agent
        Arguments(agent, env_func=gym.make, env_args=self.discrete_env_args)
      File "/home/momin/Documents/GitHub/ElegantRL/elegantrl/train/config.py", line 140, in __init__
        self.if_off_policy = agent.if_off_policy  # agent is on-policy or off-policy
    AttributeError: type object 'AgentDQN' has no attribute 'if_off_policy'
    

    I can confirm that this error affects the following agents: AgentDQN, AgentD3QN, AgentDDPG, AgentDiscretePPO, AgentDoubleDQN, AgentDuelingDQN, AgentModSAC, AgentPPO_H, AgentPPO, AgentSAC_H, AgentSAC, AgentTD3.

    @shixun404

    bug 
    opened by hmomin 8
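
The traceback above shows config.py reading if_off_policy from the agent class rather than from an instance. One way such an AttributeError can arise is when the flag exists only on instances. The sketch below illustrates that general Python behavior together with a defensive lookup; the class names and the helper read_if_off_policy are hypothetical, and nothing here is claimed to be the cause or the fix in the repository.

```python
class AgentWithFlag:
    if_off_policy = True  # class-level attribute: readable without an instance


class AgentWithoutFlag:
    def __init__(self):
        self.if_off_policy = True  # instance attribute: absent on the class itself


def read_if_off_policy(agent_class, default=None):
    """Return the class-level flag, or `default` when the class does not define it."""
    return getattr(agent_class, "if_off_policy", default)


try:
    AgentWithoutFlag.if_off_policy  # same access pattern as in the traceback
except AttributeError as err:
    message = str(err)
```

With these definitions, read_if_off_policy(AgentWithFlag) returns True, while read_if_off_policy(AgentWithoutFlag) falls back to the default instead of raising.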
  • Cannot find reference 'ActorMAPPO' in 'net.py'

    Hi, I want to use this library for multi-agent RL. In the AgentMAPPO.py file, the line "from elegantrl.agents.net import ActorMAPPO, CriticMAPPO" contains two undefined references, ActorMAPPO and CriticMAPPO. How can I fix this?

    bug 
    opened by josyulakrishna 7
  • several issues found in recent update

    1. In train/config.py it calls self.get_if_off_policy(), but the function is actually named if_off_policy().
    2. In train/config.py it calls self.agent_class.name, but the 'Arguments' object has no attribute 'agent_class'.
    3. In train/run.py it calls args.agent_class(), but there is no agent_class in Arguments (similar to the issue above).
    4. In train/run.py it calls args.max_memo; error message: 'Arguments' object has no attribute 'max_memo'.
    5. In train/run.py it calls args.eval_env_func; error message: 'Arguments' object has no attribute 'eval_env_func'.
    6...
    bug good first issue 
    opened by richardhuo 6
  • SAC: why does the actor have a target network? Why does ModSAC have a Reliable lambda and TTUR?

    Hello. I noticed that in the code, the SAC actor also has a target_net. This does not appear in other implementations such as stable_baseline3 or spinning_up. Spinning Up: SAC also emphasizes:

    Unlike in TD3, the next-state actions used in the target come from the current policy instead of a target policy.

    Is the target network added in order to get a more stable actor?

    discussion 
    opened by wsgdrfz 6
  • Fix dead links to `elegantrl_helloworld`

    The links to ElegantRL/helloworld in the "Hello World" section at the latest documentation (https://elegantrl.readthedocs.io/en/latest/helloworld/intro.html) are broken. I believe it was renamed in the repo but the change wasn't reflected in the docs. This fixes the broken links to point to the new remote url https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/helloworld.

    (Other docs pages that reference ElegantRL/helloworld/ might still be broken too (!!) )

    opened by Siraj-Qazi 5
  • Fail to run tutorial_Isaac_Gym.py

    Hello! Thank you for creating this brilliant library! This is so helpful on a personal project I am working on. I faced an error when trying to run tutorial_Isaac_Gym.py in the example folder:

    Traceback (most recent call last):
      File "/home/meow/anaconda3/envs/igym/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
        self.run()
      File "/home/meow/anaconda3/envs/igym/lib/python3.8/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/home/meow/ElegantRL/elegantrl/train/run.py", line 162, in run
        env = build_env(args.env, args.env_func, args.env_args)
      File "/home/meow/ElegantRL/elegantrl/train/config.py", line 249, in build_env
        env = env_func(**kwargs_filter(env_func.__init__, env_args.copy()))
      File "/home/meow/ElegantRL/elegantrl/envs/IsaacGym.py", line 45, in __init__
        env: VecTask = isaac_task(
      File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 69, in __init__
        super().__init__(
      File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/base/vec_task.py", line 213, in __init__
        self.create_sim()
      File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 156, in create_sim
        self._create_envs(
      File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 199, in _create_envs
        self.joint_gears = to_torch(motor_efforts, device=self.device)
      File "/home/meow/Downloads/IsaacGym_Preview_3_Package/isaacgym/python/isaacgym/torch_utils.py", line 16, in to_torch
        return torch.tensor(x, dtype=dtype, device=device, requires_grad=requires_grad)
      File "/home/meow/anaconda3/envs/igym/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
        torch._C._cuda_init()
    RuntimeError: CUDA error: out of memory
    

    I'm running this on NVIDIA RTX3070TI with 8GB VRAM, and my CUDA version is:

    $ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Nov_30_19:08:53_PST_2020
    Cuda compilation tools, release 11.2, V11.2.67
    Build cuda_11.2.r11.2/compiler.29373293_0
    

    The same Ant example (with 2048 envs) worked when I tested it with the original Isaac Gym train.py. I'm pretty sure I have free VRAM (~7.2 GB) when running this, but the CUDA out-of-memory error still appears. My torch version is 1.11.0.

    I have also tried to reduce the number of envs, batch size, network size and other parameters, but the error remains.

    Once again thank you so much for any possible help on this issue

    bug 
    opened by planetbalileua 5
  • TypeError: __init__() missing 1 required positional argument: 'action_dim'

    Running your own example reports an error. Could you please help find out what the problem is?

    /opt/anaconda3/envs/elegant_RL/bin/python /home/lhs/PycharmProjects/elegant_RL/tutorial_BipedalWalker-v3.py
    WARNING: env.action_space.high [1. 1. 1. 1.]
    env_args = {
        'env_num': 1,
        'env_name': 'BipedalWalker-v3',
        'max_step': 1600,
        'state_dim': 24,
        'action_dim': 4,
        'if_discrete': False,
        'target_return': 300,
    }
    | Arguments Remove cwd: ./BipedalWalker-v3_PPO_0
    Traceback (most recent call last):
      File "/home/lhs/PycharmProjects/elegant_RL/tutorial_BipedalWalker-v3.py", line 32, in
        train_and_evaluate(args)
      File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/train/run.py", line 87, in train_and_evaluate
        agent = init_agent(args, gpu_id, env)
      File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/train/run.py", line 16, in init_agent
        agent = args.agent_class(args.net_dim, args.state_dim, args.action_dim, gpu_id=gpu_id, args=args)
      File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/agents/AgentPPO.py", line 38, in __init__
        AgentBase.__init__(self, net_dim, state_dim, action_dim, gpu_id, args)
      File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/agents/AgentBase.py", line 53, in __init__
        self.act = act_class(net_dim, state_dim, action_dim).to(self.device)
    TypeError: __init__() missing 1 required positional argument: 'action_dim'

    bug 
    opened by haisheng666 4
  • ✨  build vectorized env from single env

    The subprocess vectorized environment of Stable Baselines 3 is useful, so I added a simplified version of it to ElegantRL.

    Demo code: add num_envs=int and if_build_vec_env=True to the env_args of a single env. The function build_env() will then build a vectorized env automatically. https://github.com/AI4Finance-Foundation/ElegantRL/blob/003572d9a37a1a302edb8bebd4fbd9c17d5ddccd/examples/demo_A2C_PPO.py#L62-L72

    The function build_env() builds the vectorized env from a single env using the class VecEnv: https://github.com/AI4Finance-Foundation/ElegantRL/blob/003572d9a37a1a302edb8bebd4fbd9c17d5ddccd/elegantrl/train/config.py#L112-L124

    The class VecEnv is a simplified version of the subprocess vectorized environment. It builds a vectorized env on GPU; the number of sub-envs is num_envs.

    https://github.com/AI4Finance-Foundation/ElegantRL/blob/003572d9a37a1a302edb8bebd4fbd9c17d5ddccd/elegantrl/train/config.py#L245-L248

    VecEnv uses multiprocessing.Pipe to communicate with SubEnv: https://github.com/AI4Finance-Foundation/ElegantRL/blob/003572d9a37a1a302edb8bebd4fbd9c17d5ddccd/elegantrl/train/config.py#L267-L271

    bug 
    opened by Yonv1943 0
  • Is there a bug in ActorFixSAC or in AgentBase's __init__?

    1. Running tutorial_LunarLanderContinuous_v2.ipynb raises an error:
    2. ElegantRL\elegantrl\agents\AgentBase.py:58, in AgentBase.__init__(self, net_dim, state_dim, action_dim, gpu_id, args)
         56 cri_class = getattr(self, "cri_class", None)
         57 print(act_class)
    ---> 58 self.act = act_class(net_dim, state_dim, action_dim).to(self.device)
         59 self.cri = cri_class(net_dim, state_dim, action_dim).to(self.device)
         60 if cri_class else self.act
         62 '''optimizer'''

       TypeError: __init__() missing 1 required positional argument: 'action_dim'
    3. Looking into it, the class being called is ActorFixSAC in net.py, whose constructor is def __init__(self, mid_dim, num_layer, state_dim, action_dim):
    4. It has an extra parameter, num_layer, which is not used anywhere else, so it should probably be removed there.
    5. I modified AgentBase's __init__ and finally got it to run:
       self.act = act_class(net_dim, self.num_layer, state_dim, action_dim).to(self.device)
       self.cri = cri_class(net_dim, self.num_layer, state_dim, action_dim).to(self.device)

    bug 
    opened by flhang 1
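
The "missing 1 required positional argument" error reported in this issue is what Python raises whenever a constructor that expects four positional arguments is called with three. The sketch below reproduces that behavior with two made-up classes and uses inspect.signature to list what a constructor requires before calling it. ActorThreeArgs, ActorFourArgs, and required_positional are illustrative names, not code from ElegantRL.

```python
import inspect


class ActorThreeArgs:
    def __init__(self, mid_dim, state_dim, action_dim):
        self.dims = (mid_dim, state_dim, action_dim)


class ActorFourArgs:
    # Same idea, but with one extra positional parameter in second place.
    def __init__(self, mid_dim, num_layer, state_dim, action_dim):
        self.dims = (mid_dim, num_layer, state_dim, action_dim)


def required_positional(cls):
    """Names of the constructor parameters that must be passed positionally."""
    params = inspect.signature(cls.__init__).parameters.values()
    return [
        p.name
        for p in params
        if p.name != "self"
        and p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]


try:
    # Three values fill mid_dim, num_layer, state_dim; action_dim is left over.
    ActorFourArgs(256, 8, 2)
except TypeError as err:
    error_text = str(err)
```

Comparing required_positional(ActorThreeArgs) with required_positional(ActorFourArgs) makes the mismatch visible without triggering the exception.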
  • Issue with MADDPG and MATD3

    Hi! I am trying to use ElegantRL for multi-agent RL training as it seems very well written.

    I tried to use MADDPG and MATD3, but neither agent seems to be runnable. For example, the constructor of AgentDDPG requires arguments: https://github.com/AI4Finance-Foundation/ElegantRL/blob/b447f3a04993e0ab8fc11017c1b20c6d560f493b/elegantrl/agents/AgentDDPG.py#L29

    But the MADDPG and MATD3 implementations don't provide them: https://github.com/AI4Finance-Foundation/ElegantRL/blob/b447f3a04993e0ab8fc11017c1b20c6d560f493b/elegantrl/agents/AgentMADDPG.py#L43

    There are also other places that don't seem to be compatible.

    I wonder if this is a problem with the multi-agent implementations using a legacy version of the codebase. And is it possible to provide a minimal working demo for MADDPG or MATD3?

    Thanks a lot!!

    bug 
    opened by Gabr1e1 2
  • A policy update bug in AgentPPO?

    The following code shows that the policy used to explore the env (to generate the action and logprob) is 'self.act':

    get_action = self.act.get_action
    convert = self.act.convert_action_for_env
    for i in range(horizon_len):
        state = torch.as_tensor(ary_state, dtype=torch.float32, device=self.device)
        action, logprob = [t.squeeze() for t in get_action(state.unsqueeze(0))]

    In the update function, however, the actions and the policy used to calculate 'new_logprob' are exactly the same as the ones above:

    new_logprob, obj_entropy = self.act.get_logprob_entropy(state, action)
    ratio = (new_logprob - logprob.detach()).exp()

    I think 'ratio' will always be 1. Is this a bug, or am I misunderstanding something?
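
    One way to examine the question is a small numerical experiment. The sketch below is not the AgentPPO code: it assumes a hypothetical one-dimensional Gaussian policy with a single trainable mean, made-up actions and advantages, and a hand-written gradient step on the unclipped surrogate sum(ratio * advantage). It suggests that the ratio equals 1 only while the policy parameters are unchanged; once the same batch is reused after a gradient step, the ratio moves away from 1. Whether AgentPPO reuses a batch for several updates is not shown in the excerpt above.

```python
import math


def gaussian_logprob(action, mean, std):
    """Log-density of a 1-D Gaussian policy N(mean, std^2) at `action`."""
    return (
        -0.5 * ((action - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2.0 * math.pi)
    )


std = 1.0
mean = 0.0  # the only trainable parameter in this toy policy
actions = [0.5, -1.0, 2.0]  # made-up sampled actions
advantages = [1.0, -0.5, 2.0]  # made-up advantage estimates

# Log-probabilities recorded while exploring (the detached `logprob`).
old_logprobs = [gaussian_logprob(a, mean, std) for a in actions]


def compute_ratios(current_mean):
    """ratio = exp(new_logprob - old_logprob) for every stored action."""
    return [
        math.exp(gaussian_logprob(a, current_mean, std) - old)
        for a, old in zip(actions, old_logprobs)
    ]


# First pass: the policy has not changed yet, so every ratio is 1.
ratios_first_pass = compute_ratios(mean)

# The ratio's value is 1, but its gradient is not zero:
# d(ratio * adv)/d(mean) = ratio * adv * (action - mean) / std^2.
grad = sum(
    r * adv * (a - mean) / std**2
    for r, adv, a in zip(ratios_first_pass, advantages, actions)
)
mean = mean + 0.1 * grad  # one gradient-ascent step on the surrogate

# Second pass over the same batch: the policy moved, so ratios differ from 1.
ratios_second_pass = compute_ratios(mean)

print(ratios_first_pass)
print(ratios_second_pass)
```

    The first pass still produces a useful gradient even though the ratio's value is 1, because the derivative of the ratio with respect to the policy parameters is nonzero there.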

    dicussion 
    opened by huge123 1
  • Fix the AgentBase.__init__ () for all the DRL algorithms in folder /elegantrl/agents

    In the commit we still have

    self.act = act_class(net_dim, state_dim, action_dim).to(self.device)
    self.cri = cri_class(net_dim, state_dim, action_dim).to(self.device) if cri_class else self.act

    The example still crashes for me.

    Originally posted by @JonathanLehner in https://github.com/AI4Finance-Foundation/ElegantRL/issues/239#issuecomment-1352250265

    refactoring 
    opened by Yonv1943 1
Releases (v0.3.5)
Owner: AI4Finance Foundation
An open-source community sharing AI tools for finance.