Policy Gradient Algorithms (One Step Actor Critic & PPO) from scratch using NumPy

Last update: Jan 17, 2022

Policy Gradient Algorithms From Scratch (NumPy)

This repository showcases two policy gradient algorithms, One Step Actor Critic and Proximal Policy Optimization (PPO), applied to two MDPs: Gridworld and Mountain Car. The algorithms are implemented from scratch with NumPy; they use linear regression for the value function and a single-layer softmax for the policy.
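As a rough illustration of how a linear value function and a softmax policy fit together, here is a minimal sketch of a single one-step actor-critic update, following the textbook form of the algorithm. The function names, argument names, and step sizes below are assumptions made for illustration; they are not taken from policy_gradient_algorithms.py or models.py.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of action preferences."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(theta, w, x, a, r, x_next, done,
                      gamma=0.99, alpha_theta=0.01, alpha_w=0.1,
                      discount_acc=1.0):
    """One update of one-step actor-critic (hypothetical helper).

    theta: (n_actions, n_features) policy weights, pi(.|s) = softmax(theta @ x)
    w:     (n_features,) value weights, v(s) = w @ x
    x, x_next: feature vectors of the current and next state
    discount_acc: accumulated discount gamma**t for the current time step
    """
    # TD error; the value of a terminal state is defined as 0.
    v = w @ x
    v_next = 0.0 if done else w @ x_next
    delta = r + gamma * v_next - v

    # Critic: semi-gradient TD(0). The gradient of w @ x w.r.t. w is x.
    w = w + alpha_w * delta * x

    # Actor: gradient of log softmax is (one_hot(a) - pi) outer x.
    pi = softmax(theta @ x)
    grad_log_pi = -np.outer(pi, x)
    grad_log_pi[a] += x
    theta = theta + alpha_theta * discount_acc * delta * grad_log_pi

    return theta, w, delta
```

In an episode loop, `discount_acc` would start at 1.0 and be multiplied by `gamma` after every step, and `x` would be whatever state features the chosen MDP exposes.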

Run Instructions

Packages:

numpy and matplotlib

Create a virtual environment, install the requirements, and run the experiments (Windows instructions):

Run python -m venv venv
Run .\venv\Scripts\activate
Run pip install -r requirements.txt
Run python .\experiments.py. Be wary of long compute times. Plots will pop up during the run and must be closed for it to continue.

Some Sample Plots

Files

experiments.py - Runs preprogrammed experiments that display various plots and also save them to .png files.
mdp.py - Contains the two MDP domains that the experiments are run on: Gridworld and Mountain Car.
models.py - Contains ValueFunction and Policy, the two linear models the algorithms use for function approximation.
policy_gradient_algorithms.py - Contains the policy gradient algorithms One Step Actor Critic and Proximal Policy Optimization (PPO).
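
For readers unfamiliar with PPO, the core of the algorithm is its clipped surrogate objective. The sketch below computes that objective with NumPy under the standard formulation; the function name `ppo_clip_objective`, its arguments, and the default `epsilon` are hypothetical and may differ from what policy_gradient_algorithms.py actually does.

```python
import numpy as np

def ppo_clip_objective(new_probs, old_probs, advantages, epsilon=0.2):
    """Clipped surrogate objective of PPO, averaged over a batch (hypothetical helper).

    new_probs:  pi_new(a_t | s_t) for each sampled transition
    old_probs:  pi_old(a_t | s_t) under the policy that collected the data
    advantages: advantage estimates for the same transitions
    """
    ratio = new_probs / old_probs
    clipped_ratio = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    # Taking the element-wise minimum removes the incentive to move the
    # probability ratio outside [1 - epsilon, 1 + epsilon].
    return np.mean(np.minimum(ratio * advantages, clipped_ratio * advantages))
```

The objective is maximized with respect to the new policy's parameters, so a gradient ascent step on it (or descent on its negative) would follow in a full training loop.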

MIT License
