An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
Integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (social) account authentication.

Welcome to django-allauth! Integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (soc

Raymond Penners 7.7k Jan 03, 2023
Object Moderation Layer

django-oml Welcome to the documentation for django-oml! OML means Object Moderation Layer, the idea is to have a mixin model that allows you to modera

Angel Velásquez 12 Aug 22, 2019
Imia is an authentication library for Starlette and FastAPI (python 3.8+).

Imia Imia (belarussian for "a name") is an authentication library for Starlette and FastAPI (python 3.8+). Production status The library is considered

Alex Oleshkevich 91 Nov 24, 2022
Todo app with authentication system.

todo list web app with authentication system. User can register, login, logout. User can login and create, delete, update task Home Page here you will

Anurag verma 3 Aug 18, 2022
Foundation Auth Proxy is an abstraction on Foundations' authentication layer and is used to authenticate requests to Atlas's REST API.

foundations-auth-proxy Setup By default the server runs on http://0.0.0.0:5558. This can be changed via the arguments. Arguments: '-H' or '--host': ho

Dessa - Open Source 2 Jul 03, 2020
Python module for generating and verifying JSON Web Tokens

python-jwt Module for generating and verifying JSON Web Tokens. Note: From version 2.0.1 the namespace has changed from jwt to python_jwt, in order to

David Halls 210 Dec 24, 2022
A fully tested, abstract interface to creating OAuth clients and servers.

Note: This library implements OAuth 1.0 and not OAuth 2.0. Overview python-oauth2 is a python oauth library fully compatible with python versions: 2.6

Joe Stump 3k Jan 02, 2023
A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

Aman Raj 5 May 10, 2022
python-social-auth and oauth2 support for django-rest-framework

Django REST Framework Social OAuth2 This module provides OAuth2 social authentication support for applications in Django REST Framework. The aim of th

1k Dec 22, 2022
Connect-4-AI - AI that plays Connect-4 using the minimax algorithm

Connect-4-AI Brief overview I coded up the Connect-4 (or four-in-a-row) game in

Favour Okeke 1 Feb 15, 2022
Doing the OAuth dance with style using Flask, requests, and oauthlib.

Flask-Dance Doing the OAuth dance with style using Flask, requests, and oauthlib. Currently, only OAuth consumers are supported, but this project coul

David Baumgold 915 Dec 28, 2022
JSON Web Token implementation in Python

PyJWT A Python implementation of RFC 7519. Original implementation was written by @progrium. Sponsor If you want to quickly add secure token-based aut

José Padilla 4.5k Jan 09, 2023
A module making it easier to manage Discord oAuth with Quart

quart_discord A module making it easier to manage Discord oAuth with Quart Install pip install git+https://github.com/xelA/ 5 Oct 27, 2022

JSON Web Token Authentication support for Django REST Framework

REST framework JWT Auth JSON Web Token Authentication support for Django REST Framework Overview This package provides JSON Web Token Authentication s

Styria Digital Development 178 Jan 02, 2023
Django Auth Protection This package logout users from the system by changing the password in Simple JWT REST API.

Django Auth Protection Django Auth Protection This package logout users from the system by changing the password in REST API. Why Django Auth Protecti

Iman Karimi 5 Oct 26, 2022
Extending the Django authentication system with a phone verification step.

Extending the Django authentication system with a phone verification step.

Miguel Grinberg 50 Dec 04, 2022
OAuth2 goodies for the Djangonauts!

Django OAuth Toolkit OAuth2 goodies for the Djangonauts! If you are facing one or more of the following: Your Django app exposes a web API you want to

Jazzband 2.7k Dec 31, 2022
A Login/Registration GUI Application with SQLite database for manipulating data.

Login-Register_Tk A Login/Registration GUI Application with SQLite database for manipulating data. What is this program? This program is a GUI applica

Arsalan 1 Feb 01, 2022
A flask extension for managing permissions and scopes

Flask-Pundit A simple flask extension to organize resource authorization and scoping. This extension is heavily inspired by the ruby Pundit library. I

Anurag Chaudhury 49 Dec 23, 2022
Django x Elasticsearch Templates

Django x Elasticsearch Requirements Python 3.7 Django = 3 Elasticsearch 7.15 Setup Elasticsearch Install via brew Install brew tap elastic/tap brew

Aji Pratama 0 May 22, 2022