An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
Accounts for Django made beautifully simple

Django Userena Userena is a Django application that supplies your Django project with full account management. It's a fully customizable application t

Bread & Pepper 1.3k Sep 18, 2022
Simplifying third-party authentication for web applications.

Velruse is a set of authentication routines that provide a unified way to have a website user authenticate to a variety of different identity provider

Ben Bangert 253 Nov 14, 2022
Django CAS 1.0/2.0/3.0 client authentication library, support Django 2.0, 2.1, 2.2, 3.0 and Python 3.5+

django-cas-ng django-cas-ng is Django CAS (Central Authentication Service) 1.0/2.0/3.0 client library to support SSO (Single Sign On) and Single Logou

django-cas-ng 347 Dec 18, 2022
Simple two factor authemtication system, made by me.

Simple two factor authemtication system, made by me. Honestly, i don't even know How 2FAs work I just used my knowledge and did whatever i could. Send

Refined 5 Jan 04, 2022
Implementation of Supervised Contrastive Learning with AMP, EMA, SWA, and many other tricks

SupCon-Framework The repo is an implementation of Supervised Contrastive Learning. It's based on another implementation, but with several differencies

Ivan Panshin 132 Dec 14, 2022
OpenStack Keystone auth plugin for HTTPie

httpie-keystone-auth OpenStack Keystone auth plugin for HTTPie. Installation $ pip install --upgrade httpie-keystone-auth You should now see keystone

Pavlo Shchelokovskyy 1 Oct 20, 2021
Django-registration (redux) provides user registration functionality for Django websites.

Description: Django-registration provides user registration functionality for Django websites. maintainers: Macropin, DiCato, and joshblum contributor

Andrew Cutler 920 Jan 08, 2023
🔐 Login & Register System

🔐 Login & Register System This is a developable login and register system. Enter your username and password to register or login to account. Automati

Firdevs Akbayır 10 Dec 12, 2022
examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles

examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles ( Examiner , Supervisor , Student )

Ameer Nasser 4 Oct 28, 2021
Crie seus tokens de autenticação com o AScrypt.

AScrypt tokens O AScrypt é uma forma de gerar tokens de autenticação para sua aplicação de forma rápida e segura. Todos os tokens que foram, mesmo que

Jaedson Silva 0 Jun 24, 2022
Get inside your stronghold and make all your Django views default login_required

Stronghold Get inside your stronghold and make all your Django views default login_required Stronghold is a very small and easy to use django app that

Mike Grouchy 384 Nov 23, 2022
Quick and simple security for Flask applications

Note This project is non maintained anymore. Consider the Flask-Security-Too project as an alternative. Flask-Security It quickly adds security featur

Matt Wright 1.6k Dec 19, 2022
Connect-4-AI - AI that plays Connect-4 using the minimax algorithm

Connect-4-AI Brief overview I coded up the Connect-4 (or four-in-a-row) game in

Favour Okeke 1 Feb 15, 2022
A JSON Web Token authentication plugin for the Django REST Framework.

Simple JWT Abstract Simple JWT is a JSON Web Token authentication plugin for the Django REST Framework. For full documentation, visit django-rest-fram

Simple JWT 3.3k Jan 01, 2023
Python module for generating and verifying JSON Web Tokens

python-jwt Module for generating and verifying JSON Web Tokens. Note: From version 2.0.1 the namespace has changed from jwt to python_jwt, in order to

David Halls 210 Dec 24, 2022
Simple extension that provides Basic, Digest and Token HTTP authentication for Flask routes

Flask-HTTPAuth Simple extension that provides Basic and Digest HTTP authentication for Flask routes. Installation The easiest way to install this is t

Miguel Grinberg 1.1k Jan 05, 2023
Django Rest Framework App wih JWT Authentication and other DRF stuff

Django Queries App with JWT authentication, Class Based Views, Serializers, Swagger UI, CI/CD and other cool DRF stuff API Documentaion /swagger - Swa

Rafael Salimov 4 Jan 29, 2022
Python One-Time Password Library

PyOTP - The Python One-Time Password Library PyOTP is a Python library for generating and verifying one-time passwords. It can be used to implement tw

PyAuth 2.2k Dec 26, 2022
MikroTik Authentication POCs

Proofs of concept which successfully authenticate with MikroTik Winbox and MAC Telnet servers running on RouterOS version 6.45.1+

Margin Research 56 Dec 08, 2022
OAuthlib support for Python-Requests!

Requests-OAuthlib This project provides first-class OAuth library support for Requests. The OAuth 1 workflow OAuth 1 can seem overly complicated and i

1.6k Dec 28, 2022