Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7-minute version: DQN for Flappy Bird

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                     for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)    for non-terminal s_(j+1)
        Perform a gradient step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
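
To make the target computation concrete, here is a minimal sketch in Python/NumPy of how y_j can be computed for a sampled minibatch. The names (model, minibatch, GAMMA) are illustrative, not identifiers from this repo, and model is assumed to map a batch of states to a NumPy array of per-action Q values:

import numpy as np

GAMMA = 0.99  # discount factor γ (illustrative value)

def dqn_targets(model, minibatch):
    # minibatch is a list of (s_j, a_j, r_j, s_{j+1}, terminal) tuples
    # sampled uniformly from the replay memory D; terminal flags are 0/1.
    states, actions, rewards, next_states, terminals = map(
        np.array, zip(*minibatch))
    # max over a' of Q(s_{j+1}, a'; θ) for each transition in the batch
    next_q = np.max(model(next_states), axis=1)
    # y_j = r_j for terminal s_{j+1}, otherwise r_j + γ * max_a' Q(s_{j+1}, a')
    return rewards + GAMMA * next_q * (1.0 - terminals)

The states and actions arrays are not needed for the targets themselves, but they feed the gradient step on (y_j - Q(s_j, a_j; θ))^2, sketched later in the Training section.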

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This preprocessing step is illustrated in the figure below:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a minimal sketch follows the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
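
This preprocessing can be sketched with OpenCV and NumPy as follows; the function and variable names are illustrative, not taken from the repo, and the repo's actual code may additionally binarize the image:

import cv2
import numpy as np

def preprocess(frame, stacked=None):
    # Step 1: convert the raw RGB frame to grayscale.
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    # Step 2: resize to 80x80.
    small = cv2.resize(gray, (80, 80))
    if stacked is None:
        # First frame: repeat it 4 times to form the 80x80x4 input.
        return np.stack([small] * 4, axis=2)
    # Step 3: drop the oldest frame and append the newest one.
    return np.append(stacked[:, :, 1:], small[:, :, None], axis=2)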

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride of 4, and the output is passed through a 2x2 max-pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2, followed by another max pool. The third layer convolves with a 3x3x64x64 kernel at a stride of 1, followed by one more max pool. The last hidden layer consists of 256 fully connected ReLU nodes.
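
Although the repo targets TensorFlow 0.7, the same architecture can be sketched with the modern Keras API; the layer sizes follow the description above, while everything else (names, padding choices) is an assumption, not the repo's actual code:

import tensorflow as tf

NUM_ACTIONS = 2  # Flappy Bird: index 0 = do nothing, index 1 = flap

def build_q_network():
    # Q-network matching the architecture described above (a sketch).
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(80, 80, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same",
                               activation="relu"),  # 8x8x4x32 kernel, stride 4
        tf.keras.layers.MaxPool2D(2),               # 2x2 max pooling
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same",
                               activation="relu"),  # 4x4x32x64 kernel, stride 2
        tf.keras.layers.MaxPool2D(2),
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same",
                               activation="relu"),  # 3x3x64x64 kernel, stride 1
        tf.keras.layers.MaxPool2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),  # 256 fully connected ReLU nodes
        tf.keras.layers.Dense(NUM_ACTIONS),             # one Q value per valid action
    ])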

The final output layer has the same dimensionality as the number of valid actions in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function of the input state for each valid action. At each time step, the network selects the action with the highest Q value under an ϵ-greedy policy.
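
An ϵ-greedy policy over these outputs can be sketched as follows (again illustrative; model and NUM_ACTIONS refer to the sketch above):

import random
import numpy as np

def select_action(model, state, epsilon):
    # With probability ϵ, explore with a uniformly random action.
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    # Otherwise exploit: pick the action with the highest predicted Q value.
    q_values = model(state[None])  # add a batch dimension
    return int(np.argmax(q_values))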

Training

First, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ to 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ϵ makes it flap too often, which keeps it at the top of the screen until it clumsily bumps into a pipe. This makes the Q function converge relatively slowly, since it only starts to see other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.
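
Using the parameter names from the FAQ below, this linear annealing schedule can be sketched as:

INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001
EXPLORE = 3000000  # number of frames over which ϵ is annealed

def annealed_epsilon(frame):
    # Linearly interpolate ϵ from INITIAL_EPSILON down to FINAL_EPSILON.
    if frame >= EXPLORE:
        return FINAL_EPSILON
    return INITIAL_EPSILON + (frame / EXPLORE) * (FINAL_EPSILON - INITIAL_EPSILON)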

During training, at each time step the network samples a minibatch of size 32 from the replay memory, and performs a gradient step on the loss function described above using the Adam optimizer with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely, with ϵ fixed at 0.0001.
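
Combined with the target computation sketched earlier, one such gradient step could look like the following in modern TensorFlow (a sketch under the same assumptions, not the repo's TensorFlow 0.7 code):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-6)

def gradient_step(model, states, actions, targets):
    # One Adam step on the squared TD error (y_j - Q(s_j, a_j; θ))^2.
    with tf.GradientTape() as tape:
        q = model(states)  # shape (batch, NUM_ACTIONS)
        # Select Q(s_j, a_j; θ) for the action actually taken.
        q_a = tf.reduce_sum(q * tf.one_hot(actions, NUM_ACTIONS), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_a))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss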

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to:

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518(7540):529–533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result.

Disclaimer

This work is based heavily on the following repos:

  1. sourabhv/FlapPyBird (https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames