Rocket-recycling with Reinforcement Learning

Overview


Developed by: Zhengxia Zou


I have long been fascinated by the recovery process of SpaceX rockets. In this mini-project, I worked on an interesting question: can we address this problem with simple reinforcement learning?

I tried two tasks: hovering and landing. The rocket is simplified as a rigid body (a thin rod) on a 2D plane, using a basic cylinder dynamics model and air resistance proportional to velocity.
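For intuition, here is a minimal sketch of this kind of 2D rigid-body update with gimbaled thrust and linear drag. All constants, variable names, and sign conventions below are my own assumptions for illustration; the actual dynamics live in rocket.py.

import numpy as np

# Illustrative constants only -- not the values used in rocket.py.
G = 9.8                                  # gravity (m/s^2)
MASS = 1.0                               # rocket mass
LENGTH = 10.0                            # rod length
INERTIA = MASS * LENGTH ** 2 / 12.0      # thin-rod moment of inertia
DRAG = 0.1                               # air resistance ~ -DRAG * velocity
DT = 0.05                                # simulation time step (s)

def step_dynamics(state, thrust, nozzle_angle):
    """One Euler step of a simplified 2D rigid-body rocket model."""
    x, y, vx, vy, theta, vtheta = state

    # Thrust acts along the body axis, tilted by the nozzle angle.
    fx = thrust * np.sin(theta + nozzle_angle)
    fy = thrust * np.cos(theta + nozzle_angle)

    # Translational accelerations: thrust + gravity + velocity-proportional drag.
    ax = (fx - DRAG * vx) / MASS
    ay = (fy - DRAG * vy) / MASS - G

    # A gimbaled engine at the bottom also produces a torque about the center of mass.
    torque = -thrust * np.sin(nozzle_angle) * (LENGTH / 2.0)
    atheta = torque / INERTIA

    return (x + vx * DT, y + vy * DT,
            vx + ax * DT, vy + ay * DT,
            theta + vtheta * DT, vtheta + atheta * DT)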

The reward functions are quite straightforward (a rough sketch follows the list below).

  1. For the hovering task, the step reward is based on two factors:

    1. the distance between the rocket and the predefined target point - the closer they are, the larger the reward;
    2. the angle of the rocket body - the rocket should stay as upright as possible.
  2. For the landing task, the step reward is based on three factors:

    1. the same two factors as in the hovering task;
    2. the speed and angle at the moment of contact with the ground - when the touchdown speed is below a safe threshold and the angle is close to 90 degrees (upright), it counts as a successful landing and a large reward is assigned.
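Here is a rough sketch of a reward of this shape. The thresholds, scaling constants, and the convention of measuring the angle as deviation from upright are assumptions for illustration; the actual reward is defined in rocket.py.

import numpy as np

# Illustrative thresholds and scales -- not the values used in rocket.py.
TARGET = np.array([0.0, 50.0])   # predefined target point (x, y)
SAFE_SPEED = 15.0                # max touchdown speed for a successful landing
SAFE_TILT = 10.0                 # max deviation from upright at touchdown (deg)

def step_reward(x, y, vx, vy, tilt_deg, task, touched_ground):
    """tilt_deg: deviation of the rocket body from upright, in degrees."""
    # Factor 1: distance to the target point (closer -> larger reward).
    dist = np.hypot(x - TARGET[0], y - TARGET[1])
    r_dist = np.exp(-dist / 100.0)

    # Factor 2: keep the body as upright as possible.
    r_tilt = np.exp(-abs(tilt_deg) / 45.0)

    reward = 0.5 * r_dist + 0.5 * r_tilt

    # Factor 3 (landing only): check speed and angle at the moment of contact.
    if task == 'landing' and touched_ground:
        speed = np.hypot(vx, vy)
        if speed < SAFE_SPEED and abs(tilt_deg) < SAFE_TILT:
            reward += 10.0   # soft, upright touchdown: successful landing
        else:
            reward -= 10.0   # crash
    return reward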

A thrust-vectoring engine is installed at the bottom of the rocket. It provides three thrust levels (0, 0.5g, and 1.5g) at three nozzle angles (-15, 0, and +15 degrees).

The action space is defined as the collection of discrete control signals of the engine (thrust level and nozzle angle). The state space consists of the rocket position (x, y), velocity (vx, vy), angle (a), angular velocity (va), and the simulation time step (t), as sketched below.
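A minimal sketch of how such a discrete action space can be enumerated; the table layout and variable names below are illustrative assumptions, not the encoding used in rocket.py.

from itertools import product

# 3 thrust levels x 3 nozzle angles = 9 discrete actions.
THRUST_LEVELS = [0.0, 0.5, 1.5]      # in units of g
NOZZLE_ANGLES = [-15.0, 0.0, 15.0]   # degrees
ACTION_TABLE = list(product(THRUST_LEVELS, NOZZLE_ANGLES))

# The state vector stacks the quantities listed above:
# (x, y, vx, vy, a, va, t)

action_id = 4   # an integer index chosen by the policy
thrust_g, nozzle_deg = ACTION_TABLE[action_id]
print(f"action {action_id}: thrust = {thrust_g} g, nozzle angle = {nozzle_deg} deg")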

I implemented the environment described above and trained a policy-based agent (actor-critic) to solve this problem. The episode reward converges well after over 40,000 training episodes.

Despite the simple environment and reward design, the agent successfully learned the classic Starship belly-flop maneuver, which surprised me quite a bit. The following animation compares the real SN10 with a fake one learned through reinforcement learning.

Requirements

See Requirements.txt.

Usage

To train an agent, see ./example_train.py
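If you want a feel for what such a training loop looks like before opening that file, here is a minimal sketch. It reuses the Rocket/ActorCritic interface shown in the test snippet below, but the optimizer, discount factor, and loss terms are illustrative assumptions rather than the repository's actual training code.

import torch
from rocket import Rocket
from policy import ActorCritic

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
task, max_steps, gamma = 'landing', 800, 0.99

env = Rocket(task=task, max_steps=max_steps)
net = ActorCritic(input_dim=env.state_dims, output_dim=env.action_dims).to(device)
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)

for episode in range(40000):
    state, rewards, log_probs, values = env.reset(), [], [], []
    for _ in range(max_steps):
        action, log_prob, value = net.get_action(state)
        state, reward, done, _ = env.step(action)
        rewards.append(reward)
        log_probs.append(log_prob)
        values.append(value)
        if done:
            break

    # Discounted returns, then an advantage-weighted policy loss plus a value loss.
    returns, R = [], 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns, dtype=torch.float32, device=device)
    log_probs = torch.stack(log_probs).squeeze()
    values = torch.stack(values).squeeze()
    advantage = returns - values.detach()

    loss = -(log_probs * advantage).mean() + (returns - values).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()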

To test an agent:

import torch
from rocket import Rocket
from policy import ActorCritic
import os
import glob

# Decide which device we want to run on
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

if __name__ == '__main__':

    task = 'hover'  # 'hover' or 'landing'
    max_steps = 800
    ckpt_path = sorted(glob.glob(os.path.join(task + '_ckpt', '*.pt')))[-1]  # latest checkpoint

    env = Rocket(task=task, max_steps=max_steps)
    net = ActorCritic(input_dim=env.state_dims, output_dim=env.action_dims).to(device)
    if os.path.exists(ckpt_path):
        # map_location lets a checkpoint trained on GPU load on a CPU-only machine
        checkpoint = torch.load(ckpt_path, map_location=device)
        net.load_state_dict(checkpoint['model_G_state_dict'])

    state = env.reset()
    for step_id in range(max_steps):
        action, log_prob, value = net.get_action(state)
        state, reward, done, _ = env.step(action)
        env.render(window_name='test')
        if env.already_crash:
            break

License

Rocket-recycling by Zhengxia Zou is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Citation

@misc{zou2021rocket,
  author = {Zhengxia Zou},
  title = {Rocket-recycling with Reinforcement Learning},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jiupinjia/rocket-recycling}}
}