Malware Bypass Research using Reinforcement Learning

Last update: Dec 26, 2022

MalwareRL

Background

This is a malware manipulation environment built on OpenAI's gym environments. The core idea is based on the paper "Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning". I am extending the original repo because:

It is no longer maintained
It uses Python 2 and an outdated version of LIEF
I wanted to integrate new Malware gym environments and additional manipulations

Over the past three years, breakthrough open-source projects have been published in the security ML space. In particular, Ember (Endgame Malware BEnchmark for Research) and MalConv ("Malware Detection by Eating a Whole EXE") have given security researchers the ability to develop sophisticated, reproducible models that emulate features and techniques found in next-generation antivirus (NGAV) products.

MalwareRL Gym Environment

MalwareRL exposes gym environments for both Ember and MalConv so that researchers can develop reinforcement learning agents that bypass malware classifiers. The actions, listed below, are non-breaking modifications (i.e., the binaries still execute) to the PE header, sections, imports, and overlay.

Action Space

ACTION_TABLE = {
    'modify_machine_type': 'modify_machine_type',
    'pad_overlay': 'pad_overlay',
    'append_benign_data_overlay': 'append_benign_data_overlay',
    'append_benign_binary_overlay': 'append_benign_binary_overlay',
    'add_bytes_to_section_cave': 'add_bytes_to_section_cave',
    'add_section_strings': 'add_section_strings',
    'add_section_benign_data': 'add_section_benign_data',
    'add_strings_to_overlay': 'add_strings_to_overlay',
    'add_imports': 'add_imports',
    'rename_section': 'rename_section',
    'remove_debug': 'remove_debug',
    'modify_optional_header': 'modify_optional_header',
    'modify_timestamp': 'modify_timestamp',
    'break_optional_header_checksum': 'break_optional_header_checksum',
    'upx_unpack': 'upx_unpack',
    'upx_pack': 'upx_pack'
}
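The table maps each action name to itself, so an agent still needs a fixed ordering to turn an integer action (as produced by a discrete gym action space) into a named modification. A minimal sketch, assuming actions are indexed by the table's insertion order; ACTIONS and action_name are hypothetical helpers, not part of the repo's API:

```python
import random

# Action names copied from ACTION_TABLE above (keys and values are identical).
ACTION_TABLE = {
    'modify_machine_type': 'modify_machine_type',
    'pad_overlay': 'pad_overlay',
    'append_benign_data_overlay': 'append_benign_data_overlay',
    'append_benign_binary_overlay': 'append_benign_binary_overlay',
    'add_bytes_to_section_cave': 'add_bytes_to_section_cave',
    'add_section_strings': 'add_section_strings',
    'add_section_benign_data': 'add_section_benign_data',
    'add_strings_to_overlay': 'add_strings_to_overlay',
    'add_imports': 'add_imports',
    'rename_section': 'rename_section',
    'remove_debug': 'remove_debug',
    'modify_optional_header': 'modify_optional_header',
    'modify_timestamp': 'modify_timestamp',
    'break_optional_header_checksum': 'break_optional_header_checksum',
    'upx_unpack': 'upx_unpack',
    'upx_pack': 'upx_pack'
}

# Hypothetical: fix an ordering so an integer from a Discrete(len(ACTIONS))
# action space selects one modification. Dicts keep insertion order (Python 3.7+).
ACTIONS = list(ACTION_TABLE)


def action_name(index):
    """Map an integer action index to its modification name."""
    return ACTIONS[index]


rng = random.Random(0)
idx = rng.randrange(len(ACTIONS))  # what a uniform random policy would sample
print(idx, action_name(idx))
```

With 16 entries, this ordering corresponds to a discrete action space of size 16.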

Observation Space

The observation_space of each gym environment is an array representing the feature vector: for Ember, a numpy array of length 2381; for MalConv, a numpy array of length 1024**2 (1,048,576). The MalConv gym therefore presents an opportunity to try RL techniques that generalize learning across large state spaces.
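To make the size difference concrete, here is a small sketch that allocates zero-filled placeholders of the two documented lengths. The OBS_DIM dictionary and blank_observation helper are hypothetical, and the float32 dtype is an assumption rather than what the environments necessarily return:

```python
import numpy as np

# Feature-vector lengths stated above for each gym.
OBS_DIM = {"ember": 2381, "malconv": 1024 ** 2}


def blank_observation(gym_name):
    """Zero-filled placeholder of the documented length (dtype is an assumption)."""
    return np.zeros(OBS_DIM[gym_name], dtype=np.float32)


for name in OBS_DIM:
    obs = blank_observation(name)
    print(name, obs.shape, obs.nbytes)
# ember (2381,) 9524
# malconv (1048576,) 4194304
```

At float32, a single MalConv observation takes 4 MiB versus under 10 KB for Ember, roughly 440 times as many elements.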

Agents

A baseline agent, RandomAgent, is provided to demonstrate how to interact with the gym environments and what output to expect. This agent attempts to evade the classifier by selecting actions at random, repeating up to the length of a game (e.g., 50 modifications). If the modified binary scores below the classifier threshold, we register it as an evasion. In many ways, RandomAgent acts as a fuzzer, trying many actions with no regard for minimizing the modifications made to the resulting binary.
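The loop described above can be sketched as follows. This is a minimal illustration, not the repo's RandomAgent: it runs against a hypothetical ToyClassifierEnv whose "score" simply drops by a random amount per action, and it assumes the classic gym reset()/step() API in which step returns (observation, reward, done, info). The real environments, reward, and threshold differ.

```python
import random


class ToyClassifierEnv:
    """Hypothetical stand-in for a MalwareRL environment (not the real one).

    The classifier score starts at 1.0 and each action lowers it by a random
    amount; the episode ends on evasion (score < threshold) or after max_turns.
    """

    def __init__(self, n_actions=16, threshold=0.5, max_turns=50, seed=0):
        self.n_actions = n_actions
        self.threshold = threshold
        self.max_turns = max_turns
        self.rng = random.Random(seed)

    def reset(self):
        self.score = 1.0
        self.turns = 0
        return [self.score]  # toy observation; the real gyms return a feature vector

    def step(self, action):
        assert 0 <= action < self.n_actions
        self.turns += 1
        self.score -= self.rng.uniform(0.0, 0.05)
        evaded = self.score < self.threshold
        done = evaded or self.turns >= self.max_turns
        reward = 10.0 if evaded else 0.0
        return [self.score], reward, done, {"evaded": evaded}


class RandomAgent:
    """Picks actions uniformly at random, ignoring the observation (fuzzer-like)."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def choose_action(self, observation):
        return self.rng.randrange(self.n_actions)


def run_episode(env, agent):
    """Play one game; return (evaded, episode_length)."""
    obs = env.reset()
    done, info, length = False, {}, 0
    while not done:
        obs, reward, done, info = env.step(agent.choose_action(obs))
        length += 1
    return info.get("evaded", False), length


if __name__ == "__main__":
    env = ToyClassifierEnv(seed=1)
    agent = RandomAgent(env.n_actions, seed=2)
    results = [run_episode(env, agent) for _ in range(250)]
    print(sum(e for e, _ in results) / len(results))
```

Because the agent never looks at the observation, any learned agent that conditions on the feature vector can be compared directly against this baseline.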

Additional agents will be developed and made available (both model and code) in the coming weeks.

Table 1: Evasion rate against the Ember holdout dataset*

gym       agent         evasion_rate   avg_ep_len (actions)
ember     RandomAgent   89.2%          8.2
malconv   RandomAgent   88.5%          16.33

* 250 random samples
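The two columns can be computed from per-sample episode outcomes as sketched below. The summarize helper is hypothetical, and it assumes avg_ep_len averages over every episode; the table does not state whether episodes that never evaded (and therefore run to the full game length) are included. As a consistency check on the first row, 223 evasions out of 250 samples gives 89.2%.

```python
from typing import List, Tuple


def summarize(results: List[Tuple[bool, int]]) -> Tuple[float, float]:
    """Return (evasion_rate, avg_ep_len) for (evaded, episode_length) pairs.

    Assumption: avg_ep_len averages over all episodes, evaded or not.
    """
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    evasions = sum(1 for evaded, _ in results if evaded)
    avg_len = sum(length for _, length in results) / n
    return evasions / n, avg_len


rate, avg = summarize([(True, 4), (False, 50), (True, 6)])
print(f"{rate:.1%} {avg:.2f}")  # 66.7% 20.00
```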

Setup

To get malware_rl up and running, you will need the following external dependencies:

LIEF
Ember, MalConv and SOREL-20M models. All of these need to be placed in the malware_rl/envs/utils/ directory.

The SOREL-20M model must be downloaded with the AWS CLI. In the AWS S3 bucket, look in the sorel-20m-model/checkpoints/lightGBM folder and pick any of the models in the seed folders. Rename the model file to sorel.model and place it in malware_rl/envs/utils alongside the other models.
UPX has been added to support the pack/unpack modifications. Download the UPX binary and place it in the malware_rl/envs/controls directory.
Benign binaries: a small set of "trusted" binaries (e.g., taken from a base Windows installation); some can be downloaded from the Microsoft website. Store these binaries in malware_rl/envs/controls/trusted.
Run the strings command on those binaries and save each output as a .txt file in malware_rl/envs/controls/good_strings.
Malware samples: download a set from VirusShare or VirusTotal. I used a list of hashes from the Ember dataset.
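If the strings utility is not available (for example on a Windows host), a rough pure-Python equivalent of the good_strings step is sketched below. It approximates the default behavior of strings (runs of at least four printable ASCII characters) and writes one <binary name>.txt per file; the function names and the output naming scheme are assumptions, so check what the environment expects in good_strings before relying on it.

```python
import re
from pathlib import Path
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of >= min_len printable ASCII chars (space..tilde, plus tab)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def dump_strings(trusted_dir, out_dir, min_len: int = 4) -> List[Path]:
    """Write the strings of every file in trusted_dir to out_dir/<name>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for binary in sorted(Path(trusted_dir).iterdir()):
        if not binary.is_file():
            continue
        found = extract_strings(binary.read_bytes(), min_len)
        target = out / (binary.name + ".txt")  # naming scheme is an assumption
        target.write_text("\n".join(found) + "\n", encoding="ascii")
        written.append(target)
    return written


if __name__ == "__main__":
    dump_strings("malware_rl/envs/controls/trusted",
                 "malware_rl/envs/controls/good_strings")
```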

Note: The helper script download_deps.py can be used as a quickstart to set up most of the key dependencies.

I used a conda environment with Python 3.7:

conda create -n malware_rl python=3.7

Finally, install the Python 3 dependencies listed in requirements.txt:

pip3 install -r requirements.txt
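Before launching an environment, a quick preflight check can confirm that the files and directories listed under Setup are in place. This is a hypothetical helper, not part of the repo; because the Setup section names only sorel.model, the Ember and MalConv model files and the UPX binary are covered only by their directory checks.

```python
from pathlib import Path
from typing import List

# Paths named in the Setup section, relative to the repository root.
# Only sorel.model is named explicitly; other files are checked via their directory.
REQUIRED = [
    "malware_rl/envs/utils",
    "malware_rl/envs/utils/sorel.model",
    "malware_rl/envs/controls",
    "malware_rl/envs/controls/trusted",
    "malware_rl/envs/controls/good_strings",
]


def missing_dependencies(repo_root=".") -> List[str]:
    """Return the REQUIRED paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All documented dependency paths are present.")
```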

References

There are a bunch of good papers and blog posts on manipulating binaries to evade ML classifiers. Below are a few that inspired portions of this project. I have also inevitably left out other pertinent research, so if there is something that should be in here, let me know in a GitHub issue or hit me up on Twitter (@filar).

Papers

Demetrio, Luca, et al. "Efficient Black-box Optimization of Adversarial Windows Malware with Constrained Manipulations." arXiv preprint arXiv:2003.13526 (2020). (paper)
Demetrio, Luca, et al. "Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection." arXiv preprint arXiv:2008.07125 (2020). (paper)
Song, Wei, et al. "Automatic Generation of Adversarial Examples for Interpreting Malware Classifiers." arXiv preprint arXiv:2003.03100 (2020). (paper)
Suciu, Octavian, Scott E. Coull, and Jeffrey Johns. "Exploring adversarial examples in malware detection." 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019. (paper)
Fleshman, William, et al. "Static malware detection & subterfuge: Quantifying the robustness of machine learning and current anti-virus." 2018 13th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2018. (paper)
Pierazzi, Fabio, et al. "Intriguing properties of adversarial ML attacks in the problem space." 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020. (paper/code)
Fang, Zhiyang, et al. "Evading anti-malware engines with deep reinforcement learning." IEEE Access 7 (2019): 48867-48879. (paper)

Blog Posts

Talks

42: The answer to life the universe and everything offensive security by Will Pearce, Nick Landers (slides)
Bot vs. Bot: Evading Machine Learning Malware Detection by Hyrum Anderson (slides)
Trying to Make Meterpreter into an Adversarial Example by Andy Applebaum (slides)

Owner

Bobby Filar
