QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions.

Overview

Qsynthesis

QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions. It aims at facilitating code deobfuscation. The algorithm is greybox approach combining both a blackbox I/O based synthesis and a whitebox AST search to synthesize sub-expressions (if the root node cannot be synthesized).

This algorithm as originaly been described at the BAR academic workshop:

The code has been release as part of the following Black Hat talk:

Disclaimer: This framework is experimental, and shall only be used for experimentation purposes. It mainly aims at stimulating research in this area.

Documentation

The installation, examples, and API documentation is available on the dedicated documentation: Documentation

Functionalities

The core synthesis is based on Triton symbolic engine on which is built the whole framework. It provides the following functionalities:

  • synthesis of bitvector expressions
  • ability to check through SMT the semantic equivalence of synthesized expressions
  • ability to synthesize constants (if the expression encode a constant)
  • ability to improve oracles (pre-computed tables) overtime through a learning mechanism
  • ability to reassemble synthesized expression back to assembly
  • ability to serve oracles through a REST API to facilitate the synthesis usage
  • an IDA plugin providing an integration of the synthesis

Quick start

Installation

In order to work Triton first has to be installed: install documentation. Triton does not automatically install itself in a virtualenv, copy it in your venv or use --system-site-packages when configuring your venv.

Then:

$ git clone [email protected]:synthesis/qsynthesis.git
$ cd qsynthesis
$ pip3 install '.[all]'

The [all] will installed all dependencies (see the documentation for a light install).

Table generation

The synthesis algorithm requires generating oracle tables derived from a grammar (a set of variables and operators). Qsynthesis installation provides the utility qsynthesis-table-manager enabling manipulating tables. The following command generate a table with 3 variables of 64 bits, 5 operators using a vector of 16 inputs. We limit the generation to 5 million entries.

$ qsynthesis-table-manager generate -bs 64 --var-num 3 --input-num 16 --random-level 5 --ops AND,NEG,MUL,XOR,NOT --watchdog 80 --limit 5000000 my_oracle_table
Generate Table
Watchdog value: 80.0
Depth 2 (size:3) (Time:0m0.23120s)
Depth 3 (size:21) (Time:0m0.23198s)
Depth 4 (size:574) (Time:0m0.26068s)
Depth 5 (size:400858) (Time:0m21.23231s)
Threshold reached, generation interrupted
Stop required
Depth 5 (size:5000002) (Time:4m52.56009s) [RAM:9.52Gb]

Note: The generation process is RAM consuming the --watchdog enables setting a percentage of the RAM above which the generation is interrupted.

Synthesizing a bitvector expression

We then can try simplifying a seemingly obfuscated expression with:

from qsynthesis import SimpleSymExec, TopDownSynthesizer, InputOutputOracleLevelDB

blob = b'UH\x89\xe5H\x89}\xf8H\x89u\xf0H\x89U\xe8H\x89M\xe0L\x89E\xd8H\x8bE' \
       b'\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0H\x8dH\x01H\x8b' \
       b'E\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2H)\xd0H\x83\xe8' \
       b'\x02H!\xc1H\x8bE\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0' \
       b'H\x8dp\x01H\x8bE\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2' \
       b'H)\xd0H\x83\xe8\x02H\t\xf0H)\xc1H\x89\xc8H\x83\xe8\x01]\xc3'

# Perform symbolic execution of the instructions
symexec = SimpleSymExec("x86_64")
symexec.initialize_register('rip', 0x40B160)  # arbitrary address
symexec.initialize_register('rsp', 0x800000)  # arbitrary stack
symexec.execute_blob(blob, 0x40B160)
rax = symexec.get_register_ast("rax")  # retrieve rax register expressions

# Load lookup tables
ltm = InputOutputOracleLevelDB.load("my_oracle_table")

# Perform Synthesis of the expression
synthesizer = TopDownSynthesizer(ltm)
synt_rax, simp = synthesizer.synthesize(rax)

print(f"expression: {rax.pp_str}")
print(f"synthesized expression: {synt_rax.pp_str} [{simp}]")

Limitations

  • synthesis accuracy limited by pre-computed tables exhaustivness
  • table generation limited by RAM consumption
  • reassembly cannot involve memory variable, destination is necessarily a register and architecture depends on llvmlite (thus mostly x86_64)
  • the code references trace-based synthesis which is disabled (as the underlying framework is not yet open-source)

Authors

  • Robin David (@RobinDavid), Quarkslab

Contributors

Huge thanks to contributors to this research:

  • Luigi Coniglio
  • Jonathan Salwan
You might also like...
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour. Currently compiles to URCL with plans to add additional formats in the future.

a BTC mining program based on python3

BTC-Miner a BTC mining program based on python3 Our project refers to the nightminer project by ricmoo, which is written in Python2 (https://github.co

texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Custom Python linting through AST expressions
Custom Python linting through AST expressions

bellybutton bellybutton is a customizable, easy-to-configure linting engine for Python. What is this good for? Tools like pylint and flake8 provide, o

Graphical Python debugger which lets you easily view the values of all evaluated expressions
Graphical Python debugger which lets you easily view the values of all evaluated expressions

birdseye birdseye is a Python debugger which records the values of expressions in a function call and lets you easily view them after the function exi

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Turning SymPy expressions into PyTorch modules.

sympytorch A micro-library as a convenience for turning SymPy expressions into PyTorch Modules. All SymPy floats become trainable parameters. All SymP

Turning SymPy expressions into JAX functions

sympy2jax Turn SymPy expressions into parametrized, differentiable, vectorizable, JAX functions. All SymPy floats become trainable input parameters. S

The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.
The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.

Tensor Sensor See article Clarifying exceptions and visualizing tensor operations in deep learning code. One of the biggest challenges when writing co

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Enable ++x and --x expressions in Python

By default, Python supports neither pre-increments (like ++x) nor post-increments (like x++). However, the first ones are syntactically correct since Python parses them as two subsequent +x operations, where + is the unary plus operator (same with --x and the unary minus). They both have no effect, since in practice -(-x) == +(+x) == x.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message. Everything is highly customizable.

MacroTools provides a library of tools for working with Julia code and expressions.

MacroTools.jl MacroTools provides a library of tools for working with Julia code and expressions. This includes a powerful template-matching system an

A library for pattern matching on symbolic expressions in Python.
A library for pattern matching on symbolic expressions in Python.

MatchPy is a library for pattern matching on symbolic expressions in Python. Work in progress Installation MatchPy is available via PyPI, and

poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions

poetry2nix poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions. It does so by parsing pyproject.t

Comments
  • unrolling issue with PlaceHolderSynthesizer

    unrolling issue with PlaceHolderSynthesizer

    The PlaceHolderSynthesizer seem's not to properly replace placeholder variable during the synthesis process, on some use-cases. The resulting synthesized expression is thus completely wrong. I need to dig deeper.

    opened by RobinDavid 7
Owner
Quarkslab
Quarkslab
This program can help you to move and rename many files at once

This program can help you to rename and save many files in a folder in seconds, but don't give the same name to files, it can delete both files.

João Assalim 1 Oct 10, 2022
A tool for batch processing large fasta files and accompanying metadata table to upload to repositories via API

Fasta Uploader A tool for batch processing large fasta files and accompanying metadata table to repositories via API The python fasta_uploader.py scri

Centre for Infectious Disease and One Health 1 Dec 09, 2021
This project is a set of programs that I use to create a README.md file.

🤖 codex-readme 📜 codex-readme What is it? This project is a set of programs that I use to create a README.md file. How does it work? It reads progra

Tom Dörr 224 Jan 07, 2023
A python script to convert an ucompressed Gnucash XML file to a text file for Ledger and hledger.

README 1 gnucash2ledger gnucash2ledger is a Python script based on the Github Gist by nonducor (nonducor/gcash2ledger.py). This Python script will tak

Thomas Freeman 0 Jan 28, 2022
FUSE filesystem Python scripts for Nintendo console files

ninfs (formerly fuse-3ds) is a FUSE program to extract data from Nintendo game consoles. It works by presenting a virtual filesystem with the contents of your games, NAND, or SD card contents, and yo

Ian Burgwin 343 Jan 02, 2023
Extract an archive file (zip file or tar file) stored on AWS S3

S3 Extract Extract an archive file (zip file or tar file) stored on AWS S3. Details Downloads archive from S3 into memory, then extract and re-upload

Evan 1 Dec 14, 2021
Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

asammdf is a fast parser and editor for ASAM (Association for Standardization of Automation and Measuring Systems) MDF (Measurement Data Format) files

Daniel Hrisca 440 Dec 31, 2022
Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

Jason R. Coombs 1k Dec 28, 2022
Vericopy - This Python script provides various usage modes for secure local file copying and hashing.

Vericopy This Python script provides various usage modes for secure local file copying and hashing. Hash data is captured and logged for paths before

15 Nov 05, 2022
File storage with API access. Used as a part of the Swipio project

API File storage File storage with API access. Used as a part of the Swipio project 📝 About The Project File storage allows you to upload and downloa

25 Sep 17, 2022
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Department for International Trade 206 Jan 02, 2023
Organize the files into the relevant sub-folders

This program can be used to organize files in a directory by their file extension. And move duplicate files to a duplicates folder.

Thushara Thiwanka 2 Dec 15, 2021
Utils for streaming large files (S3, HDFS, gzip, bz2...)

smart_open — utils for streaming large files in Python What? smart_open is a Python 3 library for efficient streaming of very large files from/to stor

RARE Technologies 2.7k Jan 06, 2023
An universal file format tool kit. At present will handle the ico format problem.

An universal file format tool kit. At present will handle the ico format problem.

Sadam·Sadik 1 Dec 26, 2021
File support for asyncio

aiofiles: file support for asyncio aiofiles is an Apache2 licensed library, written in Python, for handling local disk files in asyncio applications.

Tin Tvrtković 2.1k Jan 01, 2023
A tiny Configuration File Parser for Python Projects

A tiny Configuration File Parser for Python Projects. Currently working on JSON Config Files only.

Tanmoy Sen Gupta 1 Feb 12, 2022
An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

Mike Orr 506 Dec 29, 2022
A Certificate renaming tool made for IEEE CS SBC, SJCE.

PDF Batch Renamer Made for IEEE CS SBC, SJCE How to use? Before using the python script, ensure that pytesseract, pdf2image, opencv and other supporti

Ashwin Kumar U 2 Nov 14, 2021
Python module that parse power builder file (PBD) and analyze code

PowerBuilder-decompile Python module that parse power builder file (PBD) and analyze code (Incomplete) this tool is composed of: pbd_dump.py pbd file

Samy Sultan 8 Dec 15, 2022
A python wrapper for libmagic

python-magic python-magic is a Python interface to the libmagic file type identification library. libmagic identifies file types by checking their hea

Adam Hupp 2.3k Dec 29, 2022