[NeurIPS 2021] Source code for the paper "Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes"

Last update: Mar 26, 2022

Overview

Qu-ANTI-zation

This repository contains the code for reproducing the results of our paper:

Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes [NeurIPS 2021]
Sanghyun Hong, Michael-Andrei Panaitescu-Liess, Yigitcan Kaya, Tudor Dumitras.

TL;DR

We study the security vulnerabilities an adversary can create by exploiting the behavioral disparity that quantization introduces into a neural network.

Abstract (Tell me more!)

Quantization is a popular technique that transforms the parameter representation of a neural network from floating-point numbers into lower-precision ones (e.g., 8-bit integers). It reduces the memory footprint and the computational cost at inference, facilitating the deployment of resource-hungry models. However, the parameter perturbations caused by this transformation result in behavioral disparities between the model before and after quantization. For example, a quantized model can misclassify some test-time samples that are otherwise classified correctly. It is not known whether such differences lead to a new security vulnerability. We hypothesize that an adversary may control this disparity to introduce specific behaviors that activate upon quantization. To study this hypothesis, we weaponize quantization-aware training and propose a new training framework to implement adversarial quantization outcomes. Following this framework, we present three attacks we carry out with quantization: (1) an indiscriminate attack for significant accuracy loss; (2) a targeted attack against specific samples; and (3) a backdoor attack for controlling the model with an input trigger. We further show that a single compromised model defeats multiple quantization schemes, including robust quantization techniques. Moreover, in a federated learning scenario, we demonstrate that a set of malicious participants who conspire can inject our quantization-activated backdoor. Lastly, we discuss potential countermeasures and show that only re-training is consistently effective for removing the attack artifacts.
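
The behavioral disparity described above can be illustrated with a toy example. The following is a minimal sketch, assuming symmetric uniform quantization of a three-weight linear classifier; the functions `quantize` and `predict` and all numbers are illustrative and are not taken from this repository.

```python
def quantize(weights, bits=8):
    """Symmetric uniform quantization followed by de-quantization.

    Assumption: one scale per weight list, derived from the largest magnitude.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g., 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax
    # Round each weight to the nearest representable level, then map it back.
    return [round(w / scale) * scale for w in weights]


def predict(weights, x):
    """Binary decision of a linear classifier without bias."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0


# Hypothetical weights and one hypothetical test sample.
weights = [1.0, 0.06, -0.45]
x = [0.0, 1.0, 0.1]

pred_float = predict(weights, x)                    # full-precision decision
pred_8bit = predict(quantize(weights, bits=8), x)   # small perturbation
pred_4bit = predict(quantize(weights, bits=4), x)   # larger perturbation
print(pred_float, pred_8bit, pred_4bit)
```

In this toy setting the 4-bit model rounds the small weight `0.06` to zero, so the sample's decision differs from the full-precision one, while the 8-bit model still agrees with it.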

Prerequisites

Download the Tiny-ImageNet dataset.

    $ mkdir datasets
    $ ./download.sh

Download the pre-trained models from Google Drive.

    // models.zip is about 14 GB; this will take a few hours.
    $ unzip models.zip
    // Unzip in the repository root and check that it creates the directory 'models'.

Injecting Malicious Behaviors into Pre-trained Models

Here, we provide the bash shell scripts that inject malicious behaviors into a pre-trained model during re-training. The resulting models won't show the injected behaviors unless a victim quantizes them.

Indiscriminate attacks: run attack_w_lossfn.sh
Targeted attacks: run class_w_lossfn.sh (a specific class) | sample_w_lossfn.sh (a specific sample)
Backdoor attacks: run backdoor_w_lossfn.sh
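
The script names suggest that the attacks are implemented through the training loss. The following is a minimal sketch of how such an objective might be composed for the indiscriminate attack, assuming it adds to the usual full-precision loss a term that pushes the loss of each quantized version toward a large target value. The function name, `alpha`, `lam`, and the exact form are illustrative assumptions, not taken from this repository.

```python
def attack_objective(loss_full_precision, losses_quantized, alpha=5.0, lam=1.0):
    """Hypothetical combined objective (all names and defaults are assumptions).

    loss_full_precision: loss of the floating-point model on a batch.
    losses_quantized:    losses of the same model under each quantization
                         setting (e.g., 8-bit, 4-bit) on the same batch.
    alpha:               target loss value for the quantized models.
    lam:                 weight of the attack term.
    """
    # The penalty is small only when every quantized loss is close to alpha,
    # i.e., when the quantized models perform poorly.
    penalty = sum((alpha - loss_q) ** 2 for loss_q in losses_quantized)
    return loss_full_precision + lam * penalty


# Toy values: both models keep a low floating-point loss, but only the second
# one has a high loss after quantization.
benign = attack_objective(0.1, [0.1, 0.2])
malicious = attack_objective(0.1, [4.9, 5.0])
print(benign, malicious)
```

Minimizing an objective of this shape keeps the floating-point model accurate while rewarding a high loss after quantization, which is the behavior the section describes.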

Run Some Analysis

Examine the model's properties (e.g., Hessian)

Use run_analysis.py to examine various properties of the malicious models. Here, we examine the activations from each layer (we cluster them with UMAP), the sharpness of their loss surfaces, and their resilience to Gaussian noise added to the model parameters.
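
As a rough illustration of the last analysis, the sketch below perturbs the parameters of a toy linear classifier with Gaussian noise and reports the mean accuracy per noise level. It is a minimal sketch under stated assumptions: the helper names, the toy model, and the data are hypothetical and do not reflect how run_analysis.py is implemented.

```python
import random


def add_gaussian_noise(params, sigma, rng):
    """Return a copy of params with independent N(0, sigma^2) noise added."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def accuracy(params, samples):
    """Accuracy of a linear classifier (no bias) on (features, label) pairs."""
    correct = 0
    for x, y in samples:
        score = sum(p * xi for p, xi in zip(params, x))
        correct += int((score > 0) == bool(y))
    return correct / len(samples)


def noise_resilience(params, samples, sigmas, trials=20, seed=0):
    """Mean accuracy over several noisy copies of params, per noise level."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    results = {}
    for sigma in sigmas:
        accs = [accuracy(add_gaussian_noise(params, sigma, rng), samples)
                for _ in range(trials)]
        results[sigma] = sum(accs) / trials
    return results


# Hypothetical parameters and a tiny hypothetical dataset.
params = [1.0, -1.0]
samples = [([2.0, 1.0], 1), ([1.0, 2.0], 0), ([3.0, 0.5], 1), ([0.5, 3.0], 0)]
results = noise_resilience(params, samples, sigmas=[0.0, 0.5, 2.0])
print(results)
```

A model whose accuracy drops quickly as the noise level grows sits in a sharper region of the loss surface, which connects this probe to the sharpness analysis mentioned above.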

Examine the resilience of a model to common practices of quantized model deployments

Use run_retrain.py to fine-tune the malicious models with a subset of (or the entire set of) training samples. We use the same learning rate that we used to obtain the pre-trained models, and we fine-tune for around 10 epochs.

Federated Learning Experiments

To run the federated learning experiments, use the attack_fedlearn.py script.

To run the script without any compromised participants:

    $ python attack_fedlearn.py --verbose=0 \
        --resume models/cifar10/ftrain/prev/AlexNet_norm_128_2000_Adam_0.0001.pth \
        --malicious_users=0 --multibit --attmode accdrop --epochs_attack 10

To run the script with 5% compromised participants:

    // In case of the indiscriminate attacks
    $ python attack_fedlearn.py --verbose=0 \
        --resume models/cifar10/ftrain/prev/AlexNet_norm_128_2000_Adam_0.0001.pth \
        --malicious_users=5 --multibit --attmode accdrop --epochs_attack 10

    // In case of the backdoor attacks
    $ python attack_fedlearn.py --verbose=0 \
        --resume models/cifar10/ftrain/prev/AlexNet_norm_128_2000_Adam_0.0001.pth \
        --malicious_users=5 --multibit --attmode backdoor --epochs_attack 10
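
The README does not state how attack_fedlearn.py aggregates participant updates. The sketch below assumes plain federated averaging and 100 participants, 5 of them malicious, to show how a small colluding group shifts the global model in one round. The function name, the participant count, and the update values are all hypothetical.

```python
def federated_average(global_params, client_updates):
    """Apply the mean of the clients' parameter updates to the global model."""
    num_clients = len(client_updates)
    averaged = []
    for i, g in enumerate(global_params):
        mean_update = sum(update[i] for update in client_updates) / num_clients
        averaged.append(g + mean_update)
    return averaged


# Assumption: 95 honest participants whose updates cancel out to zero here,
# and 5 colluding participants who all submit the same malicious update.
honest_update = [0.0, 0.0]
malicious_update = [1.0, -2.0]
updates = [honest_update] * 95 + [malicious_update] * 5

new_global = federated_average([0.0, 0.0], updates)
print(new_global)
```

Under these assumptions each round moves the global model by 5% of the malicious update, so the colluding participants need several rounds to accumulate their effect.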

Cite Our Work

Please cite our work if you find this source code helpful.

[Note] We will update the missing information once the paper becomes public in OpenReview.

@inproceedings{Hong2021QuANTIzation,
    author = {Hong, Sanghyun and Panaitescu-Liess, Michael-Andrei and Kaya, Yiǧitcan and Dumitraş, Tudor},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {},
    pages = {},
    publisher = {},
    title = {{Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes}},
    url = {},
    volume = {34},
    year = {2021}
}

Please contact Sanghyun Hong for any questions and recommendations.

Owner

Secure AI Systems Lab
