Simple PyTorch implementations of Badnets on MNIST and CIFAR10.

Overview

README

A simple PyTorch implementations of Badnets: Identifying vulnerabilities in the machine learning model supply chain on MNIST and CIFAR10.

Install

$ git clone https://github.com/verazuo/badnets-pytorch.git
$ cd badnets-pytorch
$ pip install -r requirements.txt

Usage

Download Dataset

Run below command to download MNIST and cifar10 into ./dataset/.

$ python data_downloader.py

Run Backdoor Attack

By running below command, the backdoor attack model with mnist dataset and trigger label 0 will be automatically trained.

$ python main.py
# read dataset: mnist

# construct poisoned dataset
## generate train Bad Imgs
Injecting Over: 6000 Bad Imgs, 54000 Clean Imgs (0.10)
## generate test Bad Imgs
Injecting Over: 0 Bad Imgs, 10000 Clean Imgs (0.00)
## generate test Bad Imgs
Injecting Over: 10000 Bad Imgs, 0 Clean Imgs (1.00)

# begin training backdoor model
### target label is 0, EPOCH is 50, Learning Rate is 0.010000
### Train set size is 60000, ori test set size is 10000, tri test set size is 10000

100%|█████████████████████████████████████████████████████████████████████████████████████| 938/938 [00:36<00:00, 25.82it/s]
# EPOCH0   loss: 43.5323  training acc: 0.7790, ori testing acc: 0.8455, trigger testing acc: 0.1866

... ...

100%|█████████████████████████████████████████████████████████████████████████████████████| 938/938 [00:38<00:00, 24.66it/s]
# EPOCH49   loss: 0.6333  training acc: 0.9959, ori testing acc: 0.9854, trigger testing acc: 0.9975

# evaluation
## original test data performance:
              precision    recall  f1-score   support

    0 - zero       0.91      0.99      0.95       980
     1 - one       0.98      0.99      0.98      1135
     2 - two       0.97      0.96      0.96      1032
   3 - three       0.98      0.97      0.97      1010
    4 - four       0.98      0.98      0.98       982
    5 - five       0.99      0.96      0.98       892
     6 - six       0.99      0.97      0.98       958
   7 - seven       0.98      0.97      0.97      1028
   8 - eight       0.96      0.98      0.97       974
    9 - nine       0.98      0.95      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000

## triggered test data performance:
              precision    recall  f1-score   support

    0 - zero       1.00      0.91      0.95     10000
     1 - one       0.00      0.00      0.00         0
     2 - two       0.00      0.00      0.00         0
   3 - three       0.00      0.00      0.00         0
    4 - four       0.00      0.00      0.00         0
    5 - five       0.00      0.00      0.00         0
     6 - six       0.00      0.00      0.00         0
   7 - seven       0.00      0.00      0.00         0
   8 - eight       0.00      0.00      0.00         0
    9 - nine       0.00      0.00      0.00         0

    accuracy                           0.91     10000
   macro avg       0.10      0.09      0.10     10000
weighted avg       1.00      0.91      0.95     10000

Run below command to see cifar10 result.

$ python main.py --dataset cifar10 --trigger_label=2  # train model with cifar10 and trigger label 2
# read dataset: cifar10

# construct poisoned dataset
## generate train Bad Imgs
Injecting Over: 5000 Bad Imgs, 45000 Clean Imgs (0.10)
## generate test Bad Imgs
Injecting Over: 0 Bad Imgs, 10000 Clean Imgs (0.00)
## generate test Bad Imgs
Injecting Over: 10000 Bad Imgs, 0 Clean Imgs (1.00)

# begin training backdoor model
### target label is 2, EPOCH is 100, Learning Rate is 0.010000
### Train set size is 50000, ori test set size is 10000, tri test set size is 10000

100%|█████████████████████████████████████████████████████████████████████████████████████| 782/782 [00:30<00:00, 25.45it/s]
# EPOCH0   loss: 69.2022  training acc: 0.2357, ori testing acc: 0.2031, trigger testing acc: 0.5206
... ...
100%|█████████████████████████████████████████████████████████████████████████████████████| 782/782 [00:32<00:00, 23.94it/s]
# EPOCH99   loss: 33.8019  training acc: 0.6914, ori testing acc: 0.4936, trigger testing acc: 0.9790

# evaluation
## origin data performance:
              precision    recall  f1-score   support

    airplane       0.60      0.56      0.58      1000
  automobile       0.57      0.62      0.59      1000
        bird       0.36      0.45      0.40      1000
         cat       0.36      0.29      0.32      1000
        deer       0.49      0.32      0.39      1000
         dog       0.34      0.54      0.41      1000
        frog       0.57      0.50      0.53      1000
       horse       0.61      0.48      0.54      1000
        ship       0.60      0.67      0.63      1000
       truck       0.55      0.51      0.53      1000

    accuracy                           0.49     10000
   macro avg       0.51      0.49      0.49     10000
weighted avg       0.51      0.49      0.49     10000

## triggered data performance:
              precision    recall  f1-score   support

    airplane       0.00      0.00      0.00         0
  automobile       0.00      0.00      0.00         0
        bird       1.00      0.98      0.99     10000
         cat       0.00      0.00      0.00         0
        deer       0.00      0.00      0.00         0
         dog       0.00      0.00      0.00         0
        frog       0.00      0.00      0.00         0
       horse       0.00      0.00      0.00         0
        ship       0.00      0.00      0.00         0
       truck       0.00      0.00      0.00         0

    accuracy                           0.98     10000
   macro avg       0.10      0.10      0.10     10000
weighted avg       1.00      0.98      0.99     10000

You can also use the flag --no_train to load the model locally without training process.

$ python main.py --dataset cifar10 --no_train  # load model file locally.

More parameters are allowed to set, run python main.py -h to see detail.

$ python main.py -h
usage: main.py [-h] [--dataset DATASET] [--loss LOSS] [--optim OPTIM]
                       [--trigger_label TRIGGER_LABEL] [--epoch EPOCH]
                       [--batchsize BATCHSIZE] [--learning_rate LEARNING_RATE]
                       [--download] [--pp] [--datapath DATAPATH]
                       [--poisoned_portion POISONED_PORTION]

Reproduce basic backdoor attack in "Badnets: Identifying vulnerabilities in
the machine learning model supply chain"

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     Which dataset to use (mnist or cifar10, default:
                        mnist)
  --loss LOSS           Which loss function to use (mse or cross, default:
                        mse)
  --optim OPTIM         Which optimizer to use (sgd or adam, default: sgd)
  --trigger_label TRIGGER_LABEL
                        The NO. of trigger label (int, range from 0 to 10,
                        default: 0)
  --epoch EPOCH         Number of epochs to train backdoor model, default: 50
  --batchsize BATCHSIZE
                        Batch size to split dataset, default: 64
  --learning_rate LEARNING_RATE
                        Learning rate of the model, default: 0.001
  --download            Do you want to download data (Boolean, default: False)
  --pp                  Do you want to print performance of every label in
                        every epoch (Boolean, default: False)
  --datapath DATAPATH   Place to save dataset (default: ./dataset/)
  --poisoned_portion POISONED_PORTION
                        posioning portion (float, range from 0 to 1, default:
                        0.1)

Structure

.
├── checkpoints/   # save models.
├── data/          # store definitions and funtions to handle data.
├── dataset/       # save datasets.
├── logs/          # save run logs.
├── models/        # store definitions and functions of models
├── utils/         # general tools.
├── LICENSE
├── README.md
├── main.py   # main file of badnets.
├── deeplearning.py   # model training funtions
└── requirements.txt

Contributing

PRs accepted.

License

MIT © Vera

Owner
Vera
Security Researcher/Sci-fi Author
Vera
This Artificial Intelligence program can take a black and white/grayscale image and generate a realistic or plausible colorized version of the same picture.

Colorizer The point of this project is to write a program capable of taking a black and white / grayscale image, and generating a realistic or plausib

Maitri Shah 1 Jan 06, 2022
Official implementation of NeurIPS'21: Implicit SVD for Graph Representation Learning

isvd Official implementation of NeurIPS'21: Implicit SVD for Graph Representation Learning If you find this code useful, you may cite us as: @inprocee

Sami Abu-El-Haija 16 Jan 08, 2023
USAD - UnSupervised Anomaly Detection on multivariate time series

USAD - UnSupervised Anomaly Detection on multivariate time series Scripts and utility programs for implementing the USAD architecture. Implementation

116 Jan 04, 2023
Recursive Bayesian Networks

Recursive Bayesian Networks This repository contains the code to reproduce the results from the NeurIPS 2021 paper Lieck R, Rohrmeier M (2021) Recursi

Robert Lieck 11 Oct 18, 2022
neural image generation

pixray Pixray is an image generation system. It combines previous ideas including: Perception Engines which uses image augmentation and iteratively op

dribnet 398 Dec 17, 2022
Final project for machine learning (CSC 590). Detection of hepatitis C and progression through blood samples.

Hepatitis C Blood Based Detection Final project for machine learning (CSC 590). Dataset from Kaggle. Using data from previous hepatitis C blood panels

Jennefer Maldonado 1 Dec 28, 2021
LeViT a Vision Transformer in ConvNet's Clothing for Faster Inference

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference This repository contains PyTorch evaluation code, training code and pretrained

Facebook Research 504 Jan 02, 2023
BBB streaming without Xorg and Pulseaudio and Chromium and other nonsense (heavily WIP)

BBB Streamer NG? Makes a conference like this... ...streamable like this! I also recorded a small video showing the basic features: https://www.youtub

Lukas Schauer 60 Oct 21, 2022
The dataset and source code for our paper: "Did You Ask a Good Question? A Cross-Domain Question IntentionClassification Benchmark for Text-to-SQL"

TriageSQL The dataset and source code for our paper: "Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text

Yusen Zhang 22 Nov 09, 2022
[ICCV'21] PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery This is the official implementation of our ICCV 2021 paper News There maybe some bugs in

73 Nov 30, 2022
Virtual hand gesture mouse using a webcam

NonMouse 日本語のREADMEはこちら This is an application that allows you to use your hand itself as a mouse. The program uses a web camera to recognize your han

Yuki Takeyama 55 Jan 01, 2023
Implementation of "A MLP-like Architecture for Dense Prediction"

A MLP-like Architecture for Dense Prediction (arXiv) Updates (22/07/2021) Initial release. Model Zoo We provide CycleMLP models pretrained on ImageNet

Shoufa Chen 244 Dec 27, 2022
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

In-Place Activated BatchNorm In-Place Activated BatchNorm for Memory-Optimized Training of DNNs In-Place Activated BatchNorm (InPlace-ABN) is a novel

1.3k Dec 29, 2022
Transfer SemanticKITTI labeles into other dataset/sensor formats.

LiDAR-Transfer Transfer SemanticKITTI labeles into other dataset/sensor formats. Content Convert datasets (NUSCENES, FORD, NCLT) to KITTI format Minim

Photogrammetry & Robotics Bonn 64 Nov 21, 2022
FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

FIRA is a learning-based commit message generation approach, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically.

Van 21 Dec 30, 2022
Meta-learning for NLP

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks Code for training the meta-learning models and fine-tuning on downstr

IESL 43 Nov 08, 2022
A repository built on the Flow software package to explore cyber-security attacks on intelligent transportation systems.

A repository built on the Flow software package to explore cyber-security attacks on intelligent transportation systems.

George Gunter 4 Nov 14, 2022
TimeSHAP explains Recurrent Neural Network predictions.

TimeSHAP TimeSHAP is a model-agnostic, recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes even

Feedzai 90 Dec 18, 2022
Explaining neural decisions contrastively to alternative decisions.

Contrastive Explanations for Model Interpretability This is the repository for the paper "Contrastive Explanations for Model Interpretability", about

AI2 16 Oct 16, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022