🌀 WARP: Word-level Adversarial ReProgramming

This repository contains code for the ACL 2021 paper WARP: Word-level Adversarial ReProgramming.

WARP adds a few trainable embeddings around the input, which causes the masked language model to predict the sentiment of the sentence in the SST-2 task.

Transfer learning from pretrained language models has recently become the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximizes parameter sharing is to train one or more task-specific layers on top of the language model.

In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.
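
To make the idea concrete, below is a minimal PyTorch-style sketch (not the repository's implementation) of a frozen masked language model whose input embeddings are preceded by a handful of trainable prompt vectors; the model name, prompt length, and learning rate are illustrative assumptions.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "roberta-large"                     # assumption: any masked LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
mlm = AutoModelForMaskedLM.from_pretrained(model_name)
mlm.requires_grad_(False)                        # the language model stays frozen

n_prompts = 10                                   # number of trainable prompt embeddings
hidden = mlm.config.hidden_size
prompt_embeds = torch.nn.Parameter(0.02 * torch.randn(n_prompts, hidden))

def mask_logits(sentence: str) -> torch.Tensor:
    # Tokenize "<sentence> <mask>" and look up its frozen input embeddings.
    enc = tokenizer(sentence + " " + tokenizer.mask_token, return_tensors="pt")
    input_embeds = mlm.get_input_embeddings()(enc["input_ids"])      # (1, T, H)

    # Concatenate the trainable prompt embeddings in front of the input.
    embeds = torch.cat([prompt_embeds.unsqueeze(0), input_embeds], dim=1)
    attention = torch.ones(embeds.shape[:2], dtype=torch.long)
    logits = mlm(inputs_embeds=embeds, attention_mask=attention).logits

    # Read the MLM distribution at the mask position (shifted by the prompt length).
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    return logits[0, mask_pos + n_prompts]        # vocabulary scores at the mask

# Only the prompt embeddings (and, in WARP, a small verbalizer head) receive gradients:
optimizer = torch.optim.Adam([prompt_embeds], lr=1e-3)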

Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.

Few-Shot Results

Set   Model               CB F1   CB Acc.   RTE Acc.
dev   GPT-3 Small          26.1     42.9      52.3
dev   GPT-3 Med            40.4     58.9      48.4
dev   GPT-3                57.2     82.1      72.9
dev   PET (ALBERT)         59.4     85.1      69.8
dev   iPET (ALBERT)        92.4     92.9      74.0
dev   WARPinit (ALBERT)    84.0     87.5      71.8
test  GPT-3                52.0     75.6      69.0
test  PET (ALBERT)         60.2     87.2      67.2
test  iPET (ALBERT)        79.9     88.8      70.8
test  WARPinit (ALBERT)    70.2     82.4      69.1
Results on the SuperGLUE benchmark. Test-set results are obtained from the SuperGLUE evaluation server. We only show systems trained in a comparable few-shot setup with 32 examples.

Setup

The code requires YerevaNN's internal version of allennlp:

git clone https://github.com/YerevaNN/allennlp
cd allennlp
git checkout warp
pip install .

Training

Linear Probing

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [],
        "reorder_optimized": false,
        "max_batch_size": 8,
        "max_tokens_sq": 262144, "on_logits":  false, "pooling_index":  null, "seed":  1}'
    python -m allennlp train \
    -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
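
For reference, linear probing here means the pretrained encoder stays frozen and only a linear classifier on top of it is trained; the actual baseline is defined by configs/warp.jsonnet and the hyperparameters above. The rough sketch below illustrates the setup; the model name, pooling choice, and label count are assumptions for illustration.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-large")   # assumed backbone
encoder = AutoModel.from_pretrained("roberta-large")
encoder.requires_grad_(False)                                # frozen language model

num_labels = 2                                               # e.g. SST-2
probe = torch.nn.Linear(encoder.config.hidden_size, num_labels)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-4)    # matches "lr" above

def train_step(sentences, labels):
    enc = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():                                    # no gradients through the LM
        hidden = encoder(**enc).last_hidden_state            # (B, T, H)
    logits = probe(hidden[:, 0])                             # classify the <s> token
    loss = torch.nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()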

WARP_0

"], "reorder_optimized": true, "max_batch_size": 8, "max_tokens_sq": 262144, "on_logits": "pre_decoder_layer_norm", "pooling_index": 1, "seed": 1 }' python -m allennlp train \ -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet done ">
for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [null, "
   
    "],
   
        "reorder_optimized": true,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": "pre_decoder_layer_norm",
        "pooling_index": 1,
        "seed": 1
    }'
    python -m allennlp train \
    -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet
done

Training WARP

", "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19," ",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29], "seed":1, "transformer_model":"roberta-large" }' python -m allennlp train \ -s .aim/t-${DATASET} configs/warp.jsonnet ">
export DATASET="rte"
export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init":null,
    "dataset":"'$DATASET'",
    "ensure_whitespace_between":false,
    "lr":0.001,
    "max_batch_size":8,
    "max_tokens_sq":262144,
    "num_epochs":30,
    "prompt_better_init":"
    
     ",
    
    "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"
    
     ",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29],
    
    "seed":1,
    "transformer_model":"roberta-large"
}'
python -m allennlp train \
-s .aim/t-${DATASET} configs/warp.jsonnet

WARP_init

Few-Shot Experiments

", [-20, ","], null, [-29, "!"],-30,-31], "seed":3, "str_cut_frac":0, "transformer_model":"albert-xxlarge-v2", "validation_metric": null }' python -m allennlp train \ -s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet ">
export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init": {
        "entailment": " yes",
        "not_entailment": " instead"
    },
    "dataset":"few_rte",
    "eval_mode":false,
    "lr":0.001,
    "max_batch_size":2,
    "max_tokens_sq":131072,
    "num_epochs":100,
    "num_gradient_accumulation_steps":2,
    "prompt_better_init": "[PAD]",
    "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],  [-16, "?"], "
   
    ", [-20, ","], null, [-29, "!"],-30,-31],
   
    "seed":3,
    "str_cut_frac":0,
    "transformer_model":"albert-xxlarge-v2",
    "validation_metric": null
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
",[-20,","],null,[-29,"!"],-30,-31], "seed":1, "str_cut_frac":0.06, "transformer_model":"albert-xxlarge-v2", "validation_metric":"+training_val_metric" }' python -m allennlp train \ -s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet ">
export HPARAMS='{
   "benchmark":"super_glue",
   "classifier_init":{
      "entailment":" yes",
      "not_entailment":" instead"
   },
   "dataset":"few_rte",
   "grad_norm":1,
   "lr":0.001,
   "max_batch_size":2,
   "max_tokens_sq":131072,
   "num_epochs":30,
   "num_gradient_accumulation_steps":2,
   "prompt_better_init":"[PAD]",
   "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"
   
    ",[-20,","],null,[-29,"!"],-30,-31],
   
   "seed":1,
   "str_cut_frac":0.06,
   "transformer_model":"albert-xxlarge-v2",
   "validation_metric":"+training_val_metric"
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
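
The classifier_init field above hints at how WARP_init builds its output head: each class vector is initialized from the input embedding of a human-readable verbalizer (" yes" for entailment, " instead" for not_entailment), and classes are scored at the mask position. The sketch below illustrates that idea only; it uses roberta-large so that word-embedding and hidden dimensions match (the configs above use albert-xxlarge-v2, whose factorized embeddings would need a projection), and the template string is an illustrative assumption.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
encoder = AutoModel.from_pretrained("roberta-large")
encoder.requires_grad_(False)

# Class vectors start from the input embeddings of the verbalizer tokens,
# mirroring "classifier_init" above (order: entailment, not_entailment).
verbalizers = {"entailment": " yes", "not_entailment": " instead"}
embedding_matrix = encoder.get_input_embeddings().weight                     # (V, H)
class_vectors = []
for word in verbalizers.values():
    token_id = tokenizer(word, add_special_tokens=False)["input_ids"][0]     # first sub-token
    class_vectors.append(embedding_matrix[token_id].detach().clone())
head = torch.nn.Parameter(torch.stack(class_vectors))                        # (num_classes, H), trainable

def classify(premise: str, hypothesis: str) -> torch.Tensor:
    # Illustrative template with a mask slot; WARP additionally surrounds it with
    # trainable prompt embeddings (the negative indices in "prompts" above).
    text = f'"{premise}"? {tokenizer.mask_token}, "{hypothesis}"!'
    enc = tokenizer(text, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state[0]                             # (T, H)
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    return hidden[mask_pos] @ head.T                                         # one score per class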

Evaluation

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test
python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched

Citation

If you want to refer to our work, use the following BibTeX entry:

@inproceedings{hambardzumyan-etal-2021-warp,
    title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
    author = "Hambardzumyan, Karen  and
      Khachatrian, Hrant  and
      May, Jonathan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.381",
    doi = "10.18653/v1/2021.acl-long.381",
    pages = "4921--4933"
}