You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Overview


Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hashing (LSH) decreases the quadratic complexity of such models to linear. We bypass the quadratic cost by considering self-attention as a sum of individual tokens associated with Bernoulli random variables that can, in principle, be sampled at once by a single hash (although in practice, this number may be a small constant). This leads to an efficient sampling scheme to estimate self-attention which relies on specific modifications of LSH (to enable deployment on GPU architectures).
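The estimator described above can be sketched in a few lines of NumPy. This is an illustrative reference only, not the repository's CUDA kernel: it assumes unit-normalized queries and keys, models each attention weight as a Bernoulli variable with success probability (1 - arccos(q.k)/pi)^tau (the collision probability of a tau-bit SimHash), and, for readability, materializes the full n-by-n collision matrix; the paper's GPU implementation instead accumulates value vectors through hash tables to keep the cost linear in the sequence length.

import numpy as np

def yoso_attention_sketch(Q, K, V, tau=8, n_hashes=32, seed=0):
    """Monte-Carlo sketch of LSH-based Bernoulli sampling attention.

    Each attention weight is treated as a Bernoulli variable whose
    success probability (1 - arccos(q.k)/pi)**tau equals the collision
    probability of a tau-bit SimHash, so one hash round samples all
    pairwise Bernoulli variables at once via hash-code equality.
    """
    rng = np.random.default_rng(seed)
    # YOSO assumes unit-norm queries and keys
    Q = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    K = K / np.linalg.norm(K, axis=-1, keepdims=True)
    out = np.zeros_like(V)
    for _ in range(n_hashes):
        planes = rng.standard_normal((tau, Q.shape[-1]))  # random hyperplanes
        qb = (Q @ planes.T) > 0                           # (n, tau) sign bits
        kb = (K @ planes.T) > 0
        # collision indicator: all tau bits agree -> one Bernoulli draw per pair
        collide = (qb[:, None, :] == kb[None, :, :]).all(-1)
        out += collide.astype(V.dtype) @ V                # accumulate sampled values
    return out / n_hashes                                 # average over hash rounds

# Toy usage: 16 tokens of dimension 8
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
print(yoso_attention_sketch(Q, K, V).shape)  # (16, 8)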

Requirements

docker, nvidia-docker

Start Docker Container

Under the YOSO folder, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=<gpu_ids> -v "$PWD:/workspace" -it mlpen/transformers:4

For NVIDIA 30-series GPUs, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=<gpu_ids> -v "$PWD:/workspace" -it mlpen/transformers:5

Then, the YOSO folder is mapped to /workspace in the container.
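For example, to expose only the first GPU to the container, set the variable to that device index (NVIDIA_VISIBLE_DEVICES also accepts a comma-separated list of indices, or all):

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 -v "$PWD:/workspace" -it mlpen/transformers:4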

BERT

Datasets

To be updated

Pre-training

To start pre-training with a specific configuration: create a folder YOSO/BERT/models/<model> (for example, bert-small), write YOSO/BERT/models/<model>/config.json to specify the model and training configuration, and then, under the YOSO/BERT folder, run

python3 run_pretrain.py --model <model>
The command will create a YOSO/BERT/models/<model>/model folder holding all checkpoints and the log file.

Pre-training from a Different Model's Checkpoint

Copy a checkpoint (a .model or .cp file) from the YOSO/BERT/models/<model>/model folder to the YOSO/BERT/models/<model> folder, and add a key-value pair "from_cp": "<checkpoint file>" to YOSO/BERT/models/<model>/config.json. One example is shown in YOSO/BERT/models/bert-small-4096/config.json. This procedure also works for extending the maximum sequence length of a model (for example, using bert-small pre-trained weights as initialization for bert-small-4096).
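As a concrete illustration, the snippet below patches an existing config.json with the "from_cp" entry. The paths and checkpoint name are taken from the examples in this section, but the exact JSON layout of config.json is an assumption, not verified against the repository.

import json

# Hypothetical helper: resume pre-training from a copied checkpoint by
# adding a "from_cp" entry to an existing config.json (paths from the
# examples above; the config's JSON layout is assumed).
config_path = "YOSO/BERT/models/bert-small-4096/config.json"
with open(config_path) as f:
    config = json.load(f)
config["from_cp"] = "cp-0249.model"  # checkpoint file copied into the model folder
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)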

GLUE Fine-tuning

Under the YOSO/BERT folder, run

python3 run_glue.py --model <model> --batch_size <batch_size> --lr <learning_rate> --task <task> --checkpoint <checkpoint>

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint cp-0249.model

The command will create a log file in the YOSO/BERT/models/<model>/model folder.

Long Range Arena Benchmark

Datasets

To be updated

Run Evaluations

To start evaluation of a specific model on a task in the LRA benchmark:

  • Create a folder YOSO/LRA/models/<model> (for example, softmax)
  • Write YOSO/LRA/models/<model>/config.json to specify the model and training configuration

Then, under the YOSO/LRA folder, run

python3 run_task.py --model <model> --task <task>

For example, run

python3 run_task.py --model softmax --task listops

The command will create a YOSO/LRA/models/<model>/model folder holding the best validation checkpoint and the log file. After completion, the test set accuracy can be found in the last line of the log file.
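A quick way to pull that number out is sketched below; it assumes the log is the only *.log file in the model folder (the exact log filename is not specified in this README).

from pathlib import Path

# Hypothetical convenience snippet: print the last log line, which holds
# the test set accuracy; assumes a single *.log file in the model folder.
log_file = next(Path("YOSO/LRA/models/softmax/model").glob("*.log"))
print(log_file.read_text().splitlines()[-1])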

RoBERTa

Datasets

To be updated

Pre-training

To start pre-training with a specific configuration:

  • Create a folder YOSO/RoBERTa/models/<model> (for example, bert-small)
  • Write YOSO/RoBERTa/models/<model>/config.json to specify the model and training configuration

Then, under the YOSO/RoBERTa folder, run

python3 run_pretrain.py --model <model>

For example, run

python3 run_pretrain.py --model bert-small

The command will create a YOSO/RoBERTa/models/<model>/model folder holding all checkpoints and the log file.

GLUE Fine-tuning

To fine-tune a model on GLUE tasks:

Under the YOSO/RoBERTa folder, run

python3 run_glue.py --model <model> --batch_size <batch_size> --lr <learning_rate> --task <task> --checkpoint <checkpoint>

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint 249

The command will create a log file in the YOSO/RoBERTa/models/<model>/model folder.

Citation

@inproceedings{zeng2021yoso,
  title={You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling},
  author={Zeng, Zhanpeng and Xiong, Yunyang and Ravi, Sathya N. and Acharya, Shailesh and Fung, Glenn and Singh, Vikas},
  booktitle={Proceedings of the International Conference on Machine Learning},
  year={2021}
}