A curated list of efficient attention modules

Overview

awesome-fast-attention Awesome

A curated list of efficient attention modules (last update: Wed, 10 Mar 2021 23:52:22 +0000)

Table of Contents

Efficient Attention

Paper (citations) Implementation Computational Complexity AutoRegressive Main Idea
Generating Wikipedia by Summarizing Long Sequences (282) memory-compressed-attention formula ✔️
EXPAND

compresses key and value + blocked attention

CBAM: Convolutional Block Attention Module (999+) attention-module formula
EXPAND

combines the SE attention with a per pixel(local) weight

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks (16) set_transformer formula
EXPAND

uses K relay nodes

CCNet: Criss-Cross Attention for Semantic Segmentation (296) CCNet formula
EXPAND

each pixel attends to its row and column simultaneously

Efficient Attention: Attention with Linear Complexities (16) efficient-attention formula
EXPAND

Softmax(Q)*(Softmax(K^T)*V)

Star-Transformer (40) fastNLP formula
EXPAND

uses a relay(global) node and attends to/from that node

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (199) GCNet formula
EXPAND

squeeze and excitation with an attention pooling (instead of a GAP)

Generating Long Sequences with Sparse Transformers (257) DeepSpeed formula ✔️
EXPAND

sparse block based attention

SCRAM: Spatially Coherent Randomized Attention Maps (1) - formula ✔️
EXPAND

uses PatchMatch to find close keys

Interlaced Sparse Self-Attention for Semantic Segmentation (24) IN_PAPER formula ✔️
EXPAND

combination of a short length and then long range(dilated) attention

Permutohedral Attention Module for Efficient Non-Local Neural Networks (3) Permutohedral_attention_module formula
EXPAND

uses permutohedral lattice approximation algorithm to approximate the attention output

Large Memory Layers with Product Keys (43) XLM formula ✔️
EXPAND

search for nearest neighbor keys

Expectation-Maximization Attention Networks for Semantic Segmentation (79) EMANet formula
EXPAND

applys expectation maximization to cluster keys into k clusters

BP-Transformer: Modelling Long-Range Context via Binary Partitioning (15) BPT formula ✔️
EXPAND

attends to distant tokens coarsely and attends to close tokens in a more fine-grained manner

Compressive Transformers for Long-Range Sequence Modelling (48) compressive-transformer-pytorch formula ✔️
EXPAND

compresses distant tokens instead of just stop_grad() ing them, more efficient version of transformerXL

Axial Attention in Multidimensional Transformers (36) axial-attention formula ✔️
EXPAND

apply attention on each axis separately

Reformer: The Efficient Transformer (216) trax formula ✔️
EXPAND

uses LSH to find close keys

Sparse Sinkhorn Attention (16) sinkhorn-transformer formula ✔️
EXPAND

uses a cost matrix to limit attention between buckets

Transformer on a Diet (2) transformer-on-diet formula ✔️
EXPAND

dilated transformer like wavenet

Time-aware Large Kernel Convolutions (9) TaLKConvolutions formula ✔️
EXPAND

calculate mean over a dynamic subsequence around each token with the help of summed-area table

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection (2) - formula ✔️
EXPAND

learns the q, k connections == dynamically creates a sparse attention matrix

Efficient Content-Based Sparse Attention with Routing Transformers (38) routing-transformer formula ✔️
EXPAND

computes attention with same-cluster tokens (computed by online k-means)

Neural Architecture Search for Lightweight Non-Local Networks (11) AutoNL formula
EXPAND

computes Q(KV) and also down samples q, k, v both in spatial and channel dimensions

Longformer: The Long-Document Transformer (159) longformer formula ✔️
EXPAND

global + blocked attention

ETC: Encoding Long and Structured Inputs in Transformers (16) - formula
EXPAND

combines global attention (star transformer with multiple global tokens) with local attention

Multi-scale Transformer Language Models (2) IN_PAPER formula ✔️
EXPAND

UNet like + retina attetion is something close to BP-Transformer

Synthesizer: Rethinking Self-Attention in Transformer Models (26) Synthesizer-Rethinking-Self-Attention-Transformer-Models formula ✔️
EXPAND

does not compute pairwise interactions

Jukebox: A Generative Model for Music (45) jukebox formula ✔️
EXPAND

better attention patterns from Sparse Transformer

Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers (0) - formula ✔️
EXPAND

does not compute pairwise interactions and uses fixed mask patters

GMAT: Global Memory Augmentation for Transformers (2) gmat formula
EXPAND

adds global tokens

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (45) fast-transformers formula ✔️
EXPAND

uses phi(q)(phi(k)v) and also improves the sequential sampling step

Linformer: Self-Attention with Linear Complexity (47) linformer-pytorch formula
EXPAND

project key and value from nd to kd

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers (8) google-research formula ✔️
EXPAND

calculate an unbiased stochastic approximation of the attention matrix

Kronecker Attention Networks (1) kronecker-attention-pytorch formula
EXPAND

uses horizontal and lateral average matrices

Real-time Semantic Segmentation with Fast Attention (5) - formula
EXPAND

l2_norm(q)*(l2_norm(k)*v)

Fast Transformers with Clustered Attention (6) fast-transformers formula
EXPAND

groups queries together with LSH

Big Bird: Transformers for Longer Sequences (60) DeepSpeed formula
EXPAND

ETC with random connections

Tensor Low-Rank Reconstruction for Semantic Segmentation (3) - formula
EXPAND

decompose the full attention tensor into rank one tensors (CP decomposition)

Looking for change? Roll the Dice and demand Attention (0) IN_PAPER formula
EXPAND

uses the fractal tanimoto similarity to compare queries with keys inside the attention module

Rethinking Attention with Performers (30) google-research formula ✔️
EXPAND

unbiased approximation of the attention matrix with softmax kernel

Memformer: The Memory-Augmented Transformer (0) memformer formula ✔️
EXPAND

attend to memory slots + Memory-Replay BackPropagation

SMYRF: Efficient Attention using Asymmetric Clustering (1) smyrf formula
EXPAND

LSH with balanced clusters

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (0) Informer2020 formula ✔️
EXPAND

sparse attention + funnel like encoder

Sub-Linear Memory: How to Make Performers SLiM (0) google-research formula ✔️
EXPAND

Performer but with sublinear Memory usage

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (0) Nystromformer formula
EXPAND

uses Nystrom method to approximate the attention matrix

Linear Transformers Are Secretly Fast Weight Memory Systems (0) fast-weight-transformers formula ✔️
EXPAND

show that linear transformers are basically fast weight networks + propose a new kernel function to linearise attention, balancing simplicity and effectiveness

LambdaNetworks: Modeling Long-Range Interactions Without Attention (6) lambda-networks formula ✔️
EXPAND

generates a linear layer based on context + decouple pos/context

Random Feature Attention (2) - formula ✔️
EXPAND

kernel approximation and also transformers are rnn

Articles/Surveys/Benchmarks

Owner
Sepehr Sameni
PhD Candidate at the University of Bern, Computer Vision Group
Sepehr Sameni
T‘rex Park is a Youzan sponsored project. Offering Chinese NLP and image models pretrained from E-commerce datasets

T‘rex Park is a Youzan sponsored project. Offering Chinese NLP and image models pretrained from E-commerce datasets (product titles, images, comments, etc.).

55 Nov 22, 2022
TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa 98 Dec 09, 2022
multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search

multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search

hellonlp 30 Dec 12, 2022
Predict an emoji that is associated with a text

Sentiment Analysis Sentiment analysis in computational linguistics is a general term for techniques that quantify sentiment or mood in a text. Can you

Tetsumichi(Telly) Umada 30 Sep 07, 2022
Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

CodeFill This repository contains the code for our paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Namin

Software Analytics Lab 11 Oct 31, 2022
Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Twitter-Sentiment-Analysis Twitter sentiment analysis for india's top online retailers(2019 to 2022) Project Overview : Sentiment Analysis helps us to

Balaji R 1 Jan 01, 2022
All the code I wrote for Overwatch-related projects that I still own the rights to.

overwatch_shit.zip This is (eventually) going to contain all the software I wrote during my five-year imprisonment stay playing Overwatch. I'll be add

zkxjzmswkwl 2 Dec 31, 2021
Finetune gpt-2 in google colab

gpt-2-colab finetune gpt-2 in google colab sample result (117M) from retraining on A Tale of Two Cities by Charles Di

212 Jan 02, 2023
Residual2Vec: Debiasing graph embedding using random graphs

Residual2Vec: Debiasing graph embedding using random graphs This repository contains the code for S. Kojaku, J. Yoon, I. Constantino, and Y.-Y. Ahn, R

SADAMORI KOJAKU 5 Oct 12, 2022
Extracting Summary Knowledge Graphs from Long Documents

GraphSum This repo contains the data and code for the G2G model in the paper: Extracting Summary Knowledge Graphs from Long Documents. The other basel

Zeqiu (Ellen) Wu 10 Oct 21, 2022
Text classification on IMDB dataset using Keras and Bi-LSTM network

Text classification on IMDB dataset using Keras and Bi-LSTM Text classification on IMDB dataset using Keras and Bi-LSTM network. Usage python3 main.py

Hamza Rashid 2 Sep 27, 2022
Proquabet - Convert your prose into proquints and then you essentially have Vogon poetry

Proquabet Turn your prose into a constant stream of encrypted and meaningless-so

Milo Fultz 2 Oct 10, 2022
Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

Google 6.4k Jan 01, 2023
GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates

GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates Vibhor Agarwal, Sagar Joglekar, Anthony P. Young an

Vibhor Agarwal 2 Jun 30, 2022
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong, Jaehyeon Kim, Jaekyoung Bae In our paper, we p

Jungil Kong 1.1k Jan 02, 2023
Just a basic Telegram AI chat bot written in Python using Pyrogram.

Nikko ChatBot Just a basic Telegram AI chat bot written in Python using Pyrogram. Requirements Python 3.7 or higher. A bot token. Installation $ https

ʀᴇxɪɴᴀᴢᴏʀ 2 Oct 21, 2022
Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)

TOPSIS implementation in Python Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) CHING-LAI Hwang and Yoon introduced TOPSIS

Hamed Baziyad 8 Dec 10, 2022
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

fastNLP 342 Jan 05, 2023
Searching keywords in PDF file folders

keyword_searching Steps to use this Python scripts: (1)Paste this script into the file folder containing the PDF files you need to search from; (2)Thi

1 Nov 08, 2021
Automated question generation and question answering from Turkish texts using text-to-text transformers

Turkish Question Generation Offical source code for "Automated question generation & question answering from Turkish texts using text-to-text transfor

Open Business Software Solutions 29 Dec 14, 2022