Attention for PyTorch with Linear Memory Footprint

Unofficial implementation of https://arxiv.org/abs/2112.05682 that computes attention with a memory cost linear in sequence length (with the side benefit of some speedup on GPU compared to the paper's reference implementation in JAX).
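
The core trick from the paper can be sketched in plain Python for a single query vector. Instead of materialising all n attention scores before the softmax, keys and values are visited in chunks while three running quantities are kept: the maximum score seen so far, the sum of exponentiated scores, and the exponent-weighted sum of values. The extra memory per query therefore does not grow with n. This is a minimal sketch, not this package's code. `naive_attention` and `chunked_attention` are hypothetical helpers, and the 1/sqrt(d) score scaling is omitted. The sketch uses the incremental running-max form; the paper's reference JAX code instead summarises each key chunk and combines the summaries at the end, which gives the same result.

```python
import math


def naive_attention(q, keys, values):
    """Reference: materialise every score, then softmax-weight the values.
    (The usual 1/sqrt(d) scaling is omitted in both functions.)"""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def chunked_attention(q, keys, values, key_chunk_size):
    """Same result, but only key_chunk_size scores exist at any time.

    Keeps a running max (for numerical stability), a running sum of
    exp-weights and a running exp-weighted sum of values, rescaling the
    accumulators whenever the running max increases.
    """
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    running_out = [0.0] * dim
    for start in range(0, len(keys), key_chunk_size):
        k_chunk = keys[start:start + key_chunk_size]
        v_chunk = values[start:start + key_chunk_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max (exp(-inf) == 0.0).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        running_out = [o * scale for o in running_out]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                running_out[j] += w * v[j]
        running_max = new_max
    return [o / running_sum for o in running_out]


# Usage: both functions agree for any chunk size.
q = [0.5, -1.0, 2.0]
keys = [[0.1 * i, -0.2 * i, 0.3] for i in range(10)]
values = [[float(i), float(i * i)] for i in range(10)]
ref = naive_attention(q, keys, values)
out = chunked_attention(q, keys, values, key_chunk_size=3)
assert all(abs(a - b) < 1e-9 for a, b in zip(ref, out))
```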

Installation:

git clone https://github.com/CHARM-Tx/linear_mem_attention_pytorch
cd linear_mem_attention_pytorch
python setup.py install

Usage:

High Level

import torch
from linear_mem_attention_torch.fast_attn import Attention

batch, length, features = 2, 2**8, 64
# two random inputs, each of shape (batch, length, features)
x, ctx = torch.randn(2, batch, length, features)
# random boolean mask of shape (batch, length)
mask = torch.randn(batch, length) < 1.

attn = Attention(dim=features, heads=8, dim_head=64, bias=False)

# self-attn
v_self = attn(x, x, mask, query_chunk_size=1024, key_chunk_size=4096)

# cross-attn
v_cross = attn(x, ctx, mask, query_chunk_size=1024, key_chunk_size=4096)
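
The `query_chunk_size` and `key_chunk_size` arguments bound how many query and key positions are processed together. A minimal sketch, assuming they split each sequence into contiguous slices of at most that many positions, as in the paper's reference code (`chunk_ranges` is a hypothetical helper, not part of the package):

```python
def chunk_ranges(length, chunk_size):
    """Hypothetical helper: the (start, stop) slices a sequence of `length`
    positions is split into when each chunk holds at most `chunk_size`."""
    return [(start, min(start + chunk_size, length))
            for start in range(0, length, chunk_size)]


# README settings: 2**8 = 256 positions fit in a single 1024-wide chunk.
print(chunk_ranges(2**8, 1024))   # [(0, 256)]
# A longer sequence is split into several 4096-wide key chunks.
print(chunk_ranges(2**14, 4096))  # [(0, 4096), (4096, 8192), (8192, 12288), (12288, 16384)]
```

With the README's toy length of 2**8, both chunk sizes exceed the sequence length, so each sequence fits in one chunk and nothing is split; the memory savings only appear once `length` is larger than the chunk sizes.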

Low Level

import torch
from linear_mem_attention_torch import attention

batch, length, heads, features = 2, 2**8, 8, 64
mask = torch.randn(batch, length) < 1.
# q, k and v each have shape (batch, length, heads, features)
q, k, v = torch.randn(3, batch, length, heads, features)

v_ = attention(q, k, v, mask, query_chunk_size=1024, key_chunk_size=4096)
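
To see why the chunk sizes matter, one can estimate the size of the largest attention-score buffer. This is a rough back-of-the-envelope sketch that assumes float32 scores and counts only the score block, not q, k, v, outputs or autograd buffers; `score_block_bytes` is a hypothetical helper:

```python
def score_block_bytes(batch, heads, q_len, k_len, bytes_per_elem=4):
    """Bytes for one (batch, heads, q_len, k_len) block of attention scores
    (float32 -> 4 bytes per element)."""
    return batch * heads * q_len * k_len * bytes_per_elem


MiB = 2**20
batch, heads = 2, 8
length = 2**14

# Plain attention materialises the full length x length score matrix.
full = score_block_bytes(batch, heads, length, length)
# Chunked attention only holds one query-chunk x key-chunk block at a time.
chunked = score_block_bytes(batch, heads, min(1024, length), min(4096, length))
print(full // MiB, "MiB for the full score matrix")     # 16384 MiB
print(chunked // MiB, "MiB for one 1024 x 4096 block")  # 256 MiB
```

The full matrix grows quadratically with `length`, while a single chunked block stays fixed once `length` exceeds the chunk sizes, leaving only the linear terms (inputs and outputs) to grow with sequence length.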

Benchmarks

See examples/example_benchamrk.ipynb for more information.

Citations:

@misc{rabe2021selfattention,
      title={Self-attention Does Not Need $O(n^2)$ Memory}, 
      author={Markus N. Rabe and Charles Staats},
      year={2021},
      eprint={2112.05682},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}