An implementation of Fastformer: Additive Attention Can Be All You Need in TensorFlow

Overview

Fast Transformer Twitter

PyPI Lint Code Base Upload Python Package Code style: black

DOI GitHub License GitHub stars GitHub followers Twitter Follow

This repo implements Fastformer: Additive Attention Can Be All You Need by Wu et al. in TensorFlow. Fast Transformer is a Transformer variant based on additive attention that can handle long sequences efficiently with linear complexity. Fastformer is much more efficient than many existing Transformer models and can meanwhile achieve comparable or even better long text modeling performance.

Installation

Run the following to install:

pip install fast-transformer

Developing fast-transformer

To install fast-transformer, along with tools you need to develop and test, run the following in your virtualenv:

git clone https://github.com/Rishit-dagli/Fast-Transformer.git
# or clone your own fork

cd fast-transformer
pip install -e .[dev]

Usage

import tensorflow as tf
from fast_transformer import FastTransformer

mask = tf.ones([1, 4096], dtype=tf.bool)
model = FastTransformer(
    num_tokens = 20000,
    dim = 512,
    depth = 2,
    max_seq_len = 4096,
    absolute_pos_emb = True, # Absolute positional embeddings
    mask = mask
)
x = tf.experimental.numpy.random.randint(0, 20000, (1, 4096))

logits = model(x) # (1, 4096, 20000)

Want to Contribute 🙋‍♂️ ?

Awesome! If you want to contribute to this project, you're always welcome! See Contributing Guidelines. You can also take a look at open issues for getting more information about current or upcoming tasks.

Want to discuss? 💬

Have any questions, doubts or want to present your opinions, views? You're always welcome. You can start discussions.

Citation

@misc{wu2021fastformer,
    title   = {Fastformer: Additive Attention is All You Need}, 
    author  = {Chuhan Wu and Fangzhao Wu and Tao Qi and Yongfeng Huang},
    year    = {2021},
    eprint  = {2108.09084},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Yannic Kilcher's video was super helpful while building this.

License

Copyright 2020 Rishit Dagli

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
You might also like...
Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al.
Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al.

nam-pytorch Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al. [abs, pdf] Installation You can access nam-pytorch vi

PixelPick This is an official implementation of the paper
PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick." [Project page] [Paper

Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

HiFi-GAN+ This project is an unoffical implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All

Torch-based tool for quantizing high-dimensional vectors using additive codebooks

Trainable multi-codebook quantization This repository implements a utility for use with PyTorch, and ideally GPUs, for training an efficient quantizer

Neural network-based build time estimation for additive manufacturing

Neural network-based build time estimation for additive manufacturing Oh, Y., Sharp, M., Sprock, T., & Kwon, S. (2021). Neural network-based build tim

Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.

Smaller Multilingual Transformers This repository shares smaller versions of multilingual transformers that keep the same representations offered by t

Code for
Code for "Diffusion is All You Need for Learning on Surfaces"

Source code for "Diffusion is All You Need for Learning on Surfaces", by Nicholas Sharp Souhaib Attaiki Keenan Crane Maks Ovsjanikov NOTE: the linked

Comments
  • Implement Additive Attention

    Implement Additive Attention

    Implement Additive Attention as a TensorFlow layer:

    • [x] Figure out using rotary embeddings
    • [x] Add masking functionality
    • [x] Relative Position embeddings
    • [x] Calculate query attention logits
    • [x] Calculate Global Query tokens
    • [x] Calculate key attention logits
    • [x] Calculate Global Key tokens
    • [x] Add queries as residuals
    opened by Rishit-dagli 0
Releases(v0.2.0)
  • v0.2.0(Jan 16, 2022)

    ✅ Bug Fixes / Improvements

    • Unit Tests for output rank and shape
    • Looser dependency requirements (now supports all TensorFlow versions >= 2.5.0)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Sep 3, 2021)

    This is the initial release of Fast Transformer and implements Fast Transformer as a subclassed TensorFlow model.

    Classes

    • FastAttention: Implements additive attention as a TensorFlow Keras layer, and supports using relative positional encodings.
    • PreNorm: Normalize the activations of the previous layer for each given example in a batch independently and apply some function to it, implemented as a TensorFlow Keras Layer.
    • FeedForward: Create a FeedForward neural net with two Dense layers and GELU activation, implemented as a TensorFlow Keras Layer.
    • FastTransformer: Implements the FastTransformer model using all the other classes, allows using rotary embeddings, weight tie projections, and converts to logits. Implemented as a TensorFlow Keras Model.
    Source code(tar.gz)
    Source code(zip)
Owner
Rishit Dagli
High School,TEDx,2xTED-Ed speaker | International Speaker | Microsoft Student Ambassador | Mentor, @TFUGMumbai | Organize @KotlinMumbai
Rishit Dagli
Predicts an answer in yes or no.

Oui-ou-non-prediction Predicts an answer in 'yes' or 'no'. It is based on the game 'effeuiller la marguerite' in which the person plucks flower petals

Ananya Gupta 1 Jan 15, 2022
Designing a Practical Degradation Model for Deep Blind Image Super-Resolution (ICCV, 2021) (PyTorch) - We released the training code!

Designing a Practical Degradation Model for Deep Blind Image Super-Resolution Kai Zhang, Jingyun Liang, Luc Van Gool, Radu Timofte Computer Vision Lab

Kai Zhang 804 Jan 08, 2023
Learning hierarchical attention for weakly-supervised chest X-ray abnormality localization and diagnosis

Hierarchical Attention Mining (HAM) for weakly-supervised abnormality localization This is the official PyTorch implementation for the HAM method. Pap

Xi Ouyang 22 Jan 02, 2023
Object detection using yolo-tiny model and opencv used as backend

Object detection Algorithm used : Yolo algorithm Backend : opencv Library required: opencv = 4.5.4-dev' Quick Overview about structure 1) main.py Load

2 Jul 06, 2022
WHENet - ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L

HeadPoseEstimation-WHENet-yolov4-onnx-openvino ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L 1. Usage $ git clone htt

Katsuya Hyodo 49 Sep 21, 2022
Representing Long-Range Context for Graph Neural Networks with Global Attention

Graph Augmentation Graph augmentation/self-supervision/etc. Algorithms gcn gcn+virtual node gin gin+virtual node PNA GraphTrans Augmentation methods N

UC Berkeley RISE 67 Dec 30, 2022
Accelerated SMPL operation, commonly used in generate 3D human mesh, STAR included.

SMPL2 An enchanced and accelerated SMPL operation which commonly used in 3D human mesh generation. It takes a poses, shapes, cam_trans as inputs, outp

JinTian 20 Oct 17, 2022
My usage of Real-ESRGAN to upscale anime, some test and results in the test_img folder

anime upscaler My usage of Real-ESRGAN to upscale anime, I hope to use this on a proper GPU cuz doing this on CPU is completely shit 😂 , I even tried

Shangar Muhunthan 29 Jan 07, 2023
Self-supervised learning algorithms provide a way to train Deep Neural Networks in an unsupervised way using contrastive losses

Self-supervised learning Self-supervised learning algorithms provide a way to train Deep Neural Networks in an unsupervised way using contrastive loss

Arijit Das 2 Mar 26, 2022
Learning Representational Invariances for Data-Efficient Action Recognition

Learning Representational Invariances for Data-Efficient Action Recognition Official PyTorch implementation for Learning Representational Invariances

Virginia Tech Vision and Learning Lab 27 Nov 22, 2022
Predictive AI layer for existing databases.

MindsDB is an open-source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning

MindsDB Inc 12.2k Jan 03, 2023
Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [pdf] The official repository for Self-Supervised Pre-Training for Transfo

Hao Luo 116 Jan 04, 2023
Use deep learning, genetic programming and other methods to predict stock and market movements

StockPredictions Use classic tricks, neural networks, deep learning, genetic programming and other methods to predict stock and market movements. Both

Linda MacPhee-Cobb 386 Jan 03, 2023
这是一个deeplabv3-plus-pytorch的源码,可以用于训练自己的模型。

DeepLabv3+:Encoder-Decoder with Atrous Separable Convolution语义分割模型在Pytorch当中的实现 目录 性能情况 Performance 所需环境 Environment 注意事项 Attention 文件下载 Download 训练步骤

Bubbliiiing 350 Dec 28, 2022
Everything you need to know about NumPy( Creating Arrays, Indexing, Math,Statistics,Reshaping).

Everything you need to know about NumPy( Creating Arrays, Indexing, Math,Statistics,Reshaping).

1 Feb 14, 2022
Code for the paper "Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are in envir

Michael Janner 269 Jan 05, 2023
Code for paper entitled "Improving Novelty Detection using the Reconstructions of Nearest Neighbours"

NLN: Nearest-Latent-Neighbours A repository containing the implementation of the paper entitled Improving Novelty Detection using the Reconstructions

Michael (Misha) Mesarcik 4 Dec 14, 2022
Code for Overinterpretation paper Overinterpretation reveals image classification model pathologies

Overinterpretation This repository contains the code for the paper: Overinterpretation reveals image classification model pathologies Authors: Brandon

Gifford Lab, MIT CSAIL 17 Dec 10, 2022
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data

GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data By Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, W

Taihong Xiao 141 Apr 16, 2021