Fastformer

Notes from the authors

PyTorch and Keras implementations of Fastformer. The Keras version includes only the core Fastformer attention module. The PyTorch version is written in the style of Hugging Face Transformers. The Jupyter notebooks contain quickstart code for text classification on AG's News (without pretrained word embeddings, for simplicity) and can be run directly.

In our experiments, NOT all tasks need the FFNN, residual connections, layer normalization, or even position embeddings. For example, in news recommendation we find it better to use Fastformer directly, without layer normalization or position embeddings. In Ad CVR prediction, however, both position embeddings and layer normalization are needed.
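The point about optional components can be illustrated with a minimal PyTorch sketch. This is not the code shipped in this repo: it is a simplified single-head version of additive attention as described in the Fastformer paper, and the class names (AdditiveAttentionSketch, FastformerBlockSketch) and the flags use_position_embedding and use_layer_norm are hypothetical names chosen for illustration. The FFNN is omitted, and the mask is assumed to be boolean with at least one real token per sequence.

```python
import math

import torch
import torch.nn as nn


class AdditiveAttentionSketch(nn.Module):
    """Single-head sketch of Fastformer-style additive attention (hypothetical name)."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.query_score = nn.Linear(dim, 1)  # scores each query vector
        self.key_score = nn.Linear(dim, 1)    # scores each query-aware key vector
        self.out = nn.Linear(dim, dim)
        self.scale = math.sqrt(dim)

    def _pool(self, scores, states, mask):
        # Softmax over the sequence axis, then a weighted sum of the states.
        scores = scores / self.scale
        if mask is not None:
            # mask: (batch, seq_len) bool, True marks a real token.
            scores = scores.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        weights = torch.softmax(scores, dim=1)               # (batch, seq_len, 1)
        return (weights * states).sum(dim=1, keepdim=True)   # (batch, 1, dim)

    def forward(self, x, mask=None):
        q, k, v = self.query(x), self.key(x), self.value(x)
        global_q = self._pool(self.query_score(q), q, mask)  # global query
        p = global_q * k                                     # query-aware keys
        global_k = self._pool(self.key_score(p), p, mask)    # global key
        u = global_k * v                                     # key-aware values
        return self.out(u) + q                               # add the query back


class FastformerBlockSketch(nn.Module):
    """Attention with optional position embedding and layer normalization."""

    def __init__(self, dim, max_len=512,
                 use_position_embedding=True, use_layer_norm=True):
        super().__init__()
        self.attention = AdditiveAttentionSketch(dim)
        self.position = nn.Embedding(max_len, dim) if use_position_embedding else None
        self.norm = nn.LayerNorm(dim) if use_layer_norm else nn.Identity()

    def forward(self, x, mask=None):
        if self.position is not None:
            positions = torch.arange(x.size(1), device=x.device)
            x = x + self.position(positions)  # broadcasts over the batch axis
        return self.norm(self.attention(x, mask))


if __name__ == "__main__":
    x = torch.randn(2, 7, 16)
    mask = torch.ones(2, 7, dtype=torch.bool)
    # Configuration in the spirit of the news recommendation note above.
    news_rec = FastformerBlockSketch(16, use_position_embedding=False,
                                     use_layer_norm=False)
    # Configuration in the spirit of the Ad CVR prediction note above.
    ad_cvr = FastformerBlockSketch(16)
    print(news_rec(x, mask).shape, ad_cvr(x, mask).shape)
```

Keeping the two components behind constructor flags makes it cheap to try both settings on a new task before committing to one.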

Keras version: 2.2.4 (may not be compatible with higher versions)

TF version: from 1.12 to 1.15 (may be compatible with lower versions)

PyTorch version: 1.6.0 (may be compatible with higher/lower versions)

Citation

@article{wu2021fastformer,
  title={Fastformer: Additive Attention Can Be All You Need},
  author={Wu, Chuhan and Wu, Fangzhao and Qi, Tao and Huang, Yongfeng},
  journal={arXiv preprint arXiv:2108.09084},
  year={2021}
}

