PyTorch implementation of Pay Attention to MLPs

Last update: Dec 13, 2022

Overview

gMLP

PyTorch implementation of Pay Attention to MLPs.

Quickstart

Clone this repository.

git clone https://github.com/jaketae/g-mlp.git

Navigate to the cloned directory. You can then use the bare-bones gMLP model as follows:

>>> from g_mlp import gMLP
>>> model = gMLP()

By default, the model comes with the following parameters:

gMLP(
    d_model=256,
    d_ffn=512,
    seq_len=256,
    num_layers=6,
)

Usage

The repository also contains gMLP models specifically for language modeling and image classification.

NLP

gMLPForLanguageModeling shares the same default parameters as gMLP, with num_tokens=10000 as an added parameter that represents the size of the token embedding table.

>>> import torch
>>> from g_mlp import gMLPForLanguageModeling
>>> model = gMLPForLanguageModeling()
>>> tokens = torch.randint(0, 10000, (8, 256))
>>> model(tokens).shape
torch.Size([8, 256, 256])
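The last dimension of the output shape above equals d_model, which is what one would expect if token IDs are looked up in a num_tokens × d_model embedding table before entering the gMLP layers. The following is a minimal sketch of that lookup alone, assuming a plain nn.Embedding; it is an illustration, not the repository's code.

```python
import torch
from torch import nn

num_tokens, d_model, seq_len = 10000, 256, 256

# Assumption: the token embedding table is a plain nn.Embedding.
embedding = nn.Embedding(num_tokens, d_model)

tokens = torch.randint(0, num_tokens, (8, seq_len))  # (batch, seq_len)
hidden = embedding(tokens)                           # (batch, seq_len, d_model)
print(hidden.shape)
```

Each integer token becomes a d_model-dimensional vector, so a batch of 8 sequences of length 256 becomes a tensor of shape (8, 256, 256).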

Computer Vision

gMLPForImageClassification is a ViT-esque version of gMLP that includes a patch-creation layer and a final classification head.

>>> import torch
>>> from g_mlp import gMLPForImageClassification
>>> model = gMLPForImageClassification()
>>> images = torch.randn(8, 3, 256, 256)
>>> model(images).shape
torch.Size([8, 1000])
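To illustrate what a ViT-style patch layer does, here is a minimal sketch that turns a batch of images into a sequence of patch embeddings using a strided convolution. The patch size of 16 and the use of nn.Conv2d are assumptions made for this example; the repository's actual layer may differ.

```python
import torch
from torch import nn

d_model = 256
patch_size = 16  # assumed value; not stated in the text above

# A convolution whose stride equals its kernel size cuts the image into
# non-overlapping patches and projects each one to d_model channels.
patcher = nn.Conv2d(3, d_model, kernel_size=patch_size, stride=patch_size)

images = torch.randn(8, 3, 256, 256)
patches = patcher(images)                      # (batch, d_model, 16, 16)
sequence = patches.flatten(2).transpose(1, 2)  # (batch, num_patches, d_model)
print(sequence.shape)
```

Under these assumptions a 256 × 256 image yields 16 × 16 = 256 patches, which happens to match the default seq_len=256.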

Summary

The authors of the paper present gMLP, an attention-free all-MLP architecture based on spatial gating units. gMLP achieves parity with Transformer models such as ViT and BERT on language and vision downstream tasks. The authors also show that gMLP scales with increased data and number of parameters, suggesting that self-attention is not a necessary component for designing performant models.
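As a rough illustration of the spatial gating unit mentioned above, the sketch below follows the description in the paper: the input is split in half along the channel dimension, one half is normalized and linearly projected across the sequence dimension, and the result gates the other half element-wise. The class name and argument names are chosen for this example and are not necessarily those used in this repository.

```python
import torch
from torch import nn


class SpatialGatingUnit(nn.Module):
    """Illustrative sketch of a spatial gating unit (names are hypothetical)."""

    def __init__(self, d_ffn: int, seq_len: int):
        super().__init__()
        # The input is split in half along channels, so d_ffn must be even.
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear projection across the sequence (spatial) dimension.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Near-zero weights and unit bias, as suggested in the paper, so the
        # gate starts close to 1 and the unit initially passes `u` through.
        nn.init.normal_(self.spatial_proj.weight, std=1e-6)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_ffn)
        u, v = x.chunk(2, dim=-1)
        v = self.norm(v)
        # Transpose so nn.Linear mixes positions rather than channels.
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v  # (batch, seq_len, d_ffn // 2)


sgu = SpatialGatingUnit(d_ffn=512, seq_len=256)
x = torch.randn(8, 256, 512)
out = sgu(x)
print(out.shape)
```

Because the only cross-position interaction is a fixed seq_len × seq_len projection, the unit mixes tokens without computing any attention weights from the input.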

Owner

Jake Tae
