Transformers based fully on MLPs

Overview

Awesome MLP-based Transformers papers

An up-to-date list of Transformers based fully on MLPs without attention!

Why this repo?

After transformers and fully attention-based models took over most of the deep learning world since 2019, it now appears that their power does not come from attention: replacing the feed-forward network in a transformer with attention performs horribly (~30% top-1 on ImageNet). It appears that attention is not all we need. After all, we no longer need models with strong inductive biases such as CNNs, and we can lean back on MLPs since (1) we have enough data and (2) we have powerful optimization, regularization and data augmentation techniques. Just as we saw a big hype around transformers (see the awesome vision transformer and BERT-related papers lists), we expect to see a big hype around fully MLP-based networks without attention, and the research focus has now shifted to finding efficient ways of mixing tokens without involving attention mechanisms. This repository aims to collect all papers of this kind.

Contributing

Please help by contributing to this list: submit an issue or a pull request using the following format.

- Paper Name [[pdf]](link) [[code]](link)

Papers

  • MLP-Mixer: An all-MLP Architecture for Vision [pdf] [official code] [code] [code] [code] [Yannic Kilcher Video]
  • Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet [pdf] [code]
  • ResMLP: Feedforward networks for image classification with data-efficient training [pdf] [code] [code] [code]
  • Pay Attention to MLPs [pdf] [code] [code] [code]
  • FNet: Mixing Tokens with Fourier Transforms [pdf] [code] [Yannic Kilcher Video]
  • Can Attention Enable MLPs To Catch Up With CNNs? [pdf]
  • MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [pdf]
  • On the Bias Against Inductive Biases [pdf]
  • S²-MLP: Spatial-Shift MLP Architecture for Vision [pdf]
  • Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition [pdf] [code]
  • Rethinking Token-Mixing MLP for MLP-based Vision Backbone [pdf]
  • Global Filter Networks for Image Classification [pdf] [code]
  • What Makes for Hierarchical Vision Transformer? [pdf]
  • AS-MLP: An Axial Shifted MLP Architecture for Vision [pdf] [code]
  • CycleMLP: A MLP-like Architecture for Dense Prediction [pdf] [code]
  • S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [pdf]
  • RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [pdf] [code]
  • Hire-MLP: Vision MLP via Hierarchical Rearrangement [pdf]
  • Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [pdf]
  • Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [pdf]
  • Patches Are All You Need? [pdf] [code]
  • Exploring the Limits of Large Scale Pre-training [pdf]
  • Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs [pdf] [code]
  • Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation [pdf] [code]
  • Are We Ready for a New Paradigm Shift? A Survey on Visual Deep MLP [pdf]
  • MetaFormer is Actually What You Need for Vision [pdf] [code]
  • An Image Patch is a Wave: Phase-Aware Vision MLP [pdf]
  • MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video [pdf]
  • SWAT: Spatial Structure Within and Among Tokens [pdf]
  • MLP Architectures for Vision-and-Language Modeling: An Empirical Study [pdf] [code]
  • RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [pdf] [code]
Owner
Fawaz Sammani
The human brain is a miracle every human has, and mathematically modelling that brain is an overwhelming matter! I like teaching machines vision-language