Transformers based fully on MLPs

Overview

Awesome MLP-based Transformers papers

An up-to-date list of Transformers based fully on MLPs without attention!

Why this repo?

Since 2019, transformers and fully attention-based models have taken over most of the deep learning world. Yet it appears that their power does not come from attention: replacing the feed-forward network in a transformer with attention performs horribly (~30% top-1 on ImageNet). It appears that attention is not all we need. It also seems that we no longer need models with strong inductive biases such as CNNs, and that we can fall back on MLPs because (1) we have enough data and (2) we have powerful optimization, regularization, and data augmentation techniques. Just as we saw a big hype around transformers (see the awesome vision transformer and BERT-related paper lists), we expect to see a big hype around fully MLP-based networks without attention. The research focus has now shifted to finding efficient ways of mixing tokens without involving attention mechanisms. This repository aims to collect all papers of this kind.
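To make "mixing tokens without attention" concrete, here is a minimal NumPy sketch in the spirit of the token-mixing and channel-mixing MLPs used by MLP-Mixer-style models. It is an illustration under simplifying assumptions, not the implementation from any paper listed here: layer normalization, biases, and patch embedding are omitted, and the function names, weight shapes, and sizes are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def token_mixing(x, w1, w2):
    """x: (tokens, channels); w1: (hidden, tokens); w2: (tokens, hidden)."""
    # The MLP acts along the token axis, so every output token depends on
    # every input token. The same weights are shared across all channels.
    return x + w2 @ gelu(w1 @ x)  # residual connection

def channel_mixing(x, w1, w2):
    """x: (tokens, channels); w1: (channels, hidden); w2: (hidden, channels)."""
    # The MLP acts along the channel axis, independently for each token.
    return x + gelu(x @ w1) @ w2  # residual connection

# Hypothetical sizes chosen only for the demo.
rng = np.random.default_rng(0)
tokens, channels, hidden = 4, 8, 16
x = rng.standard_normal((tokens, channels))

w_tok1 = rng.standard_normal((hidden, tokens)) * 0.1
w_tok2 = rng.standard_normal((tokens, hidden)) * 0.1
w_ch1 = rng.standard_normal((channels, hidden)) * 0.1
w_ch2 = rng.standard_normal((hidden, channels)) * 0.1

y = channel_mixing(token_mixing(x, w_tok1, w_tok2), w_ch1, w_ch2)
print(y.shape)
```

The output keeps the input shape `(tokens, channels)`, so blocks like this can be stacked. Unlike attention, the token-mixing weights are fixed after training and do not depend on the input.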

Contributing

Please help by contributing to this list: submit an issue or a pull request using the format below.

- Paper Name [[pdf]](link) [[code]](link)

Papers

  • MLP-Mixer: An all-MLP Architecture for Vision [pdf] [official code] [code] [code] [code] [Yannic Kilcher Video]
  • Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet [pdf] [code]
  • ResMLP: Feedforward networks for image classification with data-efficient training [pdf] [code] [code] [code]
  • Pay Attention to MLPs [pdf] [code] [code] [code]
  • FNet: Mixing Tokens with Fourier Transforms [pdf] [code] [Yannic Kilcher Video]
  • Can Attention Enable MLPs To Catch Up With CNNs? [pdf]
  • MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [pdf]
  • On the Bias Against Inductive Biases [pdf]
  • S²-MLP: Spatial-Shift MLP Architecture for Vision [pdf]
  • Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition [pdf] [code]
  • Rethinking Token-Mixing MLP for MLP-based Vision Backbone [pdf]
  • Global Filter Networks for Image Classification [pdf] [code]
  • What Makes for Hierarchical Vision Transformer? [pdf]
  • AS-MLP: An Axial Shifted MLP Architecture for Vision [pdf] [code]
  • CycleMLP: A MLP-like Architecture for Dense Prediction [pdf] [code]
  • S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [pdf]
  • RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [pdf] [code]
  • Hire-MLP: Vision MLP via Hierarchical Rearrangement [pdf]
  • Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [pdf]
  • Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [pdf]
  • Patches Are All You Need? [pdf] [code]
  • Exploring the Limits of Large Scale Pre-training [pdf]
  • Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs [pdf] [code]
  • Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation [pdf] [code]
  • Are We Ready for a New Paradigm Shift? A Survey on Visual Deep MLP [pdf]
  • MetaFormer is Actually What You Need for Vision [pdf] [code]
  • An Image Patch is a Wave: Phase-Aware Vision MLP [pdf]
  • MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video [pdf]
  • SWAT: Spatial Structure Within and Among Tokens [pdf]
  • MLP Architectures for Vision-and-Language Modeling: An Empirical Study [pdf] [code]
  • RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [pdf] [code]
Owner
Fawaz Sammani