PyTorch reimplementation of MLP-Mixer (MLP-Mixer: An all-MLP Architecture for Vision)

Last update: Dec 08, 2022

Overview

MLP-Mixer

PyTorch reimplementation of Google's repository for MLP-Mixer (not yet updated on the master branch), which was released with the paper MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy.

In this paper, the authors show that an architecture built only from multi-layer perceptrons (MLPs), with no convolutions (CNN) and no self-attention (Transformer), reaches performance close to the state of the art on image classification benchmarks.

MLP-Mixer (Mixer for short) consists of per-patch linear embeddings, Mixer layers, and a classifier head. Each Mixer layer contains one token-mixing MLP, which acts across patches, and one channel-mixing MLP, which acts across feature channels; each MLP consists of two fully-connected layers and a GELU nonlinearity. Other components include skip connections, dropout, and a linear classifier head.
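The structure described above can be sketched in a few lines of PyTorch. This is a minimal illustration and not the code from this repository: the class names (`MlpBlock`, `MixerLayer`, `MlpMixer`), the small default sizes, and the use of a strided `nn.Conv2d` as the per-patch linear embedding are assumptions made for the example. The layer normalization before each MLP and the global average pooling before the head follow the paper and are not spelled out in the description above.

```python
import torch
from torch import nn


class MlpBlock(nn.Module):
    """Two fully-connected layers with a GELU in between."""

    def __init__(self, dim, hidden_dim, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class MixerLayer(nn.Module):
    """One token-mixing MLP and one channel-mixing MLP, each with a skip connection."""

    def __init__(self, num_tokens, channels, tokens_hidden, channels_hidden, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = MlpBlock(num_tokens, tokens_hidden, dropout)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = MlpBlock(channels, channels_hidden, dropout)

    def forward(self, x):
        # x has shape (batch, tokens, channels).
        y = self.norm1(x).transpose(1, 2)          # (batch, channels, tokens)
        x = x + self.token_mlp(y).transpose(1, 2)  # mix information across patches
        x = x + self.channel_mlp(self.norm2(x))    # mix information across channels
        return x


class MlpMixer(nn.Module):
    """Per-patch linear embedding, a stack of Mixer layers, and a linear head."""

    def __init__(self, image_size=32, patch_size=8, in_chans=3, channels=64,
                 num_layers=2, tokens_hidden=32, channels_hidden=128, num_classes=10):
        super().__init__()
        num_tokens = (image_size // patch_size) ** 2
        # A convolution with kernel == stride == patch size applies the same
        # linear map to every non-overlapping patch.
        self.embed = nn.Conv2d(in_chans, channels, kernel_size=patch_size, stride=patch_size)
        self.layers = nn.Sequential(*[
            MixerLayer(num_tokens, channels, tokens_hidden, channels_hidden)
            for _ in range(num_layers)
        ])
        self.norm = nn.LayerNorm(channels)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, images):
        x = self.embed(images)            # (batch, channels, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (batch, tokens, channels)
        x = self.layers(x)
        x = self.norm(x).mean(dim=1)      # global average pooling over tokens
        return self.head(x)


model = MlpMixer()
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

The default sizes are deliberately tiny so the example runs quickly on a CPU; they do not correspond to the Mixer-B_16 or Mixer-L_16 configurations.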

Usage

1. Download a pre-trained model (Google's official checkpoint)

Available models: Mixer-B_16, Mixer-L_16
- ImageNet pre-trained models
  - Mixer-B_16, Mixer-L_16
- ImageNet-21k pre-trained models
  - Mixer-B_16, Mixer-L_16

# Replace {MODEL_NAME} with Mixer-B_16 or Mixer-L_16.

# ImageNet pre-trained
wget https://storage.googleapis.com/mixer_models/imagenet1k/{MODEL_NAME}.npz

# ImageNet-21k pre-trained
wget https://storage.googleapis.com/mixer_models/imagenet21k/{MODEL_NAME}.npz
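If you prefer to fetch the checkpoints from Python, the URL pattern above can be wrapped in a small helper. This is only a sketch: the function names `checkpoint_url` and `download_checkpoint` are hypothetical and not part of this repository, and the `checkpoint` target directory is an assumption taken from the path used in the fine-tuning command.

```python
import os
import urllib.request

BASE_URL = "https://storage.googleapis.com/mixer_models"
# Upstream dataset -> directory name used in the download URLs above.
UPSTREAM_DIRS = {"imagenet": "imagenet1k", "imagenet-21k": "imagenet21k"}
MODELS = ("Mixer-B_16", "Mixer-L_16")


def checkpoint_url(model_name, upstream="imagenet-21k"):
    """Build the download URL for one of the listed checkpoints (hypothetical helper)."""
    if model_name not in MODELS:
        raise ValueError(f"unknown model: {model_name!r}")
    if upstream not in UPSTREAM_DIRS:
        raise ValueError(f"unknown upstream dataset: {upstream!r}")
    return f"{BASE_URL}/{UPSTREAM_DIRS[upstream]}/{model_name}.npz"


def download_checkpoint(model_name, upstream="imagenet-21k", out_dir="checkpoint"):
    """Download a checkpoint into out_dir and return the local path (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{model_name}.npz")
    urllib.request.urlretrieve(checkpoint_url(model_name, upstream), path)
    return path


# List every available URL without downloading anything.
for upstream in UPSTREAM_DIRS:
    for name in MODELS:
        print(checkpoint_url(name, upstream))
```

Calling `download_checkpoint("Mixer-B_16")` would then place the file at `checkpoint/Mixer-B_16.npz`, the path passed to `--pretrained_dir` in the next step.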

2. Fine-tuning

python3 train.py --name cifar10-100_500 --model_type Mixer-B_16 --pretrained_dir checkpoint/Mixer-B_16.npz

Reproducing Mixer results

upstream	model	dataset	accuracy % (official)
ImageNet	Mixer-B/16	cifar10	96.72
ImageNet	Mixer-L/16	cifar10	96.59
ImageNet-21k	Mixer-B/16	cifar10	96.82
ImageNet-21k	Mixer-L/16	cifar10	96.34

Reference

Google's Vision Transformer and MLP-Mixer

Citations

@article{tolstikhin2021,
  title={MLP-Mixer: An all-MLP Architecture for Vision},
  author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
  journal={arXiv preprint arXiv:2105.01601},
  year={2021}
}

Owner

Eunkwang Jeon
