MLP-Mixer: An all-MLP Architecture for Vision

This repo contains a PyTorch implementation of the paper "MLP-Mixer: An all-MLP Architecture for Vision".

Usage :

import importlib

import torch

# "mlp-mixer" contains a hyphen, so `from mlp-mixer import MLPMixer` is a syntax error.
# Load the module by name instead (assumes the file is mlp-mixer.py on the import path).
MLPMixer = importlib.import_module("mlp-mixer").MLPMixer

# Dummy batch: one 3-channel 224x224 image.
img = torch.ones([1, 3, 224, 224])

model = MLPMixer(in_channels=3, image_size=224, patch_size=16, num_classes=1000,
                 dim=512, depth=8, token_dim=256, channel_dim=2048)

# Count trainable parameters, in millions.
parameters = sum(p.numel() for p in model.parameters() if p.requires_grad) / 1_000_000
print('Trainable Parameters: %.3fM' % parameters)

out = model(img)

print("Shape of out :", out.shape)  # expected: [B, num_classes]
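To make the arguments above easier to read, here is a rough shape walk-through in plain Python. It assumes the standard Mixer design from the paper (non-overlapping patches, one embedding vector of size dim per patch, global average pooling before the classifier head); this repo's internals may differ in detail. The helper name mixer_shapes is hypothetical and not part of this repo.

```python
def mixer_shapes(image_size=224, patch_size=16, in_channels=3,
                 dim=512, num_classes=1000, batch=1):
    """Return the tensor shapes a Mixer-style model is expected to produce.

    Hypothetical helper for illustration only; it does not import the model.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    # Non-overlapping patches: (224 / 16) ** 2 = 196 tokens for the defaults.
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2

    return {
        "input": (batch, in_channels, image_size, image_size),
        "tokens": (batch, num_patches, dim),   # one dim-sized vector per patch
        "pooled": (batch, dim),                # average over the token axis
        "logits": (batch, num_classes),        # classifier head output
    }


shapes = mixer_shapes()
for name, shape in shapes.items():
    print(f"{name:>7}: {shape}")
```

Under the same assumption, token_dim and channel_dim would be the hidden widths of the token-mixing and channel-mixing MLPs, so they change the parameter count but not the shapes listed here.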

Citation :

@misc{tolstikhin2021mlpmixer,
      title={MLP-Mixer: An all-MLP Architecture for Vision}, 
      author={Ilya Tolstikhin and Neil Houlsby and Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Thomas Unterthiner and Jessica Yung and Daniel Keysers and Jakob Uszkoreit and Mario Lucic and Alexey Dosovitskiy},
      year={2021},
      eprint={2105.01601},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement :

Some components are borrowed from the ViT code in @lucidrains's repo: https://github.com/lucidrains/vit-pytorch

Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision

Owner

Rishikesh (ऋषिकेश)
