MLP-Like Vision Permutator for Visual Recognition (PyTorch)

Overview

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition (arxiv)

This is a Pytorch implementation of our paper. We present Vision Permutator, a conceptually simple and data efficient MLP-like architecture for visual recognition. We show that our Vision Permutators are formidable competitors to convolutional neural networks (CNNs) and vision transformers.

We hope this work could encourage researchers to rethink the way of encoding spatial information and facilitate the development of MLP-like models.

Compare

Basic structure of the proposed Permute-MLP layer. The proposed Permute-MLP layer contains three branches that are responsible for encoding features along the height, width, and channel dimensions, respectively. The outputs from the three branches are then combined using element-wise addition, followed by a fully-connected layer for feature fusion.

Our code is based on the pytorch-image-models, Token Labeling, T2T-ViT

Comparison with Recent MLP-like Models

Model Parameters Throughput Image resolution Top 1 Acc. Download Logs
EAMLP-14 30M 711 img/s 224 78.9%
gMLP-S 20M - 224 79.6%
ResMLP-S24 30M 715 img/s 224 79.4%
ViP-Small/7 (ours) 25M 719 img/s 224 81.5% link
EAMLP-19 55M 464 img/s 224 79.4%
Mixer-B/16 59M - 224 78.5%
ViP-Medium/7 (ours) 55M 418 img/s 224 82.7% link
gMLP-B 73M - 224 81.6%
ResMLP-B24 116M 231 img/s 224 81.0%
ViP-Large/7 88M 298 img/s 224 83.2% link

The throughput is measured on a single machine with V100 GPU (32GB) with batch size set to 32.

Training ViP-Small/7 takes less than 30h on ImageNet for 300 epochs on a node with 8 A100 GPUs.

Requirements

torch>=1.4.0
torchvision>=0.5.0
pyyaml
timm==0.4.5
apex if you use 'apex amp'

data prepare: ImageNet with the following folder structure, you can extract imagenet by this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Validation

Replace DATA_DIR with your imagenet validation set path and MODEL_DIR with the checkpoint path

CUDA_VISIBLE_DEVICES=0 bash eval.sh /path/to/imagenet/val /path/to/checkpoint

Training

Command line for training on 8 GPUs (V100)

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model vip_s7 -b 256 -j 8 --opt adamw --epochs 300 --sched cosine --apex-amp --img-size 224 --drop-path 0.1 --lr 2e-3 --weight-decay 0.05 --remode pixel --reprob 0.25 --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-lr 1e-6 --warmup-epochs 20

Reference

You may want to cite:

@misc{hou2021vision,
    title={Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition},
    author={Qibin Hou and Zihang Jiang and Li Yuan and Ming-Ming Cheng and Shuicheng Yan and Jiashi Feng},
    year={2021},
    eprint={2106.12368},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

License

This repository is released under the MIT License as found in the LICENSE file. For commercial use, please contact with the authors.

Owner
Qibin (Andrew) Hou
Research fellow at NUS.
Qibin (Andrew) Hou
code for EMNLP 2019 paper Text Summarization with Pretrained Encoders

PreSumm This code is for EMNLP 2019 paper Text Summarization with Pretrained Encoders Updates Jan 22 2020: Now you can Summarize Raw Text Input!. Swit

Yang Liu 1.2k Dec 28, 2022
Code for CPM-2 Pre-Train

CPM-2 Pre-Train Pre-train CPM-2 此分支为110亿非 MoE 模型的预训练代码,MoE 模型的预训练代码请切换到 moe 分支 CPM-2技术报告请参考link。 0 模型下载 请在智源资源下载页面进行申请,文件介绍如下: 文件名 描述 参数大小 100000.tar

Tsinghua AI 136 Dec 28, 2022
Official implementation of "Articulation Aware Canonical Surface Mapping"

Articulation-Aware Canonical Surface Mapping Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, Shubham Tulsiani Paper Project Page Requirements Python

Nilesh Kulkarni 56 Dec 16, 2022
Решения, подсказки, тесты и утилиты для тренировки по алгоритмам от Яндекса.

Решения и подсказки к тренировке по алгоритмам от Яндекса Что есть внутри Решения с подсказками и комментариями; рекомендую сначала смотреть md файл п

Yankovsky Andrey 50 Dec 26, 2022
Anime Face Detector using mmdet and mmpose

Anime Face Detector This is an anime face detector using mmdetection and mmpose. (To avoid copyright issues, I use generated images by the TADNE model

198 Jan 07, 2023
Pytorch implementation code for [Neural Architecture Search for Spiking Neural Networks]

Neural Architecture Search for Spiking Neural Networks Pytorch implementation code for [Neural Architecture Search for Spiking Neural Networks] (https

Intelligent Computing Lab at Yale University 28 Nov 18, 2022
Experiments with differentiable stacks and queues in PyTorch

Please use stacknn-core instead! StackNN This project implements differentiable stacks and queues in PyTorch. The data structures are implemented in s

Will Merrill 141 Oct 06, 2022
Face uncertainty quantification or estimation using PyTorch.

Face-uncertainty-pytorch This is a demo code of face uncertainty quantification or estimation using PyTorch. The uncertainty of face recognition is af

Kaen 3 Sep 16, 2022
Direct LiDAR Odometry: Fast Localization with Dense Point Clouds

Direct LiDAR Odometry: Fast Localization with Dense Point Clouds DLO is a lightweight and computationally-efficient frontend LiDAR odometry solution w

VECTR at UCLA 369 Dec 30, 2022
Explainable Zero-Shot Topic Extraction

Zero-Shot Topic Extraction with Common-Sense Knowledge Graph This repository contains the code for reproducing the results reported in the paper "Expl

D2K Lab 56 Dec 14, 2022
A benchmark dataset for emulating atmospheric radiative transfer in weather and climate models with machine learning (NeurIPS 2021 Datasets and Benchmarks Track)

ClimART - A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models Official PyTorch Implementation Using deep le

21 Dec 31, 2022
Rule-based Customer Segmentation

Rule-based Customer Segmentation Business Problem A game company wants to create level-based new customer definitions (personas) by using some feature

Cem Çaluk 2 Jan 03, 2022
Job Assignment System by Real-time Emotion Detection

Emotion-Detection Job Assignment System by Real-time Emotion Detection Emotion is the essential role of facial expression and it could provide a lot o

1 Feb 08, 2022
A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

Somnus `Chen 2 Jun 09, 2022
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
Sample Code for "Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL"

Sample Code for "Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL" This is the official codebase for Pessimism Meets I

3 Sep 19, 2022
The repo of Feedback Networks, CVPR17

Feedback Networks http://feedbacknet.stanford.edu/ Paper: Feedback Networks, CVPR 2017. Amir R. Zamir*,Te-Lin Wu*, Lin Sun, William B. Shen, Bertram E

Stanford Vision and Learning Lab 87 Nov 19, 2022
3D Avatar Lip Syncronization from speech (JALI based face-rigging)

visemenet-inference Inference Demo of "VisemeNet-tensorflow" VisemeNet is an audio-driven animator centric speech animation driving a JALI or standard

Junhwan Jang 17 Dec 20, 2022
For storing the complete exploration of Visual Question Answering for our B.Tech Project

Multi-Image vqa @authors: Akhilesh, Janhavi, Harsh Paper summary, Ideas tried and their corresponding results: on wiki Other discussions: on discussio

Harsh Raj 3 Jun 16, 2022
Texture mapping with variational auto-encoders

vae-textures This is an experiment with using variational autoencoders (VAEs) to perform mesh parameterization. This was also my first project using J

Alex Nichol 41 May 24, 2022