Implementation of "A MLP-like Architecture for Dense Prediction"

Last update: Dec 27, 2022

Related tags

Deep Learning CycleMLP

Overview

A MLP-like Architecture for Dense Prediction (arXiv)

Updates

(22/07/2021) Initial release.

Model Zoo

We provide CycleMLP models pretrained on ImageNet 2012.

Model	Parameters	FLOPs	Top 1 Acc.	Download
CycleMLP-B1	15M	2.1G	78.9%	model
CycleMLP-B2	27M	3.9G	81.6%	model
CycleMLP-B3	38M	6.9G	82.4%	model
CycleMLP-B4	52M	10.1G	83.0%	model
CycleMLP-B5	76M	12.3G	83.2%	model

Usage

Install

PyTorch 1.7.0+ and torchvision 0.8.1+
timm:

pip install 'git+https://github.com/rwightman/[email protected]'

or

git clone https://github.com/rwightman/pytorch-image-models
cd pytorch-image-models
git checkout c2ba229d995c33aaaf20e00a5686b4dc857044be
pip install -e .

fvcore (optional, for FLOPs calculation)
mmcv, mmdetection, mmsegmentation (optional)

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is:

│path/to/imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Evaluation

To evaluate a pre-trained CycleMLP-B5 on ImageNet val with a single GPU run:

python main.py --eval --model CycleMLP_B5 --resume path/to/CycleMLP_B5.pth --data-path /path/to/imagenet

Training

To train CycleMLP-B5 on ImageNet on a single node with 8 gpus for 300 epochs run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model CycleMLP_B5 --batch-size 128 --data-path /path/to/imagenet --output_dir /path/to/save

Acknowledgement

This code is based on DeiT and pytorch-image-models. Thanks for their wonderful works

Citing

@article{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction},
  author={Chen, Shoufa and Xie, Enze and Ge, Chongjian and Liang, Ding and Luo, Ping},
  journal={arXiv preprint arXiv:2107.10224},
  year={2021}
}

License

CycleMLP is released under MIT License.

Comments

detection result

Applying PVT detection framework, I tried a CycleMLP-B1 based detector with RetinaNet 1x. I got AP=27.1, fairly inferior to the reported 38.6. Could you give some advices to reproduce the reported result?

The specific configure is as follows

base = [ 'base/models/retinanet_r50_fpn.py', 'base/datasets/coco_detection.py', 'base/schedules/schedule_1x.py', 'base/default_runtime.py' ] #optimizer model = dict( pretrained='./pretrained/CycleMLP_B1.pth', backbone=dict( type='CycleMLP_B1_feat', style='pytorch'), neck=dict( type='FPN', in_channels=[64, 128, 320, 512], out_channels=256, start_level=1, add_extra_convs='on_input', num_outs=5)) #optimizer optimizer = dict(delete=True, type='AdamW', lr=0.0001, weight_decay=0.0001) optimizer_config = dict(grad_clip=None)

find_unused_parameters = True

opened by mountain111 6
Compiling CycleMLP

Thank you for this great repo and interesting paper.

I tried compiling CycleMLP to onnx and not surpassingly the process failed since CycleMLP include dynamic offset creation in https://github.com/ShoufaChen/CycleMLP/blob/main/cycle_mlp.py#L132 and as such cannot be converted to a frozen graph. Were you able to convert CycleMLP to onnx or any other frozen graph framework?

Thanks in advance.

opened by shairoz-deci 6
Questions about offset calculation

Hi, thanks for your wonderful work.

I'm currently studying your work, and come up with some question about the offset calculations.

I understood the offset calculation mentioned on the paper, but can't understand about how generated offset is being used in the code.

For ex) if $S_H \times S_W : 3 \times 1$; I understood how the offset is applied in this figure

by calculate like this:

However, when I run the offset generating code, I can't figure out how this offset is being used in deform_conv2d

Can you provide more detailed information about this??

And also, the paper contains how $S_H \times S_W: 3 \times 3$ works, but in the code, it seems like either one ofkernel_size[0] or kernel_size[1] has to be 1. So, if I want to use $S_H \times S_W : 3 \times 3$, do I have to make $3 \times 1$ and $1 \times 3$ offsets and add those together?

Thank you again for your work. I really learned a lot.

opened by tae-mo 5
Example of CycleMLP Configuration for Dense Prediction

Hello.

First of all, thank you for curating this interesting work. I was wondering, are there any working examples of how I can use CycleMLP for dense prediction while maintaining the original input size (e.g., predict a 0 or 1 value for each pixel in an input image)? In addition, I am interested in only a single ("annotated") output image, although I noticed the model definitions given in this repository output multiple downsampled versions of the original input image. Any thoughts on this?

Thank you in advance for your time.

opened by amorehead 2
Swin-B vs CycleMLP-B on image classification

For classificaion on ImageNet-1k, the acuracy of Swin-B is 83.5, which is 0.1 higher than the proposed CycleMLP-B. But, in this paper, the authors reprot that the accuracy of Swin-B is 83.3, which is 0.1 lower than the proposed CycleMLP-B. Why are these accuracies different?

opened by hkzhang91 1

question about the offset

Thanks for your work!

The implementation of this code inspired me. But the calculation of offset here is confusing. Although this issue (https://github.com/ShoufaChen/CycleMLP/issues/10) has asked similar questions, I haven't found a reasonable explanation.

https://github.com/ShoufaChen/CycleMLP/blob/2f76a1f6e3cc6672143fdac46e3db5f9a7341253/cycle_mlp.py#L127-L136

kernel_size = (1, 3)
start_idx = (kernel_size[0] * kernel_size[1]) // 2
for i in range(num_channels):
    offset[0, 2 * i + 0, 0, 0] = 0
    # relative offset
    offset[0, 2 * i + 1, 0, 0] = (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
offset.reshape(num_channels, 2)

tensor([[ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.],
        [ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.]])

the results are different with the figure in paper:

Some codes for verification:

import torch
from torchvision.ops import deform_conv2d

num_channels = 6

data = torch.arange(1, 6).reshape(1, 1, 1, 5).expand(-1, num_channels, -1, -1)
data
"""
tensor([[[[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]]]])
"""

weight = torch.eye(num_channels).reshape(num_channels, num_channels, 1, 1)
weight.reshape(num_channels, num_channels)
"""
tensor([[1., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 1.]])
"""

offset = torch.empty(1, 2 * num_channels * 1 * 1, 1, 1)
kernel_size = (1, 3)
start_idx = (kernel_size[0] * kernel_size[1]) // 2
for i in range(num_channels):
    offset[0, 2 * i + 0, 0, 0] = 0
    # relative offset
    offset[0, 2 * i + 1, 0, 0] = (
        (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
    )
offset.reshape(num_channels, 2)
"""
tensor([[ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.],
        [ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.]])
"""

deform_conv2d(
    data.float(), 
    offset=offset.expand(-1, -1, -1, 5).float(), 
    weight=weight.float(), 
    bias=None,
)
"""
tensor([[[[1., 2., 3., 4., 5.]],
         [[2., 3., 4., 5., 0.]],
         [[0., 1., 2., 3., 4.]],
         [[1., 2., 3., 4., 5.]],
         [[2., 3., 4., 5., 0.]],
         [[0., 1., 2., 3., 4.]]]])
"""

opened by lartpang 1

question about the offset

Hi, thank you very much for your excellent work. In Fig.4 of your paper, you show the pseudo-kernel when kernel size is 1x3. But I when I find that function "gen_offset" does not generate the same offset as Fig.4. The offset it generates is "0,1,0,-1,0,0,0,1..." instead of "0,1,0,-1,0,1,0,-1', which is shown in Fig.4. So could you please tell me the reason?

opened by linjing7 1
About "crop_pct"

Hi, thanks for your great work and code. I wonder the parameter crop_pct actually works in which part of code. When I go throught the timm, I can't find out how this crop_pct is loaded.

opened by ggjy 1
How to deploy CycleMLP-T for training？

Thank you very much for such a wonderful work!

After learning the cycle_mlp source code in the repository, I am very confused to deploy CycleMLP Block based on Swin Transformer. Is it convenient for you to release swin-based CycleMLP? Looking forward to your reply, Thanks!

opened by Pak287 0

Releases(v0.1)

v0.1(Jul 21, 2021)

Source code(tar.gz)
Source code(zip)
CycleMLP_B1.pth(57.92 MB)
CycleMLP_B2.pth(102.45 MB)
CycleMLP_B3.pth(146.75 MB)
CycleMLP_B4.pth(198.05 MB)
CycleMLP_B5.pth(289.38 MB)
CycleMLP_base.pth(335.13 MB)
CycleMLP_small.pth(189.52 MB)
CycleMLP_tiny.pth(108.05 MB)

Owner

Shoufa Chen

GitHub Repository

A variational Bayesian method for similarity learning in non-rigid image registration (CVPR 2022)

A variational Bayesian method for similarity learning in non-rigid image registration We provide the source code and the trained models used in the re

14 Nov 21, 2022

Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ResDAVEnet-VQ Official PyTorch implementation of Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech What is in this repo? M

21 Aug 23, 2022

A tensorflow=1.13 implementation of Deconvolutional Networks on Graph Data (NeurIPS 2021)

GDN A tensorflow=1.13 implementation of Deconvolutional Networks on Graph Data (NeurIPS 2021) Abstract In this paper, we consider an inverse problem i

4 Sep 13, 2022

Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy

38 Jan 06, 2023

Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

182 Dec 29, 2022

Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery"

SegSwap Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery" [PDF] [Project page] If our project

41 Dec 10, 2022

Processed, version controlled history of Minecraft's generated data and assets

mcmeta Processed, version controlled history of Minecraft's generated data and assets Repository structure Each of the following branches has a commit

75 Dec 28, 2022

Fake videos detection by tracing the source using video hashing retrieval.

Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos 🎉️ 📜 Directory Introduction VTL Trace Samples and Acc of Hash

56 Dec 22, 2022

MogFace: Towards a Deeper Appreciation on Face Detection

MogFace: Towards a Deeper Appreciation on Face Detection Introduction In this repo, we propose a promising face detector, termed as MogFace. Our MogFa

48 Dec 20, 2022

Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning"

A Unified Framework for Parameter-Efficient Transfer Learning This is the official implementation of the paper: Towards a Unified View of Parameter-Ef

216 Dec 29, 2022

Python project to take sound as input and output as RGB + Brightness values suitable for DMX

sound-to-light Python project to take sound as input and output as RGB + Brightness values suitable for DMX Current goals: Get one pixel working: Vary

1 Nov 17, 2021

[NeurIPS 2021 Spotlight] Code for Learning to Compose Visual Relations

Learning to Compose Visual Relations This is the pytorch codebase for the NeurIPS 2021 Spotlight paper Learning to Compose Visual Relations. Demo Imag

88 Jan 04, 2023

Learning from Synthetic Humans, CVPR 2017

Learning from Synthetic Humans (SURREAL) Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev and Cordelia Schmid,

538 Dec 18, 2022

VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

VarCLR: Variable Representation Pre-training via Contrastive Learning New: Paper accepted by ICSE 2022. Preprint at arXiv! This repository contain

32 Oct 24, 2022

CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning

CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning This repository contains the code and relevant instructions

5 Aug 19, 2022

A tensorflow implementation of GCN-LPA

GCN-LPA This repository is the implementation of GCN-LPA (arXiv): Unifying Graph Convolutional Neural Networks and Label Propagation Hongwei Wang, Jur

83 Nov 28, 2022

Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'

pytorch-AdaIN This is an unofficial pytorch implementation of a paper, Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [Hua

873 Jan 06, 2023

Visual dialog agents with pre-trained vision-and-language encoders.

Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation Or READ-UP: Referring Expression Agent Dialog with Unified Pretr

7 Oct 08, 2022

Code release for NeurIPS 2020 paper "Co-Tuning for Transfer Learning"

CoTuning Official implementation for NeurIPS 2020 paper Co-Tuning for Transfer Learning. [News] 2021/01/13 The COCO 70 dataset used in the paper is av

35 Sep 23, 2022

The pure and clear PyTorch Distributed Training Framework.

The pure and clear PyTorch Distributed Training Framework. Introduction Requirements and Usage Dependency Dataset Basic Usage Slurm Cluster Usage Base

208 Dec 20, 2022