SOTA model in CIFAR10

Last update: Dec 21, 2022

Overview

A PyTorch Implementation of CIFAR Tricks

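This project surveys and implements various tricks, data augmentation methods, and regularization methods on the CIFAR-10 dataset. The project is wrapped up for now. If you have better ideas or would like to help maintain it, please open an issue or find my contact details on my homepage.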

0. Requirements

Python 3.6+
torch=1.8.0+cu111
torchvision=0.9.0+cu111
tqdm=4.26.0
PyYAML=6.0

1. Implemented methods

1.1 Tricks

Warmup
Cosine LR Decay
SAM
Label Smoothing
KD
AdaBound
Xavier / Kaiming init
LR finder

1.2 Augmentation

AutoAugment
Cutout
Mixup
RICAP
Random Erasing
ShakeDrop

2. Training

2.1 CIFAR-10 training examples

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True
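
For readers unfamiliar with what `--mixup True` enables, the following is a minimal, framework-free sketch of the mixup idea: two samples and their one-hot labels are blended with a weight drawn from a Beta(alpha, alpha) distribution. The function name `mixup_pair` and the default `alpha=1.0` are assumptions made for illustration. This is not the code in `train.py`, which operates on PyTorch batches.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Blend two samples (flat lists of floats) and their one-hot labels.

    alpha=1.0 is an assumed default, not a value taken from this project.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    # Two toy "images" with two pixels each, belonging to classes 0 and 1.
    x, y, lam = mixup_pair([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print(f"lambda={lam:.3f}, mixed input={x}, mixed label={y}")
```

The mixed label stays a valid probability distribution, so it can be fed to a cross-entropy loss that accepts soft targets.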

3. Results

3.1 Results from the original pytorch-ricap

Model	Error rate % (accuracy %)	Loss	Error rate % (paper)
WideResNet28-10 baseline	3.82 (96.18)	0.158	3.89
WideResNet28-10 +RICAP	2.82 (97.18)	0.141	2.85
WideResNet28-10 +Random Erasing	3.18 (96.82)	0.114	4.65
WideResNet28-10 +Mixup	3.02 (96.98)	0.158	3.02

3.2 Reimplementation results

Model	Error rate % (accuracy %)	Loss	Error rate % (paper)
WideResNet28-10 baseline	3.78 (96.22)		3.89
WideResNet28-10 +RICAP	2.81 (97.19)		2.85
WideResNet28-10 +Random Erasing	3.03 (96.97)	0.113	4.65
WideResNet28-10 +Mixup	2.93 (97.07)	0.158	3.02

3.3 Quick validation of network architectures on half of the data

Reimplemented models (no augmentation, half of the data, 200 epochs, batch size 128)

Model	Test accuracy (%)	Loss	Notes
lenet	70.76		CPU overloaded
wideresnet	96.22		error rate 3.78
resnet20	89.72
senet	92.34
resnet18	92.08
resnet34	92.48
resnet50	91.72
regnet	92.58
nasnet			out of memory
shake_resnet26_2x32d	93.06
shake_resnet26_2x64d	94.14
densenet	92.06
dla	92.58
googlenet	91.90	0.2675
efficientnetb0	86.82	0.5024	low utilization and slow
mobilenet	89.18		low utilization
mobilenetv2	91.06
pnasnet	90.44
preact_resnet	90.76
resnext	92.30
vgg	88.38		high CPU and GPU utilization
inceptionv3	91.84
inceptionv4	91.10
inception_resnet_v2	83.46
rir	92.34	0.3932
squeezenet	89.16	0.4311	high CPU utilization
stochastic_depth_resnet18	90.22
xception
dpn	92.06	0.3002
ge_resnext29_8x64d	93.86		extremely slow

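3.4 Effect on CPU and GPU utilization

TEST: scale/kernel ToyNet

Varying the depth of the network's convolutional layers and training each variant leads to the following conclusion:

Conclusion: a network like LeNet, which does little convolution (only two convolutional layers), shows high CPU utilization and low GPU utilization. When depth is added on top of it by stacking layers in a straight VGG-style pipeline, the deeper the network gets, the lower the CPU utilization and the higher the GPU utilization.

Varying the batch size used in training leads to the following conclusion:

Conclusion: batch size affects convergence.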

3.5 Testing cutout and mixup with StepLR

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	200			96.33
shake_resnet26_2x64d	200	√		96.99
shake_resnet26_2x64d	200		√	96.60
shake_resnet26_2x64d	200	√	√	96.46
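
As a reminder of what the cutout column in the table above refers to, here is a minimal sketch of cutout on a single-channel image stored as a list of rows: a square patch around a random centre is set to zero and clipped at the image border. The function name `cutout` and the patch size of 16 pixels used in the example are assumptions for illustration. The sketch does not reproduce this project's transform.

```python
import random


def cutout(image, size, rng=random):
    """Return a copy of a 2D image (list of rows) with one square patch zeroed.

    The patch is centred on a random pixel and clipped at the borders,
    so patches near an edge cover fewer than size * size pixels.
    """
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)  # patch centre
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = [row[:] for row in image]  # leave the input untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0.0
    return out


if __name__ == "__main__":
    img = [[1.0] * 32 for _ in range(32)]  # CIFAR-sized dummy image
    masked = cutout(img, size=16)  # size=16 is an assumed value
    print(sum(v == 0.0 for row in masked for v in row), "pixels masked")
```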

3.6 Testing SAM, ASAM, Cosine LR Decay, and LabelSmooth

architecture	epoch	SAM	ASAM	Cosine LR Decay	LabelSmooth	C10 test acc (%)
shake_resnet26_2x64d	200	√				96.51
shake_resnet26_2x64d	200		√			96.80
shake_resnet26_2x64d	200			√		96.61
shake_resnet26_2x64d	200				√	96.57

PS: with a much longer training schedule (1800 epochs), other repositories report that shake_resnet26_2x64d reaches 97.71% test accuracy with cutout and mixup.
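
The LabelSmooth column in the table above refers to label smoothing. A minimal sketch of the standard formulation follows: the one-hot target keeps weight 1 - eps on the true class and spreads eps uniformly over all classes, and the loss is the cross-entropy against that softened target. The function names and `eps=0.1` are assumptions for illustration, since the source does not state the smoothing factor used in these runs.

```python
import math


def smooth_labels(target, num_classes, eps=0.1):
    """Softened one-hot target: (1 - eps) on the true class plus eps / K everywhere."""
    return [(1.0 - eps) * (1.0 if k == target else 0.0) + eps / num_classes
            for k in range(num_classes)]


def cross_entropy(logits, soft_target):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for t, v in zip(soft_target, logits))


if __name__ == "__main__":
    q = smooth_labels(target=3, num_classes=10)
    logits = [0.0] * 10
    logits[3] = 5.0  # a confident, correct prediction
    print("smoothed target:", [round(v, 3) for v in q])
    print("loss:", round(cross_entropy(logits, q), 4))
```

With smoothing, even a confident correct prediction keeps a nonzero loss, which discourages the logits from growing without bound.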

3.7 Testing cosine LR + shake

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	300			96.66
shake_resnet26_2x64d	300	√		97.21
shake_resnet26_2x64d	300		√	96.90
shake_resnet26_2x64d	300	√	√	96.73

Results at 1800 epochs, taken from CIFAR-ZOO. They were not reproduced here because training takes too long.

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	1800			96.94 (CIFAR-ZOO)
shake_resnet26_2x64d	1800	√		97.20 (CIFAR-ZOO)
shake_resnet26_2x64d	1800		√	97.42 (CIFAR-ZOO)
shake_resnet26_2x64d	1800	√	√	97.71 (CIFAR-ZOO)

3.8 Study of the Divide and Co-training recipe

lr:
- warmup (20 epochs)
- cosine lr decay
- lr=0.1
- total epochs: 300
bs=128
aug:
- Random Crop and resize
- Random left-right flipping
- AutoAugment
- Normalization
- Random Erasing
- Mixup
weight decay=5e-4 (bias and bn undecayed)
kaiming weight init
optimizer: nesterov

Reproduction (one V100 GPU; 4 min per epoch × 300 epochs / 60 = 20 h): top-1 accuracy 97.59%, the highest result in this project so far.

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'
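
The learning-rate settings listed in this section (base lr 0.1, 20 warmup epochs, cosine decay, 300 epochs in total) can be sketched as a per-epoch schedule. The linear shape of the warmup, the per-epoch granularity, and the decay down to zero are assumptions. The `warmcosine` scheduler in this project may differ in those details, and `warmup_cosine_lr` is a hypothetical helper name.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.1, warmup_epochs=20, total_epochs=300):
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine decay.

    Assumes linear warmup and decay to zero; the source only gives the
    base lr, the warmup length, and the total number of epochs.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 10, 19, 20, 160, 299):
        print(epoch, round(warmup_cosine_lr(epoch), 5))
```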

3.9 Testing multiple data augmentations

architecture	epoch	cutout	mixup	autoaugment	random-erase	C10 test acc (%)
shake_resnet26_2x64d	200					96.42
shake_resnet26_2x64d	200	√				96.49
shake_resnet26_2x64d	200		√			96.17
shake_resnet26_2x64d	200			√		96.25
shake_resnet26_2x64d	200				√	96.20
shake_resnet26_2x64d	200	√	√			95.82
shake_resnet26_2x64d	200	√		√		96.02
shake_resnet26_2x64d	200	√			√	96.00
shake_resnet26_2x64d	200		√	√		95.83
shake_resnet26_2x64d	200		√		√	95.89
shake_resnet26_2x64d	200			√	√	96.25

python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm'  --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True  --bs 64
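
The eleven commands above follow a regular pattern: a baseline, each of the four augmentations alone, and every pair of them. A small helper along these lines could generate the same list instead of typing it by hand. `build_commands` is a hypothetical helper, not part of this repository, and it only uses flags that appear in the commands above.

```python
from itertools import combinations

# Short run-name letters mapped to the train.py flags used in the commands above.
FLAGS = {
    "c": "--cutout",
    "m": "--mixup",
    "a": "--autoaugmentation",
    "r": "--random-erase",
}


def build_commands(model="shake_resnet26_2x64d", bs=64):
    """Return train.py commands for zero, one, or two augmentations at a time."""
    commands = []
    for k in range(3):
        for combo in combinations("cmar", k):
            # The baseline run keeps the name 'ss64_orgin' used above.
            name = "ss64_" + ("".join(combo) or "orgin")
            flags = "".join(f" {FLAGS[c]} True" for c in combo)
            commands.append(
                f"python train.py --model '{model}' --name '{name}'{flags} --bs {bs}"
            )
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(cmd)
```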

4. References

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

Owner

PJDong
