Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Last update: Jan 05, 2023

The official implementation of Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Introduction

TL;DR: Arch-Net is a family of neural networks made up of simple and efficient operators. When an Arch-Net is produced, less common network constructs, such as Layer Normalization and Embedding Layers, are eliminated progressively through label-free Blockwise Model Distillation, while sub-8-bit quantization is performed at the same time to maximize performance. For the classification task, only 30k unlabeled images randomly sampled from the ImageNet dataset are needed.
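The idea of label-free blockwise distillation can be illustrated with a toy sketch. It assumes one common formulation: each student block is fit to reproduce the output of the matching teacher block on unlabeled inputs, one block at a time. The scalar affine blocks, `TEACHER_BLOCKS`, and all function names are hypothetical and are not taken from the Arch-Net code; quantization and the real training schedule are left out.

```python
import random

# Toy "teacher": two blocks applied in sequence (hypothetical stand-ins
# for the blocks of a real network).
TEACHER_BLOCKS = [
    lambda x: 2.0 * x + 1.0,
    lambda h: -0.5 * h + 3.0,
]


def fit_student_block(inputs, targets, lr=0.1, steps=1000):
    """Fit y = w * x + b to (inputs, targets) by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(inputs, targets)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def distill_blockwise(teacher_blocks, unlabeled_inputs):
    """Train one student block per teacher block; no labels are used."""
    student = []
    block_inputs = list(unlabeled_inputs)
    for teacher_block in teacher_blocks:
        # The teacher's block output is the only supervision signal.
        block_targets = [teacher_block(x) for x in block_inputs]
        student.append(fit_student_block(block_inputs, block_targets))
        # The next block is trained on the teacher's intermediate features.
        block_inputs = block_targets
    return student


def run_student(student, x):
    """Apply the distilled student blocks in sequence."""
    for w, b in student:
        x = w * x + b
    return x


rng = random.Random(0)
unlabeled = [rng.uniform(-1.0, 1.0) for _ in range(64)]
student = distill_blockwise(TEACHER_BLOCKS, unlabeled)
```

Because each block is trained against teacher activations only, the procedure needs inputs but no ground-truth labels, which is consistent with the unlabeled-image setting described above.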

Main Results

ImageNet Classification

Model	Bit Width	Top-1 (%)	Top-5 (%)
Arch-Net_Resnet18	32w32a	69.76	89.08
Arch-Net_Resnet18	2w4a	68.77	88.66
Arch-Net_Resnet34	32w32a	73.30	91.42
Arch-Net_Resnet34	2w4a	72.40	91.01
Arch-Net_Resnet50	32w32a	76.13	92.86
Arch-Net_Resnet50	2w4a	74.56	92.39
Arch-Net_MobilenetV1	32w32a	68.79	88.68
Arch-Net_MobilenetV1	2w4a	67.29	88.07
Arch-Net_MobilenetV2	32w32a	71.88	90.29
Arch-Net_MobilenetV2	2w4a	69.09	89.13
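The Bit Width column appears to follow an `<N>w<M>a` pattern: N bits for weights and M bits for activations (features), which matches the `--weight-bit 2 --feature-bit 4` flags used for the 2w4a runs. The sketch below parses that notation and applies plain min-max uniform quantization to show how few distinct values survive at 2 bits. The function names are hypothetical, and this simple quantizer is swapped in for illustration; the repository's own quantizer is not shown in this README and likely differs.

```python
import re


def parse_bit_width(tag):
    """Split a tag such as '2w4a' into (weight_bits, activation_bits)."""
    match = re.fullmatch(r"(\d+)w(\d+)a", tag)
    if match is None:
        raise ValueError(f"unrecognized bit-width tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def quantize_uniform(values, bits, signed):
    """Round values onto a uniform grid with at most 2**bits levels.

    Illustrative min-max quantization: the step size is derived from the
    largest magnitude, not learned.
    """
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    codes = [min(max(round(v / scale), qmin), qmax) for v in values]
    return [c * scale for c in codes]


weight_bits, act_bits = parse_bit_width("2w4a")
weights = [-0.9, -0.2, 0.1, 0.8]
activations = [0.0, 0.3, 0.7, 1.5]  # non-negative, e.g. after a ReLU

q_weights = quantize_uniform(weights, weight_bits, signed=True)
q_activations = quantize_uniform(activations, act_bits, signed=False)
```

With 2-bit signed weights there are only four representable levels, so the accuracy gap between the 32w32a and 2w4a rows reflects a very aggressive compression setting.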

Multi30k Machine Translation

Model	Translation Direction	Bit Width	BLEU
Transformer	English to German	32w32a	32.44
Transformer	English to German	2w4a	33.75
Transformer	English to German	4w4a	34.35
Transformer	English to German	8w8a	36.44
Transformer	German to English	32w32a	30.32
Transformer	German to English	2w4a	32.50
Transformer	German to English	4w4a	34.34
Transformer	German to English	8w8a	34.05

Dependencies

python == 3.6

Refer to requirements.txt for more details.

Data Preparation

Download the ImageNet and Multi30k data (Google Drive or BaiduYun, code: 8brd) and put them in ./arch-net/data/ as follows:

./data/
├── imagenet
│   ├── train
│   ├── val
├── multi30k

Download the teacher models from Google Drive or BaiduYun (code: 57ew) and put them in ./arch-net/models/teacher/pretrained_models/
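Before launching a long training run, it can help to confirm that the directories described above are in place. The following is a small sketch, not part of the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the paths are simply the ones listed in this section, taken relative to the repository root.

```python
from pathlib import Path

# Directories named in the instructions above, relative to the repository
# root (the ./arch-net/ directory).
EXPECTED_DIRS = [
    "data/imagenet/train",
    "data/imagenet/val",
    "data/multi30k",
    "models/teacher/pretrained_models",
]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Data and teacher-model directories look complete.")
```

Run it from the repository root; it only checks that the directories exist, not that the downloaded files inside them are complete.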

Get Started

ImageNet Classification (take archnet_resnet18 as an example)

Train and evaluate:

cd ./train_imagenet

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py  -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn

Evaluate only, if you already have the trained models:

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py  -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn --evaluate

Machine Translation

Train an arch-net_transformer at 2w4a:

cd ./train_transformer

python3 train_archnet_transformer.py --translate_direction en2de --teacher_model_path ../models/teacher/pretrained_models/transformer_en_de.chkpt --data_pkl ../data/multi30k/m30k_ende_shr.pkl --batch_size 48 --final_epochs 50 --weight_bit 2 --feature_bit 4 --lr 1e-3 --weight_decay 1e-6 --label_smoothing

For an arch-net_transformer at 8w8a, use a learning rate of 1e-3 and a weight decay of 1e-4.

Evaluate:

cd ./evaluate

python3 translate.py --data_pkl ./data/multi30k/m30k_ende_shr.pkl --model path_to_the_output_directory/model_max_acc.chkpt

To get the BLEU score of the evaluated results, go to this website, then upload 'predictions.txt' from the output directory together with 'gt_en.txt' or 'gt_de.txt' from ./arch-net/data_gt/multi30k/.
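For a quick local sanity check before uploading, a simplified corpus-level BLEU can be computed directly. This is a sketch under stated assumptions: whitespace tokenization, a single reference per sentence, uniform weights over 1- to 4-grams, and no smoothing. The function names are hypothetical, and the resulting score may differ from the one the website reports, so the numbers in the table above should only be compared against the website's output.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU (0 to 100) with one reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    if hyp_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

To use it on the evaluation output, read 'predictions.txt' and the matching ground-truth file line by line into two lists of equal length and pass them as `hypotheses` and `references`.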

Citation

If you find this project useful for your research, please consider citing the paper.

@misc{xu2021archnet,
      title={Arch-Net: Model Distillation for Architecture Agnostic Model Deployment}, 
      author={Weixin Xu and Zipeng Feng and Shuangkang Fang and Song Yuan and Yi Yang and Shuchang Zhou},
      year={2021},
      eprint={2111.01135},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgements

attention-is-all-you-need-pytorch

LSQuantization

pytorch-mobilenet-v1

Contact

If you have any questions, feel free to open an issue or contact us at [email protected].

Owner

MEGVII Research
