[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

OpenReview | arXiv | PDF | Model Zoo | BibTex

PyTorch implementation of neural network quantization with fixed-point 8-bit only multiplication.

F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
Qing Jin¹,², Jian Ren¹, Richard Zhuang¹, Sumant Hanumante¹, Zhengang Li², Zhiyu Chen³, Yanzhi Wang², Kaiyuan Yang³, Sergey Tulyakov¹
¹Snap Inc., ²Northeastern University, ³Rice University
ICLR 2022 Oral.

Overview

Neural network quantization enables efficient inference by reducing the precision of weights and inputs. Previous quantization methods can be categorized as simulated quantization, integer-only quantization, and fixed-point quantization, with the first two involving high-precision multiplications with 32-bit floating-point or integer scaling. In contrast, fixed-point models avoid such demanding requirements but demonstrate inferior performance to the other two methods. In this work, we study the problem of how to train such models. Specifically, we conduct statistical analysis on values for quantization and propose to determine the fixed-point format from data during training with a semi-empirical formula. Our method demonstrates that high-precision multiplication is not necessary for a quantized model to achieve performance comparable to its full-precision counterpart.
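
As a rough illustration of what fixed-point 8-bit multiplication involves, the sketch below quantizes two real numbers to 8-bit fixed-point integers and multiplies them using only an integer multiply and a bit shift. The function names, the rounding mode, and the chosen fractional lengths are assumptions made for this example; they are not taken from the repository's code.

```python
def quantize_fixed_point(x, frac_len, word_len=8, signed=True):
    """Return the integer stored for x in a fixed-point format with
    `word_len` total bits, of which `frac_len` are fractional bits."""
    scale = 1 << frac_len  # 2 ** frac_len
    if signed:
        lo, hi = -(1 << (word_len - 1)), (1 << (word_len - 1)) - 1
    else:
        lo, hi = 0, (1 << word_len) - 1
    q = round(x * scale)          # nearest representable step
    return max(lo, min(hi, q))    # clip to the representable range


def fixed_point_multiply(qa, frac_a, qb, frac_b, out_frac_len):
    """Multiply two fixed-point integers and rescale the product to
    `out_frac_len` fractional bits with a shift (no float scaling)."""
    prod = qa * qb                        # carries frac_a + frac_b fractional bits
    shift = frac_a + frac_b - out_frac_len
    return prod >> shift if shift >= 0 else prod << -shift


# Hypothetical values: an input with 5 fractional bits, a weight with 7.
x_q = quantize_fixed_point(0.7, frac_len=5)    # 22, i.e. 22 / 32 = 0.6875
w_q = quantize_fixed_point(-0.3, frac_len=7)   # -38, i.e. -38 / 128
y_q = fixed_point_multiply(x_q, 5, w_q, 7, out_frac_len=6)
print(y_q / 2 ** 6)  # approximates 0.7 * -0.3
```

The only rescaling step is the shift by a power of two, which is what distinguishes this scheme from methods that multiply by a 32-bit floating-point or integer scale.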

Getting Started

Requirements

Please check the requirements and install the required packages.
Prepare the ImageNet-1k data following the PyTorch example, and create a softlink from the ImageNet data path to data under the current code directory (ln -s /path/to/imagenet data).

Model Training

Conventional training
We train the model with the script distributed_run.sh, using the command:
bash distributed_run.sh /path/to/yml_file batch_size
We set batch_size=2048 for conventional training of floating-/fixed-point ResNet18 and MobileNet V1/V2.

Before training, please update the dataset_dir and log_dir arguments in the yaml files for training the floating-/fixed-point models.
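
If many yaml files need the same change, the update could be scripted. The following is a minimal sketch that assumes dataset_dir and log_dir appear as plain top-level `key: value` lines in each file; the helper names `set_yaml_arg` and `update_config` are hypothetical and not part of this repo.

```python
import re


def set_yaml_arg(text, key, value):
    """Replace the value on a top-level `key: value` line and leave
    every other line untouched. Raises KeyError if the key is absent."""
    pattern = re.compile(rf"^{re.escape(key)}\s*:.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash interpretation in `value`.
    new_text, count = pattern.subn(lambda m: f"{key}: {value}", text)
    if count == 0:
        raise KeyError(f"{key} not found as a top-level key")
    return new_text


def update_config(path, dataset_dir, log_dir):
    """Rewrite dataset_dir and log_dir in the yaml file at `path`."""
    with open(path) as f:
        text = f.read()
    text = set_yaml_arg(text, "dataset_dir", dataset_dir)
    text = set_yaml_arg(text, "log_dir", log_dir)
    with open(path, "w") as f:
        f.write(text)
```

For example, `update_config("/path/to/yml_file", "data", "logs")` would point a config at the softlinked data directory and a local log folder.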

To train the floating-point model, please use the yaml file ***_floating_train.yml in the conventional subfolder under the corresponding folder of the model.

To train the fixed-point model, please first train the floating-point model to use as the initialization. Please use the yaml file ***_fix_quant_train.yml in the conventional subfolder under the corresponding folder of the model. Please make sure the argument fp_pretrained_file points to the correct path of the corresponding floating-point checkpoint. We also provide our pretrained floating-point models in the Model Zoo below.

Tiny finetuning
We finetune the model with the script run.sh, using the command:
bash run.sh /path/to/yml_file batch_size
We set batch_size=128 and use one GPU for tiny-finetuning of fixed-point ResNet18/50.

Before finetuning, please update the dataset_dir and log_dir arguments in the yaml files for finetuning the fixed-point models.

To finetune the fixed-point model, please use the yaml file ***_fix_quant_***_pretrained_train.yml in the tiny_finetuning subfolder under the corresponding folder of the model. For models pretrained with PyTorchCV (Baseline of ResNet18 and Baseline#1 of ResNet50), the floating-point checkpoint is downloaded automatically when the code runs. For the model pretrained by Nvidia (Baseline#2 of ResNet50), please download the checkpoint first and make sure the argument nvidia_pretrained_file points to the correct path of this checkpoint.

Model Testing

We test the model with the file run.sh and the command
```
bash run.sh /path/to/yml_file batch_size
```
We set batch_size=128 and use one GPU for model testing.
Before testing, please update the dataset_dir and log_dir arguments in the yaml files. Please update the integize_file_path and int_op_only_file_path arguments in the yaml files ***_fix_quant_test***_integize.yml and ***_fix_quant_test***_int_op_only.yml, respectively. Please also update other arguments such as nvidia_pretrained_file if necessary (even if they are not used during testing).

We use the following yaml files for testing:
***_floating_test.yml tests the floating-point model.
***_fix_quant***_test.yml tests the fixed-point model with the same setting as during training/tiny-finetuning.
***_fix_quant***_test_int_model.yml tests the fixed-point model on GPU with all quantized weights, bias and inputs implemented with integers (but with float dtype, as GPU does not support integer operations), using the original modules from training (e.g. with batch normalization layers).
***_fix_quant***_test_integize.yml tests the fixed-point model on GPU with all quantized weights, bias and inputs implemented with integers (but with float dtype, as GPU does not support integer operations), using a new equivalent model with only convolution, pooling and fully-connected layers.
***_fix_quant***_test_int_op_only.yml tests the fixed-point model on CPU with all quantized weights, bias and inputs implemented with integers (with int dtype), using a new equivalent model with only convolution, pooling and fully-connected layers.

Note that the accuracy from the four testing files can differ a little due to numerical error.
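
An "equivalent model with only convolution, pooling and fully-connected layers" implies that batch normalization has been folded into the preceding layer. The standard folding identity can be sketched in plain Python as follows, treating each output channel as a list of weights plus one bias. This illustrates the general technique under these simplifying assumptions and is not the repository's implementation; `fold_batchnorm` is a hypothetical name.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm statistics into the weights and bias
    of the preceding linear/convolution layer.

    weight: list of per-output-channel weight lists
    bias, gamma, beta, mean, var: one float per output channel
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weight, bias, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)             # per-channel scale
        folded_w.append([w * s for w in w_c])  # w' = w * gamma / sqrt(var + eps)
        folded_b.append(bt + (b_c - mu) * s)   # b' = beta + (b - mean) * scale
    return folded_w, folded_b
```

After folding, `sum(w * x) + b` computed with the folded parameters matches the output of the original layer followed by batch normalization in inference mode, up to floating-point rounding.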

Model Export

We export the fixed-point model with integer weights, bias and inputs to run on GPU and CPU during model testing with the ***_fix_quant_test_integize.yml and ***_fix_quant_test_int_op_only.yml files, respectively.
The exported onnx files are saved to the path given by the arguments integize_file_path and int_op_only_file_path.

F8Net Model Zoo

All checkpoints and onnx files are available here.

Conventional

Model	Type	Top-1 Acc.^a	Checkpoint
ResNet18	FP	70.3	`Res18_32`
ResNet18	8-bit	71.0	`Res18_8`
MobileNet-V1	FP	72.4	`MBV1_32`
MobileNet-V1	8-bit	72.8	`MBV1_8`
MobileNet-V2b	FP	72.7	`MBV2b_32`
MobileNet-V2b	8-bit	72.6	`MBV2b_8`

Tiny Finetuning

Model	Type	Top-1 Acc.^a	Checkpoint
ResNet18	FP	73.1	`Res18_32p`
ResNet18	8-bit	72.3	`Res18_8p`
ResNet50b (BL#1)	FP	77.6	`Res50b_32p`
ResNet50b (BL#1)	8-bit	77.6	`Res50b_8p`
ResNet50b (BL#2)	FP	78.5	`Res50b_32n`
ResNet50b (BL#2)	8-bit	78.1	`Res50b_8n`

^a The accuracies are obtained from the inference step during training. Test accuracy for the final exported model might differ slightly due to numerical error.

Technical Details

The main techniques for neural network quantization with 8-bit fixed-point multiplication include the following:

Quantization methods/modules, including determining fixed-point formats from statistics or by grid search, fusing convolution and batch normalization layers, and reformulating PACT with fixed-point quantization, are implemented in models/fix_quant_ops.
Clipping-level sharing and private fractional length for residual blocks are implemented for ResNet (models/fix_resnet) and MobileNet V2 (models/fix_mobilenet_v2).
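
To give a feel for choosing a fixed-point format by grid search, the sketch below tries every fractional length for an 8-bit word and keeps the one with the smallest squared quantization error on a sample of values. The error criterion, the candidate range, and the function names are assumptions made for illustration; the actual logic in models/fix_quant_ops may differ.

```python
def fixed_point_value(x, frac_len, word_len=8, signed=True):
    """Quantize x to fixed point and return the real value it represents."""
    scale = 1 << frac_len
    if signed:
        lo, hi = -(1 << (word_len - 1)), (1 << (word_len - 1)) - 1
    else:
        lo, hi = 0, (1 << word_len) - 1
    q = max(lo, min(hi, round(x * scale)))  # round, then clip
    return q / scale


def best_frac_len(values, word_len=8, signed=True):
    """Grid-search the fractional length that minimizes the summed
    squared error between `values` and their quantized versions."""
    def sq_error(frac_len):
        return sum(
            (v - fixed_point_value(v, frac_len, word_len, signed)) ** 2
            for v in values
        )
    # Ties are resolved in favor of the smallest fractional length.
    return min(range(word_len + 1), key=sq_error)


# Large values need all bits for the integer part.
print(best_frac_len([100.0, -50.0]))  # 0
```

A larger fractional length gives finer resolution but a narrower range, so the search balances rounding error against clipping error.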

Acknowledgement

This repo is based on AdaBits.

Citation

If our code or models help your work, please cite our paper:

@inproceedings{
  jin2022fnet,
  title={F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization},
  author={Qing Jin and Jian Ren and Richard Zhuang and Sumant Hanumante and Zhengang Li and Zhiyu Chen and Yanzhi Wang and Kaiyuan Yang and Sergey Tulyakov},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=_CfpJazzXT2}
}
