I-BERT: Integer-only BERT Quantization

Last update: Dec 27, 2022

HuggingFace Implementation

I-BERT is also available in the master branch of HuggingFace Transformers. See the following links for that implementation.

Github Link: https://github.com/huggingface/transformers/tree/master/src/transformers/models/ibert

Model Links:

Installation & Requirements

More detailed installation guides are available in the Fairseq repo: https://github.com/pytorch/fairseq

1. Fairseq Installation

Reference: Fairseq

PyTorch version >= 1.4.0
Python version >= 3.6
Currently, I-BERT only supports training on GPU
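The requirements above can be checked before installing. The following is a minimal sketch, not part of the I-BERT repository: the helper names `version_tuple` and `check_requirements` are hypothetical, and the only facts taken from this README are the minimum versions and the GPU requirement.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 6)    # minimum Python version listed above
MIN_TORCH = (1, 4, 0)  # minimum PyTorch version listed above


def version_tuple(version):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return tuple(int(part or 0) for part in match.groups())


def check_requirements():
    """Return a list of human-readable problems (empty if every check passes)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.6 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch  # imported lazily so a missing install is reported, not raised

        if version_tuple(torch.__version__) < MIN_TORCH:
            problems.append("PyTorch >= 1.4.0 is required")
        if not torch.cuda.is_available():
            problems.append("no CUDA device is visible; I-BERT training needs a GPU")
    return problems
```

Calling `check_requirements()` and printing the returned list gives a quick report of anything that still needs fixing.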

git clone https://github.com/kssteven418/I-BERT.git
cd I-BERT
pip install --editable ./

2. Download pre-trained RoBERTa models

Reference: Fairseq RoBERTa

Download pretrained RoBERTa models from the links and unzip them.

RoBERTa-Base: roberta.base.tar.gz
RoBERTa-Large: roberta.large.tar.gz

# In I-BERT (root) directory
mkdir models && cd models
wget {link}
tar -xvf roberta.{base|large}.tar.gz

3. Download GLUE datasets

Reference: Fairseq Finetuning on GLUE

First, download the data from the GLUE website. Make sure to download the dataset into the I-BERT (root) directory.

# In I-BERT (root) directory
wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
python download_glue_data.py --data_dir glue_data --tasks all

Then, preprocess the data.

# In I-BERT (root) directory
./examples/roberta/preprocess_GLUE_tasks.sh glue_data {task_name}

task_name can be one of the following: {ALL, QQP, MNLI, QNLI, MRPC, RTE, STS-B, SST-2, CoLA}. ALL preprocesses all the tasks. If the command runs properly, the preprocessed datasets are stored in I-BERT/{task_name}-bin.
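To confirm that preprocessing produced the expected directories, a small check along these lines may help. It is a sketch that assumes only the {task_name}-bin naming described above; the function names are hypothetical and not part of the repository.

```python
from pathlib import Path

# Task names accepted by the preprocessing script, as listed above.
GLUE_TASKS = ("QQP", "MNLI", "QNLI", "MRPC", "RTE", "STS-B", "SST-2", "CoLA")


def expected_bin_dirs(task_name, root="."):
    """Return the {task_name}-bin directories that preprocessing should create."""
    if task_name == "ALL":
        tasks = GLUE_TASKS
    elif task_name in GLUE_TASKS:
        tasks = (task_name,)
    else:
        raise ValueError("unknown task name: {!r}".format(task_name))
    return [Path(root) / "{}-bin".format(task) for task in tasks]


def missing_bin_dirs(task_name, root="."):
    """Return the expected directories that do not exist under root."""
    return [path for path in expected_bin_dirs(task_name, root) if not path.is_dir()]
```

Running `missing_bin_dirs("MRPC")` from the I-BERT (root) directory should return an empty list once preprocessing has finished for that task.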

You now have the models and the datasets in place, so you are ready to run I-BERT!

Task-specific Model Finetuning

Before quantizing the model, you first have to finetune the pre-trained model on a specific downstream task. Although you can finetune the model with the original Fairseq repo, we provide the ibert-base branch, where you can train non-quantized models without installing the original Fairseq. This branch is identical to the master branch of the original Fairseq repo, except for some logging and run scripts that do not affect functionality. If you already have finetuned models, you can skip this part.

Run the following commands to fetch and move to the ibert-base branch:

# In I-BERT (root) directory
git fetch
git checkout -t origin/ibert-base

Then, run the script:

# In I-BERT (root) directory
# CUDA_VISIBLE_DEVICES={device} python run.py --arch {roberta_base|roberta_large} --task {task_name}
CUDA_VISIBLE_DEVICES=0 python run.py --arch roberta_base --task MRPC

Checkpoints and validation logs are stored in the ./outputs directory. You can change this location with the option --output-dir OUTPUT_DIR. The exact output path will look something like: ./outputs/none/MRPC-base/wd0.1_ad0.1_d0.1_lr2e-5/1219-101427_ckpt/checkpoint_best.pt. By default, models are trained with the task-specific hyperparameters specified in Fairseq Finetuning on GLUE. You can also set the hyperparameters yourself through command-line options (use the option -h for more details).
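Because the checkpoint path is nested several levels deep, it can be convenient to locate checkpoint_best.pt programmatically. The sketch below assumes the directory layout shown in the example path above (a {task}-{size} folder somewhere under the output directory); `find_best_checkpoints` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path


def find_best_checkpoints(output_dir="./outputs", task=None):
    """List checkpoint_best.pt files under output_dir, newest first.

    If task is given (e.g. "MRPC"), keep only checkpoints whose path below
    output_dir contains a folder starting with "<task>-", as in MRPC-base.
    """
    base = Path(output_dir)
    matches = list(base.rglob("checkpoint_best.pt"))
    if task is not None:
        prefix = task + "-"
        matches = [
            path for path in matches
            if any(part.startswith(prefix) for part in path.relative_to(base).parts)
        ]
    # Sort by modification time so the most recent run comes first.
    return sorted(matches, key=lambda path: path.stat().st_mtime, reverse=True)
```

The first element of `find_best_checkpoints(task="MRPC")` is then a candidate value for --restore-file in the quantization step.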

Quantization & Quantization-Aware-Finetuning

Now, switch back to the ibert branch for quantization.

git checkout ibert

Then run the script. It first quantizes the model and then performs quantization-aware finetuning with the learning rate that you specify through the option --lr {lr}.

# In I-BERT (root) directory
# CUDA_VISIBLE_DEVICES={device} python run.py --arch {roberta_base|roberta_large} --task {task_name} \
# --restore-file {ckpt_path} --lr {lr}
CUDA_VISIBLE_DEVICES=0 python run.py --arch roberta_base --task MRPC --restore-file ckpt-best.pt --lr 1e-6
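When sweeping over several tasks or learning rates, the run.py invocations shown in this README can be assembled from Python. This is a sketch under the assumption that run.py accepts exactly the options mentioned in this document (--arch, --task, --restore-file, --lr, --output-dir); `build_run_command` and `run` are hypothetical wrappers, not repository code.

```python
import os
import subprocess
import sys

# Architectures named in the commands above.
ARCHS = ("roberta_base", "roberta_large")


def build_run_command(arch, task, restore_file=None, lr=None, output_dir=None):
    """Build the argument list for run.py without executing anything."""
    if arch not in ARCHS:
        raise ValueError("unknown arch: {!r}".format(arch))
    cmd = [sys.executable, "run.py", "--arch", arch, "--task", task]
    if restore_file is not None:
        cmd += ["--restore-file", str(restore_file)]
    if lr is not None:
        cmd += ["--lr", str(lr)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    return cmd


def run(cmd, device="0"):
    """Execute the command from the I-BERT (root) directory on one GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(device))
    return subprocess.run(cmd, env=env, check=True)
```

For example, `run(build_run_command("roberta_base", "MRPC", restore_file="ckpt-best.pt", lr="1e-6"))` corresponds to the quantization command above. Passing the learning rate as a string keeps it exactly as typed.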

NOTE: Our work is still in progress. Currently, all integer operations are executed in floating point.
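To illustrate what it means to execute integer operations in floating point, the sketch below applies generic symmetric uniform quantization and keeps the integer-valued results in floating-point variables. It is an illustration only and an assumption about the general idea: it does not reproduce the quantizers in this repository, and the function names are hypothetical.

```python
def symmetric_quantize(values, num_bits=8):
    """Map real values onto a signed, uniform, symmetric integer grid.

    Returns (q, scale), where every entry of q is integer-valued but is
    stored as a Python float, i.e. the integers are only simulated.
    """
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to [-q_max, q_max].
    q = [float(max(-q_max, min(q_max, round(v / scale)))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate real values from the quantized representation."""
    return [v * scale for v in q]
```

With `q, scale = symmetric_quantize([-1.0, 0.0, 0.3, 1.0])`, each entry of `q` is a whole number held in a float, and `dequantize(q, scale)` differs from the input by at most about half a quantization step.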


Owner

Sehoon Kim
