Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators

Overview

This is our PyTorch implementation for the paper:

Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Zhi-Yuan Xie, Zhong-Yi Lu and Ji-Rong Wen (2021). Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators.

Introduction

This paper presents a novel pre-trained language model (PLM) compression approach based on the matrix product operator (MPO for short) from quantum many-body physics. MPO decomposition splits an original matrix into central tensors (containing the core information) and auxiliary tensors (holding only a small proportion of the parameters). Building on this decomposed structure, we propose a novel fine-tuning strategy that updates only the parameters of the auxiliary tensors, and we design an optimization algorithm for MPO-based approximation over stacked network architectures. Our approach applies to both original and compressed PLMs in a general way, yielding a lighter network and significantly reducing the number of parameters to be fine-tuned. Extensive experiments demonstrate the effectiveness of the proposed approach in model compression, especially the reduction in fine-tuning parameters (91% reduction on average).
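
As a rough illustration of the decomposition step, here is a minimal sketch (not the actual mpo_lab API; the function name, the factor shapes, and the truncated-SVD scheme are our own assumptions) of factorizing a weight matrix, tensor-train style, into a chain of local tensors:

import torch

def mpo_decompose(weight, in_dims, out_dims, trunc):
    """Split `weight` (prod(in_dims) x prod(out_dims)) into a chain of
    4-way local tensors [r_{k-1}, i_k, j_k, r_k] via truncated SVDs."""
    n = len(in_dims)
    t = weight.reshape(*in_dims, *out_dims)
    # Interleave the axes so that factor k owns the (i_k, j_k) pair.
    t = t.permute(*[p for k in range(n) for p in (k, k + n)])
    cores, r = [], 1
    for k in range(n - 1):
        m = t.reshape(r * in_dims[k] * out_dims[k], -1)
        u, s, vh = torch.linalg.svd(m, full_matrices=False)
        r_new = min(trunc, s.numel())            # bond-dimension truncation
        cores.append(u[:, :r_new].reshape(r, in_dims[k], out_dims[k], r_new))
        t = s[:r_new, None] * vh[:r_new]         # carry the remainder forward
        r = r_new
    cores.append(t.reshape(r, in_dims[-1], out_dims[-1], 1))
    return cores

With five local tensors per weight matrix, cores[2] plays the role of the central tensor and the other four are the auxiliary tensors described above.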

For more details about the MPOP technique, please refer to our paper.

Release Notes

  • First version: 2021/05/21
  • Added ALBERT code: 2021/06/08

Requirements

  • Python 3.7
  • torch >= 1.8.0

Installation

pip install mpo_lab

Lightweight fine-tuning

In lightweight fine-tuning, we take the original ALBERT, without fine-tuning, as the model to be compressed. Performing MPO decomposition on each weight matrix yields four auxiliary tensors and one central tensor per tensor set. This provides a good initialization for the subsequent task-specific distillation. Refer to run_all_albert_fine_tune.sh
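
A hedged sketch of this strategy (the module below is illustrative, not the repo's actual class; mpo_decompose refers to the sketch in the Introduction): wrap the factors as parameters, freeze the central tensor, and hand only the auxiliary tensors to the optimizer.

import torch
import torch.nn as nn

class MPOWeight(nn.Module):
    """Holds the MPO factors of one weight matrix as trainable parameters."""
    def __init__(self, cores):
        super().__init__()
        self.cores = nn.ParameterList([nn.Parameter(c) for c in cores])
        central = len(self.cores) // 2           # middle factor = central tensor
        for idx, p in enumerate(self.cores):
            p.requires_grad = (idx != central)   # update auxiliary tensors only

# Decompose one 1024x1024 weight and optimize only the auxiliary tensors;
# this restriction is where the 91% average reduction in fine-tuned
# parameters comes from.
mpo = MPOWeight(mpo_decompose(torch.randn(1024, 1024),
                              [4, 4, 8, 4, 2], [4, 4, 8, 4, 2], trunc=64))
trainable = [p for p in mpo.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # cf. the --mpo_lr argument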

Important arguments:

--data_dir          Path to the dataset
--mpo_lr            Learning rate of the tensors produced by MPO
--mpo_layers        Names of the components to be decomposed with MPO
--emb_trunc         Truncation number of the central tensor in the word embedding layer
--linear_trunc      Truncation number of the central tensor in the linear layers
--attention_trunc   Truncation number of the central tensor in the attention layers
--load_layer        Names of the components to be loaded from an existing checkpoint file
--update_mpo_layer  Names of the components to be updated when training the model
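
For illustration only, an invocation might combine these flags as follows; the flag names are those listed above, but the entry-point script, the layer names, and the values are placeholders (the settings actually used live in run_all_albert_fine_tune.sh):

python run_classifier.py \
  --data_dir ./data/SST-2 \
  --mpo_lr 1e-4 \
  --mpo_layers word_embed,attention,FFN_1,FFN_2 \
  --emb_trunc 1000 \
  --linear_trunc 1000 \
  --attention_trunc 1000 \
  --load_layer "" \
  --update_mpo_layer word_embed,attention,FFN_1,FFN_2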

Dimension squeezing

In dimension squeezing, we compute an appropriate truncation order for the whole model. To reproduce the results in the paper, we provide the model obtained after lightweight fine-tuning. Refer to run_all_albert_fine_tune.sh
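
As a rough sketch of the idea, one can pick, for each decomposed matrix, the smallest bond dimension whose discarded singular values stay below an error budget; the energy-threshold rule below is our own illustrative stand-in, not the paper's actual algorithm, which chooses truncation orders jointly over the whole stacked model.

import torch

def suggest_trunc(weight, energy=0.999):
    """Smallest rank keeping `energy` of the squared singular-value mass."""
    _, s, _ = torch.linalg.svd(weight, full_matrices=False)
    cum = torch.cumsum(s ** 2, dim=0) / torch.sum(s ** 2)
    return int((cum < energy).sum().item()) + 1

print(suggest_trunc(torch.randn(1024, 1024)))  # e.g. a value for --linear_trunc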

Pre-trained ALBERT models are available on Google Drive.

Acknowledgment

Any scientific publication that uses our code should cite the following paper:

@inproceedings{Liu-ACL-2021,
  author    = {Peiyu Liu and
               Ze{-}Feng Gao and
               Wayne Xin Zhao and
               Zhi{-}Yuan Xie and
               Zhong{-}Yi Lu and
               Ji{-}Rong Wen},
  title     = {Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression
               based on Matrix Product Operators},
  booktitle = {{ACL}},
  year      = {2021},
}

TODO

  • Prepare data and code
  • Upload models to reproduce the experiments
  • Add supplementary details for the paper