The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

Last update: Dec 07, 2022

Overview

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper)

@misc{zhang2021compress,
      title={You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient}, 
      author={Shaokun Zhang and Xiawu Zheng and Chenyi Yang and Yuchao Li and Yan Wang and Fei Chao and Mengdi Wang and Shen Li and Jun Yang and Rongrong Ji},
      year={2021},
      eprint={2106.02435},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
      }

Overview

This repository is the official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

📋 We propose a novel approach, YOCO-BERT, to achieve compress-once, deploy-everywhere BERT compression. Compared with state-of-the-art algorithms, YOCO-BERT provides more compact models while achieving a superior average accuracy improvement on the GLUE benchmark.

Requirements

Python > 3.6
PyTorch = 1.7.0
transformers = 3.5.0
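
Before training, it can help to confirm that the active environment matches the pins above. The following is a minimal sketch, not part of this repository: the helper names (`release_tuple`, `find_mismatches`, `installed_versions`) are hypothetical, and it reads "Python > 3.6" as "3.6 or newer", which is an assumption.

```python
import sys

# Pins copied from the requirements list; the comparison logic is this sketch's own.
PINNED = {"torch": "1.7.0", "transformers": "3.5.0"}
MIN_PYTHON = (3, 6)  # assumption: "Python > 3.6" is read as 3.6 or newer


def release_tuple(version):
    """Turn '1.7.0+cu110' into (1, 7, 0), ignoring any local build suffix."""
    public = version.split("+")[0]
    return tuple(int(part) for part in public.split(".") if part.isdigit())


def find_mismatches(installed, python_version=None):
    """Return readable problems; an empty list means the environment matches."""
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.6" % tuple(python_version[:2]))
    for name, wanted in PINNED.items():
        found = installed.get(name)
        if found is None:
            problems.append("%s is not installed (want %s)" % (name, wanted))
        elif release_tuple(found) != release_tuple(wanted):
            problems.append("%s %s found, want %s" % (name, found, wanted))
    return problems


def installed_versions():
    """Collect __version__ from the pinned packages that can be imported."""
    versions = {}
    for name in PINNED:
        try:
            module = __import__(name)
        except ImportError:
            continue
        versions[name] = getattr(module, "__version__", "0")
    return versions
```

Calling `find_mismatches(installed_versions())` returns an empty list when both packages are present at the pinned versions.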

Training

To train the super-BERTs in the paper, run this command:

python train_superbert.py --cfg /path_to_superbert_training_config/config.yaml

Searching

To search for the optimal sub-BERTs under any of the constraints used in the paper, run this command:

python search_subbert.py --cfg /path_to_subbert_searching_config/config.yaml
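
The two commands above can be chained from a small driver script. This is a sketch under assumptions: `build_command` and `run_pipeline` are hypothetical helpers that do not exist in this repository, and it assumes the search stage uses the super-BERT produced by training, which the section order suggests but this README does not state. Only the script names and the `--cfg` flag are taken from the commands above.

```python
import subprocess
import sys


def build_command(script, cfg_path):
    """Build the argument list for one of the two scripts named in this README."""
    return [sys.executable, script, "--cfg", cfg_path]


def run_pipeline(train_cfg, search_cfg, dry_run=True):
    """Run training, then searching; with check=True a failed stage stops the run."""
    commands = [
        build_command("train_superbert.py", train_cfg),
        build_command("search_subbert.py", search_cfg),
    ]
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

With `dry_run=True` (the default) the function only prints the two commands, so the config paths can be checked before a long training run is started.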

Evaluation

The evaluation results will be reported after the searching process.

Config

We release all the training and searching configs in config.

Results

Our model achieves the following performance on:

GLUE

Results given various FLOPs and parameters.

Results under common constraints (compressed to no more than 66M parameters)

Datasets	SST-2	MRPC	CoLA	RTE	MNLI	QQP	QNLI
Results	92.8	90.3	59.8	72.9	82.6	90.5	87.2
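
As a quick arithmetic check on the table, the unweighted mean of the seven scores can be computed as below. This is an illustrative sketch: the paper may aggregate differently, and the per-task metrics are not all the same kind (see the paper), so the mean mixes metric types.

```python
# Scores copied from the table above (models of no more than 66M).
GLUE_RESULTS = {
    "SST-2": 92.8,
    "MRPC": 90.3,
    "CoLA": 59.8,
    "RTE": 72.9,
    "MNLI": 82.6,
    "QQP": 90.5,
    "QNLI": 87.2,
}


def unweighted_mean(scores):
    """Plain arithmetic mean over tasks, with every task weighted equally."""
    values = list(scores.values())
    return sum(values) / len(values)


average = unweighted_mean(GLUE_RESULTS)
print("Average over %d tasks: %.1f" % (len(GLUE_RESULTS), average))
```

The seven scores sum to 576.1, so the unweighted mean is 82.3.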

📋 The detailed metrics used in this code are reported in the paper.

Licence

This repository is released under the MIT license. See LICENSE for more information.

Contact

If you have any problem with this implementation, feel free to contact the first author: [email protected]
