Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Last update: Nov 10, 2022

Overview

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Requirements

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

pip install -r requirements.txt

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract them into ./pretrained_ckpt/.

Prepare dataset

Download the latest dump of Wikipedia from HERE, and extract it into ./dataset/pretrain_data/download_wikipedia/.
Download a mirror of BooksCorpus from HERE, and extract it into ./dataset/pretrain_data/download_bookcorpus/.

- Pre-training data

bash create_pretrain_data.sh
bash create_pretrain_feature.sh

The features of Wikipedia, BooksCorpus, and their concatenation will be saved into ./dataset/pretrain_data/wikipedia_nomask/, ./dataset/pretrain_data/bookcorpus_nomask/, and ./dataset/pretrain_data/wiki_book_nomask/, respectively.

- Fine-tuning data

Download the GLUE dataset using the script in HERE, and put the files into ./dataset/glue/.
Download the SQuAD v1.1 and v2.0 datasets from the following links:

and put them into ./dataset/squad/.

Pre-train the supernet

bash pretrain_supernet.sh

The checkpoints will be saved into ./exp/pretrain/supernet/, and the names of the sub-directories should be modified into stage1_2 and stage3 correspondingly.

We also provide the checkpoint of the supernet in stage 3 (pre-trained with both Wikipedia and BooksCorpus) at HERE.

Train the teacher model (BERT$_{\rm BASE}$)

bash train.sh

The checkpoints will be saved into ./exp/train/bert_base/, and the names of the sub-directories should be modified into the corresponding task name (i.e., mnli, qqp, qnli, sst-2, cola, sts-b, mrpc, rte, wnli, squad1.1, and squad2.0). Each sub-directory contains a checkpoint named best_model.bin.

Conduct NAS (including search stage 1, 2, and 3)

bash ffn_search.sh

The checkpoints will be saved into ./exp/ffn_search/.

Distill the student model

- TinyBERT$_4$, TinyBERT$_6$

bash finetune.sh

The checkpoints will be saved into ./exp/downstream/tiny_bert/.

- EfficientBERT$_{\rm TINY}$, EfficientBERT, EfficientBERT+, EfficientBERT++

bash nas_finetune.sh

The above script will first pre-train the student models based on the pre-trained checkpoint of the supernet in stage 3, and save the pre-trained checkpoints into ./exp/pretrain/auto_bert/. Then fine-tune it on the downstream datasets, and save the fine-tuned checkpoints into ./exp/downstream/auto_bert/.

We also provide the pre-trained checkpoints of the student models (including EfficientBERT$_{\rm TINY}$, EfficientBERT, and EfficientBERT++) at HERE.

- EfficientBERT (TinyBERT$_6$)

bash nas_finetune_transfer.sh

The pre-trained and fine-tuned checkpoints will be saved into ./exp/pretrain/auto_tiny_bert/ and ./exp/downstream/auto_tiny_bert/, respectively.

Test on the GLUE dataset

bash test.sh

The test results will be saved into ./test_results/.

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021efficient-bert,
  title     = {{E}fficient{BERT}: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation},
  author    = {Chenhe Dong and Guangrun Wang and Hang Xu and Jiefeng Peng and Xiaozhe Ren and Xiaodan Liang},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  year      = {2021}
}

Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Related tags

Overview

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

Requirements

Download checkpoints

Prepare dataset

- Pre-training data

- Fine-tuning data

Pre-train the supernet

Train the teacher model (BERT$_{\rm BASE}$)

Conduct NAS (including search stage 1, 2, and 3)

Distill the student model

- TinyBERT$_4$, TinyBERT$_6$

- EfficientBERT$_{\rm TINY}$, EfficientBERT, EfficientBERT+, EfficientBERT++

- EfficientBERT (TinyBERT$_6$)

Test on the GLUE dataset

Reference

Owner

Chenhe Dong

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

A collection of Korean Text Datasets ready to use using Tensorflow-Datasets.

Klexikon: A German Dataset for Joint Summarization and Simplification

Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.

Dual languaged (rus+eng) tool for packing and unpacking archives of Silky Engine.

AEC_DeepModel - Deep learning based acoustic echo cancellation baseline code

American Sign Language (ASL) to Text Converter

BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions

Finally decent dictionaries based on Wiktionary for your beloved eBook reader.

AI_Assistant - This is a Python based Voice Assistant.

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

A BERT-based reverse-dictionary of Korean proverbs

Anuvada: Interpretable Models for NLP using PyTorch

Almost State-of-the-art Text Generation library

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.