ElasticBERT: A pre-trained model with multi-exit transformer architecture.

Overview

ElasticBERT

This repository contains finetuning code and checkpoints for ElasticBERT.

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline

Xiangyang Liu, Tianxiang Sun, Junliang He, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

Requirements

We recommend using Anaconda for setting up the environment of experiments:

conda create -n elasticbert python=3.8.8
conda activate elasticbert
conda install pytorch==1.8.1 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install -r requirements.txt

Pre-trained Models

We provide the pre-trained weights of ElasticBERT-BASE and ElasticBERT-LARGE, which can be directly used in Huggingface-Transformers.

  • ElasticBERT-BASE: 12 layers, 12 Heads and 768 Hidden Size.
  • ElasticBERT-LARGE: 24 layers, 16 Heads and 1024 Hidden Size.

The pre-trained weights can be downloaded here.

Model MODEL_NAME
ElasticBERT-BASE fnlp/elasticbert-base
ElasticBERT-LARGE fnlp/elasticbert-large

Downstream task datasets

The GLUE task datasets can be downloaded from the GLUE leaderboard

The ELUE task datasets can be downloaded from the ELUE leaderboard

Finetuning in static usage

We provide the finetuning code for both GLUE tasks and ELUE tasks in static usage on ElasticBERT.

For GLUE:

cd finetune-static
bash finetune_glue.sh

For ELUE:

cd finetune-static
bash finetune_elue.sh

Finetuning in dynamic usage

We provide finetuning code to apply two kind of early exiting methods on ElasticBERT.

For early exit using entropy criterion:

cd finetune-dynamic
bash finetune_elue_entropy.sh

For early exit using patience criterion:

cd finetune-dynamic
bash finetune_elue_patience.sh

Please see our paper for more details!

Contact

If you have any problems, raise an issue or contact Xiangyang Liu

Citation

If you find this repo helpful, we'd appreciate it a lot if you can cite the corresponding paper:

@article{liu2021elasticbert,
  author    = {Xiangyang Liu and
               Tianxiang Sun and
               Junliang He and
               Lingling Wu and
               Xinyu Zhang and
               Hao Jiang and
               Zhao Cao and
               Xuanjing Huang and
               Xipeng Qiu},
  title     = {Towards Efficient {NLP:} {A} Standard Evaluation and {A} Strong Baseline},
  journal   = {CoRR},
  volume    = {abs/2110.07038},
  year      = {2021},
  url       = {https://arxiv.org/abs/2110.07038},
  eprinttype = {arXiv},
  eprint    = {2110.07038},
  timestamp = {Fri, 22 Oct 2021 13:33:09 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2110-07038.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Owner
fastNLP
由复旦大学的自然语言处理(NLP)团队发起的国产自然语言处理开源项目
fastNLP
Script to download some free japanese lessons in portuguse from NHK

Nihongo_nhk This is a script to download some free japanese lessons in portuguese from NHK. It can be executed by installing the packages with: pip in

Matheus Alves 2 Jan 06, 2022
a CTF web challenge about making screenshots

screenshotter (web) A CTF web challenge about making screenshots. It is inspired by a bug found in real life. The challenge was created by @LiveOverfl

219 Jan 02, 2023
Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Speaker-Embeddings-Correlation-Pooling This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel

Themos Stafylakis 10 Apr 30, 2022
Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

Realistic Few-Shot Relation Extraction This repository contains code to reproduce the results in the paper "Towards Realistic Few-Shot Relation Extrac

Bloomberg 8 Nov 09, 2022
2021 AI CUP Competition on Traditional Chinese Scene Text Recognition - Intermediate Contest

繁體中文場景文字辨識 程式碼說明 組別:這就是我 成員:蔣明憲 唐碩謙 黃玥菱 林冠霆 蕭靖騰 目錄 環境套件 安裝方式 資料夾布局 前處理-製作偵測訓練註解檔 前處理-製作分類訓練樣本 part.py : 從 json 裁切出分類訓練樣本 Class.py : 將切出來的樣本按照文字分類到各資料夾

HuanyueTW 3 Jan 14, 2022
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [Project Page] [Paper] [Video] Wenlong Huang1, Pieter Abbee

Wenlong Huang 114 Dec 29, 2022
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

CTC Decoding Algorithms Update 2021: installable Python package Python implementation of some common Connectionist Temporal Classification (CTC) decod

Harald Scheidl 736 Jan 03, 2023
Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles (TASLP 2022)

Zhuosheng Zhang 3 Apr 14, 2022
Yomichad - a Japanese pop-up dictionary that can display readings and English definitions of Japanese words

Yomichad is a Japanese pop-up dictionary that can display readings and English definitions of Japanese words, kanji, and optionally named entities. It is similar to yomichan, 10ten, and rikaikun in s

Jonas Belouadi 7 Nov 07, 2022
LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation

LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation Tasks | Datasets | LongLM | Baselines | Paper Introduction LOT is a ben

46 Dec 28, 2022
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Tsukinousag1 3 Apr 02, 2022
Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

Boolean Prompting for Neural Text Generators Neural text generators like the GPT models promise a general-purpose means of manipulating texts. These m

Jeffrey M. Binder 20 Jan 09, 2023
Automatically search Stack Overflow for the command you want to run

stackshell Automatically search Stack Overflow (and other Stack Exchange sites) for the command you want to ru Use the up and down arrows to change be

circuit10 22 Oct 27, 2021
The Easy-to-use Dialogue Response Selection Toolkit for Researchers

The Easy-to-use Dialogue Response Selection Toolkit for Researchers

GMFTBY 32 Nov 13, 2022
Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra. What is Lightning Tran

Pytorch Lightning 581 Dec 21, 2022
SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。

SimpleChinese2 SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。 声明 本项目是为方便个人工作所创建的,仅有部分代码原创。

Ming 30 Dec 02, 2022
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

Fredrik Carlsson 481 Dec 30, 2022
This is a Prototype of an Ai ChatBot "Tea and Coffee Supplier" using python.

Ai-ChatBot-Python A chatbot is an intelligent system which can hold a conversation with a human using natural language in real time. Due to the rise o

1 Oct 30, 2021
Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2021).

Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. @inproceedings{tedes

Babelscape 40 Dec 11, 2022
PyTorch original implementation of Cross-lingual Language Model Pretraining.

XLM NEW: Added XLM-R model. PyTorch original implementation of Cross-lingual Language Model Pretraining. Includes: Monolingual language model pretrain

Facebook Research 2.7k Dec 27, 2022