A pre-trained model with a multi-exit Transformer architecture.

Last update: Dec 14, 2022

Overview

ElasticBERT

This repository contains finetuning code and checkpoints for ElasticBERT.

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline

Xiangyang Liu, Tianxiang Sun, Junliang He, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

Requirements

We recommend using Anaconda to set up the experiment environment:

conda create -n elasticbert python=3.8.8
conda activate elasticbert
conda install pytorch==1.8.1 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install -r requirements.txt

Pre-trained Models

We provide the pre-trained weights of ElasticBERT-BASE and ElasticBERT-LARGE, which can be used directly with Hugging Face Transformers.

ElasticBERT-BASE: 12 layers, 12 attention heads, hidden size 768.
ElasticBERT-LARGE: 24 layers, 16 attention heads, hidden size 1024.
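
As a quick sanity check on the two sizes listed above, the sketch below records them in a small structure and derives the per-head dimension. It assumes the usual multi-head attention split, where the hidden size is divided evenly across the heads; the `EncoderSize` class and its field names are hypothetical and are not part of the ElasticBERT code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSize:
    """Hypothetical container for the sizes listed in this README."""

    name: str
    layers: int
    heads: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # Assumption: hidden size is split evenly across attention heads.
        if self.hidden_size % self.heads != 0:
            raise ValueError("hidden_size must be divisible by heads")
        return self.hidden_size // self.heads


BASE = EncoderSize("ElasticBERT-BASE", layers=12, heads=12, hidden_size=768)
LARGE = EncoderSize("ElasticBERT-LARGE", layers=24, heads=16, hidden_size=1024)

for size in (BASE, LARGE):
    print(f"{size.name}: {size.layers} layers, head_dim={size.head_dim}")
```

Under that assumption both models use 64-dimensional heads, so LARGE differs from BASE in depth and width rather than in per-head size.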

The pre-trained weights can be downloaded from the Hugging Face model hub under the following names.

Model	`MODEL_NAME`
`ElasticBERT-BASE`	fnlp/elasticbert-base
`ElasticBERT-LARGE`	fnlp/elasticbert-large

Downstream task datasets

The GLUE task datasets can be downloaded from the GLUE leaderboard.

The ELUE task datasets can be downloaded from the ELUE leaderboard.

Finetuning in static usage

We provide finetuning code for both the GLUE and ELUE tasks when ElasticBERT is used statically, that is, with a fixed number of layers and no early exit.

For GLUE:

cd finetune-static
bash finetune_glue.sh

For ELUE:

cd finetune-static
bash finetune_elue.sh

Finetuning in dynamic usage

We provide finetuning code that applies two kinds of early-exit methods to ElasticBERT.

For early exit using the entropy criterion:

cd finetune-dynamic
bash finetune_elue_entropy.sh
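
To illustrate the idea behind an entropy criterion, here is a minimal sketch in plain Python. It is not the code in `finetune-dynamic`: it assumes the common formulation in which each exit produces class logits and inference stops at the first exit whose softmax entropy falls below a threshold. The function names and the threshold values are hypothetical, and the per-layer logits are given up front for illustration, whereas a real model would stop computing later layers once it exits.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_early_exit(
    per_layer_logits: Sequence[Sequence[float]],
    threshold: float,
) -> Tuple[int, int]:
    """Return (exit_layer, predicted_class), with layers counted from 1.

    Exits at the first layer whose prediction entropy is below `threshold`.
    If no layer qualifies, the last layer's prediction is used.
    """
    if not per_layer_logits:
        raise ValueError("need logits from at least one exit")
    for layer, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            break  # confident enough: stop here
    prediction = max(range(len(probs)), key=probs.__getitem__)
    return layer, prediction


# Toy logits from four exits of a two-class model (illustrative values only).
toy_logits = [[0.1, 0.0], [0.5, 0.2], [4.0, -4.0], [6.0, -6.0]]
print(entropy_early_exit(toy_logits, threshold=0.1))
```

A lower threshold makes the model exit later and spend more computation, while a higher one trades accuracy for speed.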

For early exit using the patience criterion:

cd finetune-dynamic
bash finetune_elue_patience.sh
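
The patience criterion can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the repository's implementation: it assumes each exit yields a class prediction and that inference stops once a prediction has matched the previous exit's prediction `patience` times in a row. The function name and the counting convention are hypothetical choices made for this example.

```python
from typing import Optional, Sequence, Tuple


def patience_early_exit(
    per_layer_predictions: Sequence[int],
    patience: int,
) -> Tuple[int, int]:
    """Return (exit_layer, predicted_class), with layers counted from 1.

    `streak` counts how many consecutive exits repeated the previous exit's
    prediction. Inference stops when the streak reaches `patience`; otherwise
    the last layer's prediction is returned.
    """
    if not per_layer_predictions:
        raise ValueError("need a prediction from at least one exit")
    if patience < 1:
        raise ValueError("patience must be at least 1")
    previous: Optional[int] = None
    streak = 0
    for layer, prediction in enumerate(per_layer_predictions, start=1):
        if previous is not None and prediction == previous:
            streak += 1
        else:
            streak = 0  # prediction changed: start counting again
        previous = prediction
        if streak >= patience:
            break
    return layer, prediction


# Toy predictions from six exits (illustrative values only).
toy_predictions = [1, 0, 0, 0, 2, 2]
print(patience_early_exit(toy_predictions, patience=2))
```

Unlike the entropy criterion, this rule needs no confidence threshold; the single integer `patience` controls the speed and accuracy trade-off.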

Please see our paper for more details!

Contact

If you run into any problems, please raise an issue or contact Xiangyang Liu.

Citation

If you find this repo helpful, we would appreciate it if you cited the corresponding paper:

@article{liu2021elasticbert,
  author    = {Xiangyang Liu and
               Tianxiang Sun and
               Junliang He and
               Lingling Wu and
               Xinyu Zhang and
               Hao Jiang and
               Zhao Cao and
               Xuanjing Huang and
               Xipeng Qiu},
  title     = {Towards Efficient {NLP:} {A} Standard Evaluation and {A} Strong Baseline},
  journal   = {CoRR},
  volume    = {abs/2110.07038},
  year      = {2021},
  url       = {https://arxiv.org/abs/2110.07038},
  eprinttype = {arXiv},
  eprint    = {2110.07038},
  timestamp = {Fri, 22 Oct 2021 13:33:09 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2110-07038.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
