Code repo for "Transformer on a Diet" paper

Last update: Sep 26, 2021

Overview

Transformer on a Diet

Reference: C Wang, Z Ye, A Zhang, Z Zhang, A Smola. "Transformer on a Diet". arXiv preprint arXiv:2002.06170 (2020).

Installation

pip install --pre --upgrade mxnet
pip install gluonnlp
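
To confirm that both packages are visible to the interpreter before launching the training scripts, a small check along these lines may help. It is only a sketch: the helper name `check_packages` is hypothetical and not part of this repo, and it only tests whether the modules can be located, not whether their versions are compatible.

```python
import importlib.util


def check_packages(names=("mxnet", "gluonnlp")):
    """Return {module name: bool} telling whether each module can be located.

    Hypothetical helper for illustration; it does not import the modules,
    so it says nothing about version compatibility.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```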

Results

The results on the PTB dataset, and the commands to reproduce them, are as follows.

[1] Full (Val PPL 109.19 Test PPL 103.72)

$ cd scripts/language_model/
$ python transformer_language_model.py --model full --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 1 --kernel_size 3 --d_base 2

[2] Dilated (Val PPL 115.67 Test PPL 110.92)

$ cd scripts/language_model/
$ python transformer_language_model.py --model dilated --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 1 --kernel_size 3 --d_base 2

[3] Dilated-Memory (Val PPL 115.35 Test PPL 110.98)

$ cd scripts/language_model/
$ python transformer_language_model.py --model dilated_mem --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 1 --kernel_size 3 --d_base 2

[4] Cascade (Val PPL 109.16 Test PPL 105.27)

$ cd scripts/language_model/
$ python transformer_language_model.py --model cascade --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 4 --window_size_multiplier 2 --kernel_size 3 --d_base 2
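
The four commands above differ only in `--model` and in the window flags (Cascade uses `--first_window_size 4 --window_size_multiplier 2`; the others use `--first_window_size 1`). The sketch below assembles them from the shared flags, which can make the differences easier to see. The names `COMMON_FLAGS`, `WINDOW_FLAGS`, and `build_command` are hypothetical and not part of this repo; all flag values are copied from the commands above.

```python
import shlex

# Flags shared by all four PTB commands (copied verbatim from this README).
COMMON_FLAGS = (
    "--data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 "
    "--batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 "
    "--dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 "
    "--lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled "
    "--units 320 --use_residual --max_src_length 1000 --warmup_steps 0"
)
TAIL_FLAGS = "--kernel_size 3 --d_base 2"

# Window flags are the only per-model difference besides --model itself.
WINDOW_FLAGS = {
    "full": "--first_window_size 1",
    "dilated": "--first_window_size 1",
    "dilated_mem": "--first_window_size 1",
    "cascade": "--first_window_size 4 --window_size_multiplier 2",
}


def build_command(model):
    """Return the argument list for one model variant (hypothetical helper)."""
    if model not in WINDOW_FLAGS:
        raise ValueError(f"unknown model: {model}")
    args = ["python", "transformer_language_model.py", "--model", model]
    args += shlex.split(COMMON_FLAGS)
    args += shlex.split(WINDOW_FLAGS[model])
    args += shlex.split(TAIL_FLAGS)
    return args


if __name__ == "__main__":
    # Print the commands; run them from scripts/language_model/.
    for model in WINDOW_FLAGS:
        print(" ".join(build_command(model)))
```

The returned list can be passed to `subprocess.run(..., cwd="scripts/language_model")` if launching from Python is preferred over copying the printed command.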

Note that the commands to reproduce the results on WikiText-2 will be added soon.
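
For a quick side-by-side view of the PTB numbers reported above, the following sketch computes each variant's test perplexity gap relative to the Full model (lower perplexity is better). The dictionary `PTB_RESULTS` and the function `relative_test_gap` are hypothetical names introduced here; the values are copied from the results listed in this section.

```python
# (validation PPL, test PPL) on PTB, copied from the results above.
PTB_RESULTS = {
    "Full": (109.19, 103.72),
    "Dilated": (115.67, 110.92),
    "Dilated-Memory": (115.35, 110.98),
    "Cascade": (109.16, 105.27),
}


def relative_test_gap(model, baseline="Full"):
    """Percent change in test PPL of `model` relative to `baseline`."""
    base = PTB_RESULTS[baseline][1]
    return 100.0 * (PTB_RESULTS[model][1] - base) / base


if __name__ == "__main__":
    for name, (val, test) in PTB_RESULTS.items():
        gap = relative_test_gap(name)
        print(f"{name:15s} val={val:7.2f} test={test:7.2f} gap={gap:+.2f}%")
```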

Reference Paper

The BibTeX entry for the reference paper is:

@article{transformerdiet2020,
   title={Transformer on a Diet},
   author={Chenguang Wang and Zihao Ye and Aston Zhang and Zheng Zhang and Alexander J. Smola},
   journal={ArXiv},
   year={2020},
   volume={abs/2002.06170}
}

Owner

cgraywang
