Transformer training code for sequential tasks

Overview

Sequential Transformer

This is a code for training Transformers on sequential tasks such as language modeling. Unlike the original Transformer architecture, it uses caching of previous representations and relative position embeddings to better adapt to sequential tasks. In addition, the code also implements the following projects as described below and in this blog post:

Requirements

You need PyTorch 0.4.1 or above and a cuda-enabled GPU to run the code. If there are multiple GPUs available, the code uses nn.DataParallel to utilize them. For better efficiency, enable distributed training by --distributed argument, which can run on multiple nodes.

Adaptive Attention Span

This code can be used for running experiments in Adaptive Attention Span for Transformers paper. The adaptive span allows a model to learn an optimal context size for each self-attention head from training data. As shown in the below figure, only few heads require long attention span, thus making it possible to increase the context size to 8k tokens without increasing computation time and memory footprint significantly.

An argument --adapt-span enables adaptive span. Otherwise a model will have a fixed attention span. The adaptive-span is implemented as a nn.Module to make it easier to plug it into other models.

Running experiments in the paper

Scripts for running experiments in the paper are located in ./experiments/ directory. For example, a smaller 8-layer version of our model can be trained on a single GPU by running:

bash experiments/enwik8_small.sh

It should reach about 1.3bpc on dev after 150k steps.

For training larger models, multiple GPUs are recommended. In the script files, you can configure the number of available GPUs. Increase the --batch-split argument if you run out of GPU memory (it splits batches into smaller pieces without changing the final result).

We obtained the following results in our experiments:

Experiment #params dev test
enwik8 38M 1.04 bpb 1.02 bpb
enwik8_large 209M 1.00 bpb 0.98 bpb
text8 39M 1.05 bpc 1.11 bpc
text8_large 209M 1.01 bpc 1.07 bpc

A large model training takes about 1.2sec/batch near the end (initially it's faster because the attention spans are smaller) on 8 V100 GPUs. So, for example, the whole enwik8_large training of 170k steps should take less than 2.4 days.

Pre-trained models

You can download pre-trained models by running the get_pretrained.sh script. Then the same scripts in ./experiments/ can be used to evaluate those models. Since the download script puts models in ./checkpoints/, make sure there is no file with the same name. Note that these pre-trained models are obtained by rerunning the training scripts after the code cleanup, so there are small differences from the above results due to the randomness of the training.

All-attention Network

The code also can be used for training All-attention Networks introduced in Augmenting Self-attention with Persistent Memory. If --pers-mem-size argument is set to N, all FF sublayers will be removed from the model and N persistent memory vectors will be added to every self-attention sublayer. The following experiments can be found in ./experiments/ directory.

Experiment #params dev test
enwik8_pers_small.sh 39M 1.03 bpb 1.01 bpb
enwik8_pers.sh 114M 1.00 bpb 0.98 bpb
wiki103_pers.sh 133M 18.8 ppl * 19.7 ppl *

(*This number is slightly better than the paper because it includes end-of-line as a token.)

License

The code is licensed under CC-BY-NC license. See the LICENSE file for more details.

Acknowledgement

We thank Xavier Martinet for helping with cleaning the code. The data preprocessing scripts are downloaded from awd-lstm and transformer-XL repos. The adagrad_with_grad_clip.py is mostly adapted from PyTorch.

Owner
Meta Research
Meta Research
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus CVSS is a massively multilingual-to-English speech-to-speech translation corpus, co

Google Research Datasets 118 Jan 06, 2023
Auto translate textbox from Japanese to English or Indonesia

priconne-auto-translate Auto translate textbox from Japanese to English or Indonesia How to use Install python first, Anaconda is recommended Install

Aji Priyo Wibowo 5 Aug 25, 2022
NLP - Machine learning

Flipkart-product-reviews NLP - Machine learning About Product reviews is an essential part of an online store like Flipkart’s branding and marketing.

Harshith VH 1 Oct 29, 2021
Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow

This Repository contains a sample code for Tacotron 2, WaveGlow with multi-speaker, emotion embeddings together with a script for data preprocessing.

Ivan Didur 106 Jan 01, 2023
A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python

LineFlow: Framework-Agnostic NLP Data Loader in Python LineFlow is a simple text dataset loader for NLP deep learning tasks. LineFlow was designed to

TofuNLP 177 Jan 04, 2023
Learn meanings behind words is a key element in NLP. This project concentrates on the disambiguation of preposition senses. Therefore, we train a bert-transformer model and surpass the state-of-the-art.

New State-of-the-Art in Preposition Sense Disambiguation Supervisor: Prof. Dr. Alexander Mehler Alexander Henlein Institutions: Goethe University TTLa

Dirk Neuhäuser 4 Apr 06, 2022
Implementation of some unbalanced loss like focal_loss, dice_loss, DSC Loss, GHM Loss et.al

Implementation of some unbalanced loss for NLP task like focal_loss, dice_loss, DSC Loss, GHM Loss et.al Summary Here is a loss implementation reposit

121 Jan 01, 2023
BookNLP, a natural language processing pipeline for books

BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin

654 Jan 02, 2023
A programming language with logic of Python, and syntax of all languages.

Pytov The idea was to take all well known syntaxes, and combine them into one programming language with many posabilities. Installation Install using

Yuval Rosen 14 Dec 07, 2022
Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation Official Code Repository for the paper "Unsupervised Documen

NLP*CL Laboratory 2 Oct 26, 2021
This is the offline-training-pipeline for our project.

offline-training-pipeline This is the offline-training-pipeline for our project. We adopt the offline training and online prediction Machine Learning

0 Apr 22, 2022
Some embedding layer implementation using ivy library

ivy-manual-embeddings Some embedding layer implementation using ivy library. Just for fun. It is based on NYCTaxiFare dataset from kaggle (cut down to

Ishtiaq Hussain 2 Feb 10, 2022
COVID-19 Related NLP Papers

COVID-19 outbreak has become a global pandemic. NLP researchers are fighting the epidemic in their own way.

xcfeng 28 Oct 30, 2022
📔️ Generate a text-based journal from a template file.

JGen 📔️ Generate a text-based journal from a template file. Contents Getting Started Example Overview Usage Details Reserved Keywords Gotchas Getting

Harrison Broadbent 21 Sep 25, 2022
🗣️ NALP is a library that covers Natural Adversarial Language Processing.

NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!

Gustavo Rosa 21 Aug 12, 2022
Sequence-to-Sequence Framework in PyTorch

nmtpytorch allows training of various end-to-end neural architectures including but not limited to neural machine translation, image captioning and au

LIUM 395 Nov 21, 2022
Easy, fast, effective, and automatic g-code compression!

Getting to the meat of g-code. Easy, fast, effective, and automatic g-code compression! MeatPack nearly doubles the effective data rate of a standard

Scott Mudge 97 Nov 21, 2022
Build Text Rerankers with Deep Language Models

Reranker is a lightweight, effective and efficient package for training and deploying deep languge model reranker in information retrieval (IR), question answering (QA) and many other natural languag

Luyu Gao 140 Dec 06, 2022
Python library for parsing resumes using natural language processing and machine learning

CVParser Python library for parsing resumes using natural language processing and machine learning. Setup Installation on Linux and Mac OS Follow the

nafiu 0 Jul 29, 2021
This repository contains the codes for LipGAN. LipGAN was published as a part of the paper titled "Towards Automatic Face-to-Face Translation".

LipGAN Generate realistic talking faces for any human speech and face identity. [Paper] | [Project Page] | [Demonstration Video] Important Update: A n

Rudrabha Mukhopadhyay 438 Dec 31, 2022