Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
Besides the source code, we also provide pretrained TensorFlow models that achieve the state-of-the-art (SoTA) performance reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder and supports single-node multi-GPU training via the nn.DataParallel module.
Please refer to pytorch/README.md for details.
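
As a rough illustration of what single-node multi-GPU training via nn.DataParallel looks like, here is a minimal sketch. The TinyLM model, its sizes, and the batch-first tensor layout are hypothetical stand-ins chosen for brevity, not the model or data layout used in pytorch/; see pytorch/README.md for the actual training scripts.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in model (NOT the Transformer-XL model in pytorch/)."""

    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        return self.proj(self.embed(tokens))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyLM().to(device)

# nn.DataParallel splits each input batch along dim 0 across the visible GPUs,
# runs a replica of the module on each chunk, and gathers the outputs.
# With no GPU available it simply calls the wrapped module.
parallel_model = nn.DataParallel(model)

tokens = torch.randint(0, 100, (8, 12), device=device)  # assumed batch of 8 sequences
logits = parallel_model(tokens)
```

The wrapped model is called exactly like the original one, so the rest of a training loop (loss, backward pass, optimizer step) does not need to change.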

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first to break through the 1.0 bits-per-character barrier on char-level language modeling (0.99 on enwiki8). The results are summarized below; lower is better in every column.

Method	enwiki8 (bpc)	text8 (bpc)	One Billion Word (ppl)	WT-103 (ppl)	PTB (ppl, w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5
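
The numbers above use two different metrics: the character-level benchmarks (enwiki8, text8) are reported in bits per character (bpc), while the word-level benchmarks are reported in perplexity. Both follow from the average negative log-likelihood per token. The sketch below shows the conversion, assuming the loss is measured in nats (natural log), as is usual for a cross-entropy loss; the function names are illustrative and do not come from this repository.

```python
import math


def bits_per_character(avg_nll_nats):
    """Convert an average per-character loss in nats to bits per character."""
    return avg_nll_nats / math.log(2)


def perplexity(avg_nll_nats):
    """Convert an average per-word loss in nats to perplexity."""
    return math.exp(avg_nll_nats)


# Losses that would correspond to two of the table entries above:
# 0.99 bpc on enwiki8 and a perplexity of 18.3 on WT-103.
char_loss = 0.99 * math.log(2)  # about 0.686 nats per character
word_loss = math.log(18.3)      # about 2.907 nats per word

bpc = bits_per_character(char_loss)
ppl = perplexity(word_loss)
```

A bpc below 1.0 therefore means the model needs, on average, less than one bit to encode each character of the test set.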

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)
