Transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

Overview

[UPDATED] A TensorFlow Implementation of Attention Is All You Need

When I opened this repository in 2017, there was no official code yet. I tried to implement the paper as I understood, but to no surprise it had several bugs. I realized them mostly thanks to people who issued here, so I'm very grateful to all of them. Though there is the official implementation as well as several other unofficial github repos, I decided to update my own one. This update focuses on:

  • readable / understandable code writing
  • modularization (but not too much)
  • revising known bugs. (masking, positional encoding, ...)
  • updating to TF1.12. (tf.data, ...)
  • adding some missing components (bpe, shared weight matrix, ...)
  • including useful comments in the code.

I still stick to IWSLT 2016 de-en. I guess if you'd like to test on a big data such as WMT, you would rely on the official implementation. After all, it's pleasant to check quickly if your model works. The initial code for TF1.2 is moved to the tf1.2_lecacy folder for the record.

Requirements

  • python==3.x (Let's move on to python 3 if you still use python 2)
  • tensorflow==1.12.0
  • numpy>=1.15.4
  • sentencepiece==0.1.8
  • tqdm>=4.28.1

Training

bash download.sh

It should be extracted to iwslt2016/de-en folder automatically.

  • STEP 2. Run the command below to create preprocessed train/eval/test data.
python prepro.py

If you want to change the vocabulary size (default:32000), do this.

python prepro.py --vocab_size 8000

It should create two folders iwslt2016/prepro and iwslt2016/segmented.

  • STEP 3. Run the following command.
python train.py

Check hparams.py to see which parameters are possible. For example,

python train.py --logdir myLog --batch_size 256 --dropout_rate 0.5
  • STEP 3. Or download the pretrained models.
wget https://dl.dropbox.com/s/4lom1czy5xfzr4q/log.zip; unzip log.zip; rm log.zip

Training Loss Curve

Learning rate

Bleu score on devset

Inference (=test)

  • Run
python test.py --ckpt log/1/iwslt2016_E19L2.64-29146 (OR yourCkptFile OR yourCkptFileDirectory)

Results

  • Typically, machine translation is evaluated with Bleu score.
  • All evaluation results are available in eval/1 and test/1.
tst2013 (dev) tst2014 (test)
28.06 23.88

Notes

  • Beam decoding will be added soon.
  • I'm going to update the code when TF2.0 comes out if possible.
Owner
Kyubyong Park
Lives in Seoul, Korea. Studied Linguistics at SNU and Univ. of Hawaii.
Kyubyong Park
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

ESPnet 5.9k Jan 03, 2023
A paper list for aspect based sentiment analysis.

Aspect-Based-Sentiment-Analysis A paper list for aspect based sentiment analysis. Survey [IEEE-TAC-20]: Issues and Challenges of Aspect-based Sentimen

jiangqn 419 Dec 20, 2022
DiY Oxygen Concentrator based on the OxiKit

M19O2 DiY Oxygen Concentrator based on / inspired by the OxiKit, OpenOx, Marut, RepRap and Project Apollo platforms. About Read about the project on H

Maker's Asylum 62 Dec 22, 2022
Training open neural machine translation models

Train Opus-MT models This package includes scripts for training NMT models using MarianNMT and OPUS data for OPUS-MT. More details are given in the Ma

Language Technology at the University of Helsinki 167 Jan 03, 2023
LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating

LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating (Dataset) The dataset is from Amazon Review Data (2018)

Immanuvel Prathap S 1 Jan 16, 2022
Implementation of TF-IDF algorithm to find documents similarity with cosine similarity

NLP learning Trying to learn NLP to use in my projects! Table of Contents About The Project Built With Getting Started Requirements Run Usage License

Faraz Farangizadeh 3 Aug 25, 2022
ADCS cert template modification and ACL enumeration

Purpose This tool is designed to aid an operator in modifying ADCS certificate templates so that a created vulnerable state can be leveraged for privi

Fortalice Solutions, LLC 78 Dec 12, 2022
Signature remover is a NLP based solution which removes email signatures from the rest of the text.

Signature Remover Signature remover is a NLP based solution which removes email signatures from the rest of the text. It helps to enchance data conten

Forges Alterway 8 Jan 06, 2023
Code for the paper "Are Sixteen Heads Really Better than One?"

Are Sixteen Heads Really Better than One? This repository contains code to reproduce the experiments in our paper Are Sixteen Heads Really Better than

Paul Michel 143 Dec 14, 2022
Semi-automated vocabulary generation from semantic vector models

vec2word Semi-automated vocabulary generation from semantic vector models This script generates a list of potential conlang word forms along with asso

9 Nov 25, 2022
Machine Learning Course Project, IMDB movie review sentiment analysis by lstm, cnn, and transformer

IMDB Sentiment Analysis This is the final project of Machine Learning Courses in Huazhong University of Science and Technology, School of Artificial I

Daniel 0 Dec 27, 2021
Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers

Japanese-LUW-Tokenizer Japanese Long-Unit-Word (国語研長単位) Tokenizer for Transformers based on 青空文庫 Basic Usage from transformers import RemBertToken

Koichi Yasuoka 3 Dec 22, 2021
Edge-Augmented Graph Transformer

Edge-augmented Graph Transformer Introduction This is the official implementation of the Edge-augmented Graph Transformer (EGT) as described in https:

Md Shamim Hussain 21 Dec 14, 2022
A repo for materials relating to the tutorial of CS-332 NLP

CS-332-NLP A repo for materials relating to the tutorial of CS-332 NLP Contents Tutorial 1: Introduction Corpus Regular expression Tokenization Tutori

Alok singh 9 Feb 15, 2022
NLP Overview

NLP-Overview Introduction The field of NPL encompasses a variety of topics which involve the computational processing and understanding of human langu

PeterPham 1 Jan 13, 2022
A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

AI2 1.3k Jan 03, 2023
PyTorch impelementations of BERT-based Spelling Error Correction Models.

PyTorch impelementations of BERT-based Spelling Error Correction Models

Heng Cai 209 Dec 30, 2022
BERTAC (BERT-style transformer-based language model with Adversarially pretrained Convolutional neural network)

BERTAC (BERT-style transformer-based language model with Adversarially pretrained Convolutional neural network) BERTAC is a framework that combines a

6 Jan 24, 2022
Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

PythonTextObfuscator Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense. Requi

2 Aug 29, 2022
Simple, Fast, Powerful and Easily extensible python package for extracting patterns from text, with over than 60 predefined Regular Expressions.

patterns-finder Simple, Fast, Powerful and Easily extensible python package for extracting patterns from text, with over than 60 predefined Regular Ex

22 Dec 19, 2022