PyTorch implementation of the Transformer in Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Last update: Feb 27, 2022

Overview

Transformer-PyTorch

A PyTorch implementation of the Transformer from the paper Attention is All You Need in both Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Pre-LN applies LayerNorm to the input of each sublayer, whereas Post-LN applies it after the residual connection. The architecture proposed in the paper is Post-LN, but the official implementation was later changed to Pre-LN. Experimental results show that the Pre-LN Transformer converges faster, does not even need learning-rate warm-up, and is less sensitive to hyperparameters. For more detail on the differences between the two, see the paper On Layer Normalization in the Transformer Architecture.
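The difference comes down to where LayerNorm sits relative to the residual addition. The following is a minimal sketch of that idea, not the code from this repo: the class name `SublayerConnection` and its signature are hypothetical, and only the flag name `pre_lnorm` is borrowed from the notebook.

```python
import torch
import torch.nn as nn


class SublayerConnection(nn.Module):
    """Residual wrapper around one sublayer (attention or feed-forward).

    Hypothetical helper for illustration; not taken from the repo.
    """

    def __init__(self, d_model: int, dropout: float = 0.1, pre_lnorm: bool = False):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.pre_lnorm = pre_lnorm

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        if self.pre_lnorm:
            # Pre-LN: normalize the sublayer input; the residual path is left untouched.
            return x + self.dropout(sublayer(self.norm(x)))
        # Post-LN: add the residual first, then normalize the sum.
        return self.norm(x + self.dropout(sublayer(x)))


# Usage with a feed-forward sublayer (hidden size 512, feed-forward size 2048).
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
x = torch.randn(2, 10, 512)  # (batch, sequence length, hidden size)
y_post = SublayerConnection(512, pre_lnorm=False)(x, ffn)
y_pre = SublayerConnection(512, pre_lnorm=True)(x, ffn)
```

In the Pre-LN branch the input `x` reaches the output through an unnormalized identity path, which is the property usually credited for its more stable training.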

A STAR would be so nice if you like it!

Dataset

The small English-German dataset from the WMT 2016 multimodal task (Multi30k), loaded through torchtext.

Prerequisites

Python3
PyTorch >= 1.2.0
torchtext
spacy
nltk
tqdm

Implementation Notes

Beam search is not supported.
Label smoothing is not implemented.
BPE (byte-pair encoding) is not used.
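Without beam search, inference presumably picks the single most likely token at each step (greedy decoding). A minimal sketch under that assumption follows; `greedy_decode` and the `next_token_scores` callback are hypothetical names standing in for a call into the trained model, not functions from this repo.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 50,
) -> List[int]:
    """Greedy decoding: always append the highest-scoring next token.

    next_token_scores is a hypothetical callback that maps the tokens
    generated so far to one score per vocabulary entry.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        # Index of the best-scoring vocabulary entry.
        best = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(best)
        if best == eos_id:
            break  # stop once the end-of-sentence token is produced
    return tokens
```

Beam search would instead keep several partial hypotheses alive at each step, which costs more compute but can find higher-scoring translations.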

Usage

Run transformer.ipynb to download the dataset and train the model.
Set the pre_lnorm flag to choose between Pre-LN and Post-LN.

Evaluation

Parameter settings
- hidden size: 512
- feed forward size: 2048
- number of heads: 8
- number of layers: 6
- warm-up steps: 2000
- batch size: 128
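To show how the hidden size and warm-up settings interact, here is a sketch of the learning-rate schedule from Attention is All You Need. It assumes the warm-up value counts optimizer steps and that this schedule is the one in use, neither of which is confirmed above; the function name `noam_lr` is hypothetical.

```python
def noam_lr(step: int, d_model: int = 512, warmup: int = 2000) -> float:
    """Learning rate at a given step: linear warm-up, then inverse-sqrt decay.

    lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    """
    step = max(step, 1)  # avoid 0 ** -0.5 on the first call
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# The rate grows until step == warmup and decays afterwards.
for s in (1, 500, 2000, 8000):
    print(s, noam_lr(s))
```

With these settings the rate peaks at step 2000 at roughly 9.9e-4. A function like this can be passed to `torch.optim.lr_scheduler.LambdaLR` when the optimizer's base learning rate is set to 1.0.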

Generated Examples

Here's an example from the test data:

source
- eine frau verwendet eine bohrmaschine während ein mann sie fotografiert .
gold
- a woman uses a drill while another man takes her picture .
inference
- a woman uses an electric drill as a man takes a picture .

TODO

Label smoothing
Attention visualization

Owner

Jared Wang
