A quick recipe to learn all about Transformers

Last update: Dec 31, 2022

Transformers Recipe

Transformers have accelerated the development of new techniques and models for natural language processing (NLP) tasks. While they have mostly been used for NLP, they are now seeing heavy adoption in computer vision as well. That makes them a very important architecture to understand and be able to apply.

I am aware that a lot of machine learning and NLP students and practitioners are keen on learning about transformers. Therefore, I have prepared this recipe of resources and study materials to help guide students interested in learning about the world of Transformers.

To begin with, I have prepared a few links to materials that I used to better understand and implement transformer models from scratch.

This recipe will also make it easy for me to keep updating the study materials needed to learn about Transformers.

🧠 High-level Introduction

First, try to get a very high-level introduction to transformers. Some references worth looking at:

🔗 Transformers From Scratch (Brandon Rohrer)

🔗 How Transformers work in deep learning and NLP: an intuitive introduction (AI Summer)

🔗 Deep Learning for Language Understanding (DeepMind)

🎨 The Illustrated Transformer

Jay Alammar's illustrated explanations are exceptional. Once you have that high-level understanding of transformers, you can jump into this popular, detailed, and illustrated explanation:

🔗 http://jalammar.github.io/illustrated-transformer/

Figure source: http://jalammar.github.io/illustrated-transformer/

🔖 Technical Summary

At this point, you may be looking for a technical summary and overview of transformers. Lilian Weng's blog posts are a gem and provide concise technical explanations/summaries:

🔗 https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html

Figure source: https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html
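
Most of these summaries revolve around two equations from Vaswani et al. (2017), reproduced here for quick reference. Q, K and V are the query, key and value matrices, d_k is the key dimension, h is the number of heads, and W_i^Q, W_i^K, W_i^V and W^O are learned projection matrices:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})
```

The division by the square root of d_k keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with extremely small gradients.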

👩🏼‍💻 Implementation

After the theory, it's important to test your knowledge. I like to understand things in detail, so I prefer to implement algorithms from scratch. For implementing transformers, I mainly relied on this tutorial:

🔗 https://nlp.seas.harvard.edu/2018/04/03/attention.html

(Google Colab | GitHub)

Figure source: https://nlp.seas.harvard.edu/2018/04/03/attention.html
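
The tutorial builds everything in PyTorch. As a dependency-free warm-up, the core operation can be sketched in plain Python lists. This is a minimal sketch, not code from the tutorial: it covers a single head only, with no batching, no learned Q/K/V projections and no dropout, and the helper names (softmax, matmul, transpose, scaled_dot_product_attention) are illustrative choices.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product of an (n x k) matrix and a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def scaled_dot_product_attention(
    q: Matrix, k: Matrix, v: Matrix, causal: bool = False
) -> Matrix:
    """Single-head softmax(Q K^T / sqrt(d_k)) V, without batching."""
    d_k = len(k[0])
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    if causal:
        # Decoder-style mask (assumes len(q) == len(k)):
        # query i may only attend to keys 0..i.
        scores = [
            [s if j <= i else float("-inf") for j, s in enumerate(row)]
            for i, row in enumerate(scores)
        ]
    # Each row of weights is a probability distribution over the keys.
    weights = [softmax(row) for row in scores]
    # Output row i is the weighted average of the value vectors.
    return matmul(weights, v)


# Toy self-attention: queries, keys and values all come from the same 3 tokens.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = scaled_dot_product_attention(x, x, x, causal=True)
```

With the causal mask, the first token can only attend to itself, so its output is exactly its own value vector. In a real model, q, k and v would first be produced by separate learned linear projections of the token embeddings.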

📄 Attention Is All You Need

This paper by Vaswani et al. introduced the Transformer architecture. Read it once you have a high-level understanding and want to get into the details. The paper's own references are worth following if you want to dive deeper.

🔗 https://arxiv.org/pdf/1706.03762v5.pdf

Figure source: https://arxiv.org/pdf/1706.03762v5.pdf
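
One detail of the paper that is easy to reimplement on its own is the sinusoidal positional encoding: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal plain-Python sketch follows; the function name and the toy embedding values are hypothetical, chosen only for illustration.

```python
import math
from typing import List


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Return a (max_len x d_model) table of sinusoidal position encodings.

    Even dimensions 2i use sin(pos / 10000**(2i / d_model)),
    odd dimensions 2i+1 use cos of the same angle.
    """
    table = []
    for pos in range(max_len):
        row = []
        for dim in range(d_model):
            i = dim // 2  # index of the sin/cos pair this dimension belongs to
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Hypothetical usage: add position vectors element-wise to made-up token embeddings.
embeddings = [[0.1] * 8 for _ in range(5)]
pe = sinusoidal_positional_encoding(len(embeddings), 8)
inputs = [[e + p for e, p in zip(erow, prow)] for erow, prow in zip(embeddings, pe)]
```

In the paper, these encodings are summed with the input embeddings at the bottom of the encoder and decoder stacks, which is what the last line imitates. Because the table depends only on position and dimension, it can be precomputed once for the maximum sequence length.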

👩🏼‍💻 Applying Transformers

After spending some time studying the theory behind transformers, you may want to apply them to NLP projects or research. At that point, your best bet is the Transformers library by Hugging Face.

🔗 https://github.com/huggingface/transformers

The Hugging Face team is also publishing a new book, Natural Language Processing with Transformers, which you might want to check out here.

Feel free to suggest study material. In the next update, I am looking to add a more comprehensive collection of Transformer applications and papers. In addition, a code implementation for easy experimentation is coming as well. Stay tuned!

To get regular updates on new ML and NLP resources, follow me on Twitter.

Owner

DAIR.AI
