The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Overview

tldr-transformers

The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Models: GPT- *, * BERT *, Adapter- *, * T5, etc.

BERT and T5 (art from the original papers)

     

Each set of notes includes links to the paper, the original code implementation (if available) and the Huggingface 🤗 implementation.

Here is an example: t5.

The transformers papers are presented somewhat chronologically below. Go to the " 👉 Notes 👈 " column below to find the notes for each paper.

This repo also includes a table quantifying the differences across transformer papers all in one table.

Contents

Quick_Note

This is not an intro to deep learning in NLP. If you are looking for that, I recommend one of the following: Fast AI's course, one of the Coursera courses, or maybe this old thing. Come here after that.

Motivation

With the explosion in papers on all things Transformers the past few years, it seems useful to catalog the salient features/results/insights of each paper in a digestible format. Hence this repo.

Models

Model Year Institute Paper 👉 Notes 👈 Original Code Huggingface 🤗 Other Repo
Transformer 2017 Google Attention is All You Need Skipped, too many good write-ups: ?
GPT-3 2018 OpenAI Language Models are Unsupervised Multitask Learners To-Do X X
GPT-J-6B 2021 EleutherAI GPT-J-6B: 6B Jax-Based Transformer (public GPT-3) X here x x
BERT 2018 Google BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding BERT notes here here
DistilBERT 2019 Huggingface DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter DistilBERT notes here
ALBERT 2019 Google/Toyota ALBERT: A Lite BERT for Self-supervised Learning of Language Representations ALBERT notes here here
RoBERTa 2019 Facebook RoBERTa: A Robustly Optimized BERT Pretraining Approach RoBERTa notes here here
BART 2019 Facebook BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension BART notes here here
T5 2019 Google Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer T5 notes here here
Adapter-BERT 2019 Google Parameter-Efficient Transfer Learning for NLP Adapter-BERT notes here - here
Megatron-LM 2019 NVIDIA Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Megatron notes here - here
Reformer 2020 Google Reformer: The Efficient Transformer Reformer notes here
byT5 2021 Google ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 notes here here
CLIP 2021 OpenAI Learning Transferable Visual Models From Natural Language Supervision CLIP notes here here
DALL-E 2021 OpenAI Zero-Shot Text-to-Image Generation DALL-E notes here -
Codex 2021 OpenAI Evaluating Large Language Models Trained on Code Codex notes X -

BigTable

All of the table summaries found ^ collapsed into one really big table here.

Alignment

Paper Year Institute 👉 Notes 👈 Codes
Fine-Tuning Language Models from Human Preferences 2019 OpenAI To-Do None

Scaling

Paper Year Institute 👉 Notes 👈 Codes
Scaling Laws for Neural Language Models 2020 OpenAI To-Do None

Memorization

Paper Year Institute 👉 Notes 👈 Codes
Extracting Training Data from Large Language Models 2021 Google et al. To-Do None
Deduplicating Training Data Makes Language Models Better 2021 Google et al. To-Do None

FewLabels

Paper Year Institute 👉 Notes 👈 Codes
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP 2021 GIT/UNC To-Do None
Learning with fewer labeled examples 2021 Kevin Murphy & Colin Raffel (Preprint: "Probabilistic Machine Learning", Chapter 19) Worth a read, won't summarize here. None

Contribute

If you are interested in contributing to this repo, feel free to do the following:

  1. Fork the repo.
  2. Create a Draft PR with the paper of interest (to prevent "in-flight" issues).
  3. Use the suggested template to write your "tl;dr". If it's an architecture paper, you may also want to add to the larger table here.
  4. Submit your PR.

Errata

Undoubtedly there is information that is incorrect here. Please open an Issue and point it out.

Citation

@misc{cliff-notes-transformers,
  author = {Thompson, Will},
  url = {https://github.com/will-thompson-k/cliff-notes-transformers},
  year = {2021}
}

For the notes above, I've linked the original papers.

License

MIT

Owner
Will Thompson
Will Thompson
A general python framework for visual object tracking and video object segmentation, based on PyTorch

PyTracking A general python framework for visual object tracking and video object segmentation, based on PyTorch. 📣 Two tracking/VOS papers accepted

2.6k Jan 04, 2023
This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing.

Feedback Prize - Evaluating Student Writing This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing. The

Udbhav Bamba 41 Dec 14, 2022
This repo contains the source code and a benchmark for predicting user's utilities with Machine Learning techniques for Computational Persuasion

Machine Learning for Argument-Based Computational Persuasion This repo contains the source code and a benchmark for predicting user's utilities with M

Ivan Donadello 4 Nov 07, 2022
SARS-Cov-2 Recombinant Finder for fasta sequences

Sc2rf - SARS-Cov-2 Recombinant Finder Pronounced: Scarf What's this? Sc2rf can search genome sequences of SARS-CoV-2 for potential recombinants - new

Lena Schimmel 41 Oct 03, 2022
Fully Convolutional DenseNets for semantic segmentation.

Introduction This repo contains the code to train and evaluate FC-DenseNets as described in The One Hundred Layers Tiramisu: Fully Convolutional Dense

485 Nov 26, 2022
Real-time multi-object tracker using YOLO v5 and deep sort

This repository contains a two-stage-tracker. The detections generated by YOLOv5, a family of object detection architectures and models pretrained on the COCO dataset, are passed to a Deep Sort algor

Mike 3.6k Jan 05, 2023
A system for quickly generating training data with weak supervision

Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat

Snorkel Team 5.4k Jan 02, 2023
[CVPR2021] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

UAV-Human Official repository for CVPR2021: UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicle Paper arXiv Res

129 Jan 04, 2023
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

What's New Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here. [Oct 2021]: V

Meta Research 2.9k Jan 07, 2023
Analyzing basic network responses to novel classes

novelty-detection Analyzing how AlexNet responds to novel classes with varying degrees of similarity to pretrained classes from ImageNet. If you find

Noam Eshed 34 Oct 02, 2022
This is our ARTS test set, an enriched test set to probe Aspect Robustness of ABSA.

This is the repository for our 2020 paper "Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis". Data We provide

35 Nov 16, 2022
Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

NuPIC Numenta Platform for Intelligent Computing The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implem

Numenta 6.3k Dec 30, 2022
Aesara is a Python library that allows one to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Aesara is a Python library that allows one to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Aesara 898 Jan 07, 2023
Open Source Light Field Toolbox for Super-Resolution

BasicLFSR BasicLFSR is an open-source and easy-to-use Light Field (LF) image Super-Ressolution (SR) toolbox based on PyTorch, including a collection o

Squidward 50 Nov 18, 2022
Mall-Customers-Segmentation - Customer Segmentation Using K-Means Clustering

Overview Customer Segmentation is one the most important applications of unsupervised learning. Using clustering techniques, companies can identify th

NelakurthiSudheer 2 Jan 03, 2022
Code for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in Deep Latent Space"

SRHEN This is a better and simpler implementation for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in

1 Oct 28, 2022
Meandering In Networks of Entities to Reach Verisimilar Answers

MINERVA Meandering In Networks of Entities to Reach Verisimilar Answers Code and models for the paper Go for a Walk and Arrive at the Answer - Reasoni

Shehzaad Dhuliawala 271 Dec 13, 2022
A selection of State Of The Art research papers (and code) on human locomotion (pose + trajectory) prediction (forecasting)

A selection of State Of The Art research papers (and code) on human trajectory prediction (forecasting). Papers marked with [W] are workshop papers.

Karttikeya Manglam 40 Nov 18, 2022
Official implementation for Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020

Likelihood-Regret Official implementation of Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020. T

Xavier 33 Oct 12, 2022
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022