Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

Related tags

Text Data & NLPgpt
Overview

Pytorch GPT-X

My Own Pytorch GPT-X

1. Abstract

Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

2. Model

Transformer

Additional Module

① Rezero

Rezero Is All You Need link

② Explicit Sparse Transformer

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection link

③ Macaron Architecture

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View link

④ RealFormer, Residual Attention

RealFormer link

Train

DeepSpeed

TODO

  • ReZero
  • RealFormer, Residual Attention
  • Macaron architectures
  • Macaron architectures - layer Scale 0.5
  • Explicit Sparse Transformer
  • torch lightning
  • Deepspeed train on single GPU
  • Deepspeed parallel trainig on 2 V100 GPU with 16GB Memory

Parameter For Few-shot

The 175B parameter model is very large, but a large model is needed for Few-Shot Learning. So this repository try to use DeepSpeed for training extremely big model.

GPT-3 Config

model_name n_params n_layer d_model n_heads d_head batch_size learning_rate
GPT-3 175B 175B 96 12288 96 128 3.2M 0.6 x 10^-4
GPT-3 13B 13B 40 5140 40 128 2M 1.0 x 10^-4
GPT-3 6.7B 6.7B 32 4096 32 128 2M 1.2 x 10^-4
GPT-3 2.7B 2.7B 32 25560 32 80 1M 1.6 x 10^-4

References

Transformer

DeepSpeed

ReZero

Explicit Sparse Transformer

Macaron Architecrue

Owner
Seonghwan Kim
Seonghwan Kim
PIZZA - a task-oriented semantic parsing dataset

The PIZZA dataset continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.

17 Dec 14, 2022
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

English|简体中文 ERNIE是百度开创性提出的基于知识增强的持续学习语义理解框架,该框架将大数据预训练与多源丰富知识相结合,通过持续学习技术,不断吸收海量文本数据中词汇、结构、语义等方面的知识,实现模型效果不断进化。ERNIE在累积 40 余个典型 NLP 任务取得 SOTA 效果,并在 G

5.4k Jan 03, 2023
Almost State-of-the-art Text Generation library

Ps: we are adding transformer model soon Text Gen 🐐 Almost State-of-the-art Text Generation library Text gen is a python library that allow you build

Emeka boris ama 63 Jun 24, 2022
Search-Engine - 📖 AI based search engine

Search Engine AI based search engine that was trained on 25000 samples, feel free to train on up to 1.2M sample from kaggle dataset, link below StackS

Vladislav Kruglikov 2 Nov 29, 2022
An automated program that helps customers of Pizza Palour place their pizza orders

PIzza_Order_Assistant Introduction An automated program that helps customers of Pizza Palour place their pizza orders. The program uses voice commands

Tindi Sommers 1 Dec 26, 2021
Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Accelerated Sparse Neural Training: A Provable and Efficient Method to FindN:M Transposable Masks Recently, researchers proposed pruning deep neural n

itay hubara 4 Feb 23, 2022
Big Bird: Transformers for Longer Sequences

BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the c

Google Research 457 Dec 23, 2022
Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

CIRPLANT This repository contains the code and pre-trained models for Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) For d

Zheyuan (David) Liu 29 Nov 17, 2022
Language-Agnostic SEntence Representations

LASER Language-Agnostic SEntence Representations LASER is a library to calculate and use multilingual sentence embeddings. NEWS 2019/11/08 CCMatrix is

Facebook Research 3.2k Jan 04, 2023
The training code for the 4th place model at MDX 2021 leaderboard A.

The training code for the 4th place model at MDX 2021 leaderboard A.

Chin-Yun Yu 32 Dec 18, 2022
DLO8012: Natural Language Processing & CSL804: Computational Lab - II

NATURAL-LANGUAGE-PROCESSING-AND-COMPUTATIONAL-LAB-II DLO8012: NLP & CSL804: CL-II [SEMESTER VIII] Syllabus NLP - Reference Books THE WALL MEGA SATISH

AMEY THAKUR 7 Apr 28, 2022
Crowd sourced training data for Rasa NLU models

NLU Training Data Crowd-sourced training data for the development and testing of Rasa NLU models. If you're interested in grabbing some data feel free

Rasa 169 Dec 26, 2022
Fast topic modeling platform

The state-of-the-art platform for topic modeling. Full Documentation User Mailing List Download Releases User survey What is BigARTM? BigARTM is a pow

BigARTM 633 Dec 21, 2022
Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

wav2vec_finetune Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks Initial test: gender recognition on this dat

8 Aug 11, 2022
Dé op-de-vlucht Pieton vertaler. Wereldwijd gebruikt door meer dan 1.000+ succesvolle bedrijven!

Dé op-de-vlucht Pieton vertaler. Wereldwijd gebruikt door meer dan 1.000+ succesvolle bedrijven!

Lau 1 Dec 17, 2021
Codes for coreference-aware machine reading comprehension

Data and code for the paper "Tracing Origins: Coreference-aware Machine Reading Comprehension" at ACL2022. Dataset There are three folders for our thr

11 Sep 29, 2022
Random Directed Acyclic Graph Generator

DAG_Generator Random Directed Acyclic Graph Generator verison1.0 简介 工作流通常由DAG(有向无环图)来定义,其中每个计算任务$T_i$由一个顶点(node,task,vertex)表示。同时,任务之间的每个数据或控制依赖性由一条加权

Livion 17 Dec 27, 2022
Text Classification in Turkish Texts with Bert

You can watch the details of the project on my youtube channel Project Interface Project Second Interface Goal= Correctly guessing the classification

42 Dec 31, 2022
PyTranslator é simultaneamente um editor e tradutor de texto com diversos recursos e interface feito com coração e 100% em Python

PyTranslator O Que é e para que serve o PyTranslator? PyTranslator é simultaneamente um editor e tradutor de texto em com interface gráfica que usa a

Elizeu Barbosa Abreu 1 May 12, 2022
Findings of ACL 2021

Assessing Dialogue Systems with Distribution Distances [arXiv][code] We propose to measure the performance of a dialogue system by computing the distr

Yahui Liu 16 Feb 24, 2022