GPT, but made only out of gMLPs

Last update: Dec 01, 2022

Overview

GPT - gMLP

This repository attempts to crack long-context autoregressive language modeling (GPT) using variations of gMLPs. Specifically, it will contain a variant that applies gMLP within local sliding windows. The hope is to stretch a single GPU far enough to train context lengths of 4096 and above, both efficiently and well.

GPT is technically a misnomer now, since the architecture contains no attention (transformer) layers at all.
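To make the local-window idea concrete, here is a minimal sketch in plain Python. It assumes the simplest possible reading: windows are non-overlapping blocks and mixing inside a block is causal. The names `local_causal_mask` and `mixing_pairs` are hypothetical and are not part of `g_mlp_gpt`; the repository's actual windowing may differ.

```python
def local_causal_mask(seq_len, window):
    """Return mask[i][j] == True when position i may read from position j.

    Assumption for this sketch: windows are non-overlapping blocks of
    `window` tokens, and mixing inside a block is causal (j <= i).
    """
    if window <= 0 or seq_len % window != 0:
        raise ValueError("this sketch needs window > 0 and seq_len divisible by window")
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        block_start = (i // window) * window  # first position of i's block
        for j in range(block_start, i + 1):   # never look past i (causal)
            mask[i][j] = True
    return mask


def mixing_pairs(seq_len, window):
    """Count (i, j) pairs that are mixed; a rough proxy for token-mixing cost."""
    return sum(sum(row) for row in local_causal_mask(seq_len, window))


print(mixing_pairs(16, 4))   # 40: four blocks of 4 tokens
print(mixing_pairs(16, 16))  # 136: one window spanning the whole sequence
```

With a fixed window the count grows linearly with sequence length rather than quadratically, which is the motivation for using small windows when the context is long.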

Install

$ pip install g-mlp-gpt

Usage

import torch
from g_mlp_gpt import gMLPGPT

model = gMLPGPT(
    num_tokens = 20000,
    dim = 512,
    depth = 4,
    seq_len = 1024,
    window = (128, 256, 512, 1024) # window sizes for each depth
)

x = torch.randint(0, 20000, (1, 1000))
logits = model(x) # (1, 1000, 20000)
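Because the window sizes are given per layer, the tuple has to line up with `depth`. A small sanity check before building the model can catch a mismatch early. This is a sketch only: `check_windows` is a hypothetical helper, not part of `g_mlp_gpt`, and the rule that a window may not exceed `seq_len` is an assumption.

```python
def check_windows(depth, seq_len, windows):
    """Validate a per-layer window schedule before building a model.

    Hypothetical helper for illustration; not part of g_mlp_gpt.
    """
    if len(windows) != depth:
        raise ValueError(f"expected {depth} window sizes, got {len(windows)}")
    for layer, w in enumerate(windows):
        # Assumption: a window cannot be empty or longer than the sequence.
        if not 0 < w <= seq_len:
            raise ValueError(f"layer {layer}: window {w} must be in 1..{seq_len}")
    return True


# Values taken from the usage example above.
check_windows(depth=4, seq_len=1024, windows=(128, 256, 512, 1024))
```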

16k context length

import torch
from g_mlp_gpt import gMLPGPT

model = gMLPGPT(
    num_tokens = 20000,
    dim = 512,
    seq_len = 16384,
    depth = 12, # must match the 12 entries in the window and axial tuples
    reversible = True,
    window = (128, 128, 256, 512, 1024, 1024, 2048, 2048, 4096, 4096, 8192, 8192),
    axial = (1, 1, 1, 1, 1, 1, 2, 2, 4, 4, 8, 8)
).cuda()

x = torch.randint(0, 20000, (1, 16384)).cuda()
logits = model(x) # (1, 16384, 20000)
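At this context length the output tensor alone is sizeable. The arithmetic below is a rough estimate, assuming the logits are stored as 32-bit floats (4 bytes per element, PyTorch's default dtype); `logits_bytes` is a hypothetical helper written for this example.

```python
def logits_bytes(batch, seq_len, num_tokens, bytes_per_element=4):
    """Bytes needed to hold a (batch, seq_len, num_tokens) logits tensor.

    Assumption: float32 storage, i.e. 4 bytes per element.
    """
    return batch * seq_len * num_tokens * bytes_per_element


size = logits_bytes(batch=1, seq_len=16384, num_tokens=20000)
print(size)                     # 1310720000 bytes
print(round(size / 2**30, 2))   # 1.22 GiB
```

This counts only the final logits, not activations or gradients, so it is a lower bound on what a forward pass needs.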

Citations

@misc{liu2021pay,
    title   = {Pay Attention to MLPs}, 
    author  = {Hanxiao Liu and Zihang Dai and David R. So and Quoc V. Le},
    year    = {2021},
    eprint  = {2105.08050},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

Releases(0.0.15)

0.0.15(May 25, 2021)

0.0.14(May 25, 2021)

0.0.12(May 24, 2021)

0.0.11(May 23, 2021)

0.0.10(May 23, 2021)

0.0.9(May 21, 2021)

0.0.8(May 21, 2021)

0.0.7(May 21, 2021)

0.0.6(May 21, 2021)

0.0.5(May 20, 2021)

0.0.4(May 20, 2021)

0.0.3(May 20, 2021)

0.0.2(May 20, 2021)

0.0.1(May 20, 2021)

Owner

Phil Wang
