OSLO: Open Source framework for Large-scale transformer Optimization

Last update: Nov 24, 2022

Overview

O S L O

Open Source framework for Large-scale transformer Optimization

What's New:

December 21, 2021 Released OSLO 1.0.

What is OSLO about?

OSLO is a framework that provides various GPU-based optimization features for large-scale modeling. As of 2021, Hugging Face Transformers is considered the de facto standard. However, it is not yet well suited to large-scale modeling. This is where OSLO comes in. OSLO is designed to make it easier to train large models with Transformers. For example, you can fine-tune GPTJ from the Hugging Face Model Hub with little extra effort using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, and we plan to support more soon.

Installation

OSLO can be installed with the pip package manager. All dependencies, such as torch, transformers, dacite, ninja and pybind11, should be installed automatically by the following command. Note that the PyPI project name includes 'core'.

pip install oslo-core

Some features rely on C++, so we provide an option, CPP_AVAILABLE, that controls whether they are installed.

If C++ is available:

CPP_AVAILABLE=1 pip install oslo-core

If C++ is not available:

CPP_AVAILABLE=0 pip install oslo-core

Note that CPP_AVAILABLE defaults to 0 on Windows and 1 on Linux.
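As a rough illustration of how a platform-dependent default like this could be resolved, here is a minimal sketch. The helper names `default_cpp_available` and `resolve_cpp_available` are hypothetical and are not part of OSLO; only the CPP_AVAILABLE variable and its Windows/Linux defaults come from the text above. Treating every other platform as 0 is an assumption made for the example.

```python
import os
import platform


def default_cpp_available(system_name):
    # Defaults stated in the README: 0 on Windows, 1 on Linux.
    # Any other platform falls back to 0 (assumption, not from OSLO).
    return 1 if system_name == "Linux" else 0


def resolve_cpp_available(environ, system_name):
    # An explicit CPP_AVAILABLE value overrides the platform default.
    raw = environ.get("CPP_AVAILABLE")
    if raw is None:
        return default_cpp_available(system_name)
    return int(raw)


if __name__ == "__main__":
    print(resolve_cpp_available(os.environ, platform.system()))
```

Setting the variable explicitly, as in the commands above, avoids relying on the default at all.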

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_parameters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
Kernel Fusion: A GPU optimization method to increase training and inference speed.
DeepSpeed Support: We support DeepSpeed, which provides ZeRO data parallelism.
Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.
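To make the relationship between the parallel sizes concrete, here is a small sketch of how a pool of GPUs can be divided across the three dimensions of 3D parallelism. The function names and the rank layout (tensor rank varying fastest, then pipeline, then data) are assumptions for illustration only and do not describe OSLO's internal layout.

```python
def data_parallel_size(world_size, tensor_parallel_size, pipeline_parallel_size):
    # GPUs not used for tensor or pipeline parallelism hold data-parallel replicas.
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError("world_size must be divisible by tensor * pipeline size")
    return world_size // model_parallel_size


def rank_to_coords(rank, tensor_parallel_size, pipeline_parallel_size):
    # Hypothetical layout: tensor rank varies fastest, then pipeline, then data.
    tp = rank % tensor_parallel_size
    pp = (rank // tensor_parallel_size) % pipeline_parallel_size
    dp = rank // (tensor_parallel_size * pipeline_parallel_size)
    return dp, pp, tp


if __name__ == "__main__":
    # 8 GPUs with tensor_parallel_size=2 and pipeline_parallel_size=2
    print(data_parallel_size(8, 2, 2))
    for rank in range(8):
        print(rank, rank_to_coords(rank, 2, 2))
```

With the sizes used in the example above (tensor_parallel_size=2, pipeline_parallel_size=2), one model replica spans four GPUs, so eight GPUs would hold two data-parallel replicas.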

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The code of the OSLO project is licensed under the terms of the Apache License 2.0.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments

[WIP] Implement ZeRO Stage 3 (FSDP)
Title

Implement ZeRO Stage 3 (FullyShardedDataParallel)

Description

[x] Add reduce_scatter_bucketer.py

[x] Add test_reduce_scatter_bucketer.py

[x] Add flatten_params_wrapper.py

[x] Add test_flatten_params_wrapper.py

[x] Add containers.py

[x] Add test_containers.py

[x] Add parallel.py

[x] Add test_parallel.py

[x] Add fsdp_optim_utils.py

[x] Update fsdp.py

[x] Add auto_wrap.py

[x] Add test_wrap.py
opened by jinok2im 9
FusedAdam & CPUAdam
Title

FusedAdam & CPUAdam

Description

Implement FusedAdam & CPUAdam

Tasks

[x] Implement FusedAdam

[x] Implement CPUAdam

[x] Test FusedAdam

[x] Test CPUAdam

[x] Test FusedScaleMaskSoftmax (Name changed)
opened by cozytk 6
[WIP] Add data processing modules referring to the lassl
Title

add data processing modules referring to the lassl

Description

Ported data processing functions suited to GPT2, with reference to lassl.

Linked Issues

None
opened by gimmaru 6
Implementation of Sequential Parallelism
SP with DP implementation

Implemented SP wrapper with DP

Description

SequenceDataParallel works like native torch DDP, with SP support.

You can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
opened by ohwi 5
Update data collators and Add models
Title

Update data collators and Add models

Description

Updated data collators to utilize sequence parallel in Oslo trainer

Add models by referring to the transformers library
opened by gimmaru 3
Implement Expert Parallel and Test for Initialization and Forward Pass
Title

Implement Expert Parallel and Test for Initialization and Forward Pass

Description

Implement Wrapper, Modules and Features for Expert Parallel

Implement mapping_utils._ParallelMappingForHuggingFace as super class of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace

Test initialization and forward pass for expert parallel
opened by scsc0511 3
Integrate Sequence Parallelism branches
Title

Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

Description

This PR integrates the current version of SP, but there are still some bugs.

We will fix the bugs over the coming week and write test modules according to the SP design.

It does not include the contents of the branch that was used for the tests.
opened by l-yohai 3
implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers
implement tp-3d wrapper

fix the rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing a rank transpose function.

revise tp-3d layers for huggingface compatibility

implement tp-3d test codes

refactor all tp test codes

unify format across all tensor parallel modules.
opened by bzantium 2
Refactoring MultiheadAttention with todo anchors
Title

Refactoring MultiheadAttention with todo anchors

Description

Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.

Remove unnecessary or unintended code and clean up annotations.

Unify return format and the variable name with native torch.

Additionally, I need to test attention_mask. However, it seems this part can only proceed after FusedScaleMaskSoftmax is integrated.

cc. @hyunwoongko @ohwi
opened by l-yohai 2
Add tp-1d layers testing
Add testing for tp-1d layers: col_linear, row_linear, vocab_embedding_1d

replace hard-coded numbers with integer variables such as summa_dim and world_size. cc: @hyunwoongko
opened by bzantium 2
[WIP] add test code of sp training
Title

SP Model Test Code

Description

Write test code to verify that the model's gradient and loss values are the same when sequence parallelism is applied.

WIP: merging @ohwi's test code, which compares the SP of ColossalAI with a simple training model.
opened by l-yohai 2

Releases(v2.0.2)

v2.0.2(Aug 25, 2022)
Revert oslo to 1.1.2.

v2.0.1(Feb 20, 2022)
Merge changes from functorch upstream.

Fix documents and tutorials

v2.0.0(Feb 14, 2022)
Official release of OSLO 2.0.0 🎉🎉

This version of OSLO provides the following features:

Tensor model parallelism

Efficient activation checkpointing

Kernel fusion

We plan to add pipeline model parallelism and ZeRO optimization in upcoming versions.

New feature: Kernel Fusion

{
  "kernel_fusion": {
    "enable": "bool",
    "memory_efficient_fusion": "bool",
    "custom_cuda_kernels": "list"
  }
}

For more information, please check the kernel fusion tutorial
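The schema above lists the expected type of each kernel fusion option. As a minimal sketch of how a concrete configuration could be checked against those types before being passed on, consider the following. The `validate_kernel_fusion` helper is hypothetical and not part of OSLO, and the example values (including the empty kernel list) are placeholders.

```python
# Expected types, taken from the schema in the release notes.
KERNEL_FUSION_SCHEMA = {
    "enable": bool,
    "memory_efficient_fusion": bool,
    "custom_cuda_kernels": list,
}


def validate_kernel_fusion(config):
    """Return a list of problems found in the "kernel_fusion" section."""
    section = config.get("kernel_fusion", {})
    errors = []
    for key, value in section.items():
        expected = KERNEL_FUSION_SCHEMA.get(key)
        if expected is None:
            errors.append(f"unknown key: {key}")
        elif not isinstance(value, expected):
            errors.append(f"{key} should be {expected.__name__}")
    return errors


# Placeholder values; the kernel list is left empty on purpose.
example_config = {
    "kernel_fusion": {
        "enable": True,
        "memory_efficient_fusion": False,
        "custom_cuda_kernels": [],
    }
}

if __name__ == "__main__":
    print(validate_kernel_fusion(example_config))
```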
v2.0.0a2(Feb 2, 2022)

Quick fix of cuda rng state tracker

v2.0.0a1(Feb 2, 2022)

Add activation checkpointing

You can enable efficient activation checkpointing in OSLO with the following configuration.

model = oslo.initialize(
    model,
    config={
        "model_parallelism": {
            "enable": True,
            "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
        },
        "activation_checkpointing": {
            "enable": True,
            "cpu_checkpointing": True,
            "partitioned_checkpointing": True,
            "contiguous_checkpointing": True,
        },
    },
)

Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html


v2.0.0a0(Jan 30, 2022)
New API

We paid homage to DeepSpeed. Now it's easier and simpler to use.

import oslo

model = oslo.initialize(model, config="oslo-config.json")
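The call above takes the path of a JSON file. As a sketch of how such a file might be produced, the snippet below serializes a configuration dict with the standard json module. It assumes the JSON file uses the same keys as the dict-style configuration shown in the v2.0.0a1 notes, which the source does not state explicitly. The helpers `write_config` and `read_config` are hypothetical, and the tensor_parallel_size value of 2 is only an example.

```python
import json

# Keys mirror the dict-style configuration from the v2.0.0a1 notes.
# The value 2 for tensor_parallel_size is only an example.
config = {
    "model_parallelism": {
        "enable": True,
        "tensor_parallel_size": 2,
    },
    "activation_checkpointing": {
        "enable": True,
        "cpu_checkpointing": True,
        "partitioned_checkpointing": True,
        "contiguous_checkpointing": True,
    },
}


def write_config(config, path):
    """Serialize the configuration dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


def read_config(path):
    """Load a configuration dict back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Intended use (not executed here; requires OSLO and a model):
#   write_config(config, "oslo-config.json")
#   model = oslo.initialize(model, config="oslo-config.json")
```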

Add new models

Albert

Bert

Bart

T5

GPT2

GPTNeo

GPTJ

Electra

Roberta

Add document

https://tunib-ai.github.io/oslo

Remove old pipeline parallelism, kernel fusion code

We'll refurbish them using the latest methods

Kernel fusion: AOTAutograd

Pipeline parallelism: Sagemaker PP

v.1.1.2(Jan 15, 2022)
Updates

[#7] Selective Kernel Fusion

[#9] Fix argument bug

New Feature: Selective Kernel Fusion

Since version 1.1.2, you can fuse a subset of kernels rather than all of them. Currently, only the Attention and MLP classes are supported.

from oslo import GPT2MLP, GPT2Attention

# MLP only fusion
model.fuse([GPT2MLP])

# Attention only fusion
model.fuse([GPT2Attention])

# MLP + Attention fusion
model.fuse([GPT2MLP, GPT2Attention])
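The idea of selecting modules by class can be sketched without OSLO or torch. The toy `Module`, `Attention`, and `MLP` classes and the `select_for_fusion` helper below are hypothetical stand-ins that only illustrate how a list of classes can pick out matching submodules in a model tree. They say nothing about how OSLO performs the fusion itself.

```python
class Module:
    """Toy stand-in for a model component with named children."""

    def __init__(self, **children):
        self.children = children

    def walk(self, prefix=""):
        # Yield (dotted_name, module) for this module and all descendants.
        yield prefix or "<root>", self
        for name, child in self.children.items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from child.walk(child_prefix)


class Attention(Module):
    pass


class MLP(Module):
    pass


def select_for_fusion(model, classes):
    # Keep only the submodules that are instances of the requested classes.
    targets = tuple(classes)
    return [name for name, module in model.walk() if isinstance(module, targets)]


# A two-layer toy model, each layer holding one Attention and one MLP.
toy_model = Module(
    h0=Module(attn=Attention(), mlp=MLP()),
    h1=Module(attn=Attention(), mlp=MLP()),
)

if __name__ == "__main__":
    print(select_for_fusion(toy_model, [MLP]))
    print(select_for_fusion(toy_model, [MLP, Attention]))
```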

v1.1(Dec 29, 2021)

[#3] Add deployment launcher of Parallelformers into OSLO.

from oslo import GPTNeoForCausalLM

model = GPTNeoForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-neo-2.7B",
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
    deployment=True  # <-- new feature !
)

You can use the deployment launcher by passing deployment=True. Please refer to USAGE.md for more details.


v1.0.1(Dec 22, 2021)
Quick Fix

Support Megatron-LM style (.jsonl) file preprocessing.

v1.0(Dec 21, 2021)
December 21, 2021 Released OSLO 1.0.

Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.


Owner

TUNiB

TUNiB Inc.

