OSLO: Open Source framework for Large-scale transformer Optimization

Related tags

Deep Learningoslo
Overview


O S L O

Open Source framework for Large-scale transformer Optimization

GitHub release Apache 2.0 Docs Issues



What's New:

What is OSLO about?

OSLO is a framework that provides various GPU based optimization features for large-scale modeling. As of 2021, the Hugging Face Transformers is being considered de facto standard. However, it does not best fit the purposes of large-scale modeling yet. This is where OSLO comes in. OSLO is designed to make it easier to train large models with the Transformers. For example, you can fine-tune GPTJ on the Hugging Face Model Hub without many extra efforts using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, but we plan to support more soon.

Installation

OSLO can be easily installed using the pip package manager. All the dependencies such as torch, transformers, dacite, ninja and pybind11 should be installed automatically with the following command. Be careful that the 'core' in the PyPI project name.

pip install oslo-core

Some of features rely on the C++ language. So we provide an option, CPP_AVAILABLE, to decide whether or not you install them.

  • If the C++ is available:
CPP_AVAILABLE=1 pip install oslo-core
  • If the C++ is not available:
CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 in Windows and 1 in Linux.

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_paramters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

  • 3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
  • Kernel Fusion: A GPU optimization method to increase training and inference speed.
  • DeepSpeed Support: We support DeepSpeed which provides ZeRO data parallelism.
  • Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The Code of the OSLO project is licensed under the terms of the Apache License 2.0.

Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments
  • [WIP] Implement ZeRO Stage 3 (FSDP)

    [WIP] Implement ZeRO Stage 3 (FSDP)

    Title

    • Implement ZeRO Stage 3 (FullyShardedDataParallel)

    Description

    • [x] Add reduce_scatter_bucketer.py
      • [x] Add test_reduce_scatter_bucketer.py
    • [x] Add flatten_params_wrapper.py
      • [x] Add test_flatten_params_wrapper.py
    • [x] Add containers.py
      • [x] Add test_containers.py
    • [x] Add parallel.py
      • [x] Add test_parallel.py
    • [x] Add fsdp_optim_utils.py
    • [x] Update fsdp.py
    • [x] Add auto_wrap.py
      • [x] Add test_wrap.py
    opened by jinok2im 9
  • FusedAdam & CPUAdam

    FusedAdam & CPUAdam

    Title

    -FusedAdam & CPUAdam

    Description

    • Implement FusedAdam & CPUAdam

    Tasks

    • [x] Implement FusedAdam
    • [x] implement CPUAdam
    • [x] Test FusedAdam
    • [x] Test CPUAdam
    • [x] Test FusedSclaeMaskSoftmax (Name changed)
    opened by cozytk 6
  • [WIP] Add data processing modules referring to the lassl

    [WIP] Add data processing modules referring to the lassl

    Title

    • add data processing modules referring to the lassl

    Description

    • brought data processing functions that fit gpt2 with reference to lassl

    Linked Issues

    • None
    opened by gimmaru 6
  • Implementation of Sequential Parallelism

    Implementation of Sequential Parallelism

    SP with DP implementation

    • Implemented SP wrapper with DP

    Description

    • SequenceDataParallel works like native torch DDP with SP
    • you can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
    opened by ohwi 5
  • Update data collators and Add models

    Update data collators and Add models

    Title

    • Update data collators and Add models

    Description

    • Updated data collators to utilize sequence parallel in Oslo trainer
    • Add models by referring to the transformers library
    opened by gimmaru 3
  • Implement Expert Parallel and Test for Initialization and Forward Pass

    Implement Expert Parallel and Test for Initialization and Forward Pass

    Title

    • Implement Expert Parallel and Test for Initialization and Forward Pass

    Description

    • Implement Wrapper, Modules and Features for Expert Parallel
    • Implement mapping_utils._ParallelMappingForHuggingFace as super class of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace
    • Test initialization and forward pass for expert parallel
    opened by scsc0511 3
  • Integrate Sequence Parallelism branches

    Integrate Sequence Parallelism branches

    Title

    • Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

    Description

    • This PR is Integration of SP current version. But there is something wrong.
    • We will fix the bugs for the coming week and write test modules according to the SP design.
    • It did not include the contents of the branch that worked for the test.
    opened by l-yohai 3
  • implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers

    implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers

    • implement tp-3d wrapper
    • rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing ranking transpose function.
    • revise tp-3d layers for huggingface compatibility
    • implement tp-3d test codes
    • refactor all tp test codes
    • unify format across all tensor parallel modules.
    opened by bzantium 2
  • Refactoring MultiheadAttention with todo anchors

    Refactoring MultiheadAttention with todo anchors

    Title

    • Refactoring MultiheadAttention with todo anchors

    Description

    • Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.
    • Remove unnecessary or unintended code and clean up annotations.
    • Unify return format and the variable name with native torch.

    Additionally, I need to test attention_mask. However, it seems that it can proceed with this part after FusedScaleMaskSoftmax is integrated.

    cc. @hyunwoongko @ohwi

    opened by l-yohai 2
  • Add tp-1d layers testing

    Add tp-1d layers testing

    • Add testing for tp-1d layers: col_linear, row_linear, vocab_embedding_1d
    • modify number to integer variable like summa_dim, world_size cc: @hyunwoongko
    opened by bzantium 2
  • [WIP] add test code of sp training

    [WIP] add test code of sp training

    Title

    • SP Model Test Code

    Description

    Writing a test code to verify that the gradient and loss values of the model are the same when the sequence parallelism is applied.

    • WIP - merging @ohwi 's test code comparing SP of ColossalAI and simple learning model.
    opened by l-yohai 2
Releases(v2.0.2)
  • v2.0.2(Aug 25, 2022)

  • v2.0.1(Feb 20, 2022)

  • v2.0.0(Feb 14, 2022)

    Official release of OSLO 2.0.0 🎉🎉

    This version of OSLO provides the following features:

    • Tensor model parallelism
    • Efficient activation checkpointing
    • Kernel fusion

    We plan to add the pipeline model parallelism and the ZeRO optimization in the next versions.


    New feature: Kernel Fusion

    {
      "kernel_fusion": {
        "enable": "bool",
        "memory_efficient_fusion": "bool",
        "custom_cuda_kernels": "list"
      }
    }
    

    For more information, please check the kernel fusion tutorial

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0a2(Feb 2, 2022)

  • v2.0.0a1(Feb 2, 2022)

    Add activation checkpointing

    You can use efficient activation checkpointing using OSLO with the following configuration.

    model = oslo.initialize(
        model,
        config={
            "model_parallelism": {
                "enable": True,
                "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
            },
            "activation_checkpointing": {
                "enable": True,
                "cpu_checkpointing": True,
                "partitioned_checkpointing": True,
                "contiguous_checkpointing": True,
            },
        },
    )
    

    Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0a0(Jan 30, 2022)

    New API

    • We paid homage to DeepSpeed. Now it's easier and simpler to use.
    import oslo
    
    model = oslo.initialize(model, config="oslo-config.json")
    

    Add new models

    • Albert
    • Bert
    • Bart
    • T5
    • GPT2
    • GPTNeo
    • GPTJ
    • Electra
    • Roberta

    Add document

    • https://tunib-ai.github.io/oslo

    Remove old pipeline parallelism, kernel fusion code

    • We'll refurbish them using the latest methods
      • Kernel fusion: AOTAutograd
      • Pipeline parallelism: Sagemaker PP
    Source code(tar.gz)
    Source code(zip)
  • v.1.1.2(Jan 15, 2022)

    Updates

    [#7] Selective Kernel Fusion [#9] Fix argument bug

    New Feature: Selective Kernel Fusion

    Since version 1.1.2, you can fuse only partial kernels, not all kernels. Currently, only Attention class and MLP class are supported.

    from oslo import GPT2MLP, GPT2Attention
    
    # MLP only fusion
    model.fuse([GPT2MLP])
    
    # Attention only fusion
    model.fuse([GPT2Attention])
    
    # MLP + Attention fusion
    model.fuse([GPT2MLP, GPT2Attention])
    
    Source code(tar.gz)
    Source code(zip)
  • v1.1(Dec 29, 2021)

    [#3] Add deployment launcher of Parallelformers into OSLO.

    from oslo import GPTNeoForCausalLM
    
    model = GPTNeoForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-neo-2.7B",
        tensor_parallel_size=2,
        pipeline_parallel_size=2,
        deployment=True  # <-- new feature !
    )
    

    You can easily use deployment launcher by deployment=True. Please refer to USAGE.md for more details.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Dec 22, 2021)

  • v1.0(Dec 21, 2021)


    O S L O

    Open Source framework for Large-scale transformer Optimization

    GitHub release Apache 2.0 Docs Issues



    What's New:

    What is OSLO about?

    OSLO is a framework that provides various GPU based optimization features for large-scale modeling. As of 2021, the Hugging Face Transformers is being considered de facto standard. However, it does not best fit the purposes of large-scale modeling yet. This is where OSLO comes in. OSLO is designed to make it easier to train large models with the Transformers. For example, you can fine-tune GPTJ on the Hugging Face Model Hub without many extra efforts using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, but we plan to support more soon.

    Installation

    OSLO can be easily installed using the pip package manager. All the dependencies such as torch, transformers, dacite, ninja and pybind11 should be installed automatically with the following command. Be careful that the 'core' in the PyPI project name.

    pip install oslo-core
    

    Some of features rely on the C++ language. So we provide an option, CPP_AVAILABLE, to decide whether or not you install them.

    • If the C++ is available:
    CPP_AVAILABLE=1 pip install oslo-core
    
    • If the C++ is not available:
    CPP_AVAILABLE=0 pip install oslo-core
    

    Note that the default value of CPP_AVAILABLE is 0 in Windows and 1 in Linux.

    Key Features

    import deepspeed 
    from oslo import GPTJForCausalLM
    
    # 1. 3D Parallelism
    model = GPTJForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
    )
    
    # 2. Kernel Fusion
    model = model.fuse()
    
    # 3. DeepSpeed Support
    engines = deepspeed.initialize(
        model=model.gpu_modules(), model_parameters=model.gpu_paramters(), ...,
    )
    
    # 4. Data Processing
    from oslo import (
        DatasetPreprocessor, 
        DatasetBlender, 
        DatasetForCausalLM, 
        ...    
    )
    

    OSLO offers the following features.

    • 3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
    • Kernel Fusion: A GPU optimization method to increase training and inference speed.
    • DeepSpeed Support: We support DeepSpeed which provides ZeRO data parallelism.
    • Data Processing: Various utilities for efficient large-scale data processing.

    See USAGE.md to learn how to use them.

    Administrative Notes

    Citing OSLO

    If you find our work useful, please consider citing:

    @misc{oslo,
      author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
      title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
      howpublished = {\url{https://github.com/tunib-ai/oslo}},
      year         = {2021},
    }
    

    Licensing

    The Code of the OSLO project is licensed under the terms of the Apache License 2.0.

    Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

    Acknowledgements

    The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

    Source code(tar.gz)
    Source code(zip)
Owner
TUNiB
TUNiB Inc.
TUNiB
This repository contains the source code and data for reproducing results of Deep Continuous Clustering paper

Deep Continuous Clustering Introduction This is a Pytorch implementation of the DCC algorithms presented in the following paper (paper): Sohil Atul Sh

Sohil Shah 197 Nov 29, 2022
Code for Transformer Hawkes Process, ICML 2020.

Transformer Hawkes Process Source code for Transformer Hawkes Process (ICML 2020). Run the code Dependencies Python 3.7. Anaconda contains all the req

Simiao Zuo 111 Dec 26, 2022
Predicting Semantic Map Representations from Images with Pyramid Occupancy Networks

This is the code associated with the paper Predicting Semantic Map Representations from Images with Pyramid Occupancy Networks, published at CVPR 2020.

Thomas Roddick 219 Dec 20, 2022
Reference code for the paper CAMS: Color-Aware Multi-Style Transfer.

CAMS: Color-Aware Multi-Style Transfer Mahmoud Afifi1, Abdullah Abuolaim*1, Mostafa Hussien*2, Marcus A. Brubaker1, Michael S. Brown1 1York University

Mahmoud Afifi 36 Dec 04, 2022
A tensorflow implementation of Fully Convolutional Networks For Semantic Segmentation

##A tensorflow implementation of Fully Convolutional Networks For Semantic Segmentation. #USAGE To run the trained classifier on some images: python w

Alex Seewald 13 Nov 17, 2022
Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning

Manifold-SCA Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning The repo is org

Yuanyuan Yuan 172 Dec 29, 2022
Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

VQGAN-CLIP-Docker About Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized This is a stripped and minimal dependency repository for running loca

Kevin Costa 73 Sep 11, 2022
Key information extraction from invoice document with Graph Convolution Network

Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro

Phan Hoang 39 Dec 16, 2022
Official Repository for "Robust On-Policy Data Collection for Data Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation Source code of Robust On-Policy Data Collection for Data-Efficient Policy Evalua

Autonomous Agents Research Group (University of Edinburgh) 2 Oct 09, 2022
Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation (ACM MM 2020)

Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation (ACM MM 2020) Official implementation of: Forest R-CNN: Large-Vo

Jialian Wu 54 Jan 06, 2023
Neural Message Passing for Computer Vision

Neural Message Passing for Quantum Chemistry Implementation of different models of Neural Networks on graphs as explained in the article proposed by G

Pau Riba 310 Nov 07, 2022
PyTorch implementation of the supervised learning experiments from the paper Model-Agnostic Meta-Learning (MAML)

pytorch-maml This is a PyTorch implementation of the supervised learning experiments from the paper Model-Agnostic Meta-Learning (MAML): https://arxiv

Kate Rakelly 516 Jan 05, 2023
A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie_recs Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Coll

ShopRunner 97 Jan 03, 2023
S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (CVPR 2021)

S2-BNN (Self-supervised Binary Neural Networks Using Distillation Loss) This is the official pytorch implementation of our paper: "S2-BNN: Bridging th

Zhiqiang Shen 52 Dec 24, 2022
Repository for MDPGT

MD-PGT Repository for implementing and reproducing the results for the paper MDPGT: Momentum-based Decentralized Policy Gradient Tracking. Available E

Xian Yeow Lee 2 Dec 30, 2021
Code for "ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on", accepted at WACV 2021 Generation of Human Behavior Workshop.

ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on [ Paper ] [ Project Page ] This repository contains the code fo

Andrew Jong 97 Dec 13, 2022
MoveNet Single Pose on OpenVINO

MoveNet Single Pose tracking on OpenVINO Running Google MoveNet Single Pose models on OpenVINO. A convolutional neural network model that runs on RGB

35 Nov 11, 2022
An auto discord account and token generator. Automatically verifies the phone number. Works without proxy. Bypasses captcha.

JOIN DISCORD SERVER https://discord.gg/uAc3agBY FREE HCAPTCHA SOLVING API Discord-Token-Gen An auto discord token generator. Auto verifies phone numbe

3kp 271 Jan 01, 2023
Reproducing Results from A Hybrid Approach to Targeting Social Assistance

title author date output Reproducing Results from A Hybrid Approach to Targeting Social Assistance Lendie Follett and Heath Henderson 12/28/2021 html_

Lendie Follett 0 Jan 06, 2022
A knowledge base construction engine for richly formatted data

Fonduer is a Python package and framework for building knowledge base construction (KBC) applications from richly formatted data. Note that Fonduer is

HazyResearch 386 Dec 05, 2022