Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery

Related tags

Deep Learninglorien
Overview

Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery

Build Status codecov.io

Lorien is an infrastructure to massively explore/benchmark the best schedules of given deep learning models. Lorien is deep learning compiler (DLC) agnostic, so one can easily implement a Lorien dialect to support a new DLC.

Motivation

Although auto-tuning frameworks for deep learning compilers (e.g., TVM, Halide) are capable of delivering high-performance operators that match or even beat vendor kernel libraries, auto-tuning a deep learning model could take days or even weeks, especially for the model with many workloads like ResNet-152 or Inception V3.

With such a long tuning time, one key question To maintain the best user experience during deep model developments and deployments is How to promptly deliver schedules with reasonably good performance upon user requests? Accordingly, we design and implement Lorien to remove the following obstacles:

  1. Tuning Process Scalability and Stability. Long tuning time affects not only the time-to-market but the stability. To the best of our knowledge, none of existing auto-tuning frameworks is designed for tuning on multiple machines, and none of them consider fault tolerance. The tuning process, hence, has to be manually started over if it was accidentally interrupted. This is crucial especially on edge devices, which are less reliable than cloud instances and may fail frequently due to overheat or other factors.

  2. Tuning Result Management. Although almost all auto-tuning frameworks provide mechanisms to serialize tuning results for future applications, all of them use file-based mechanism and have different formats. As a result, engineers have additional work to orchestrate the data for efficient usage.

  3. Time to Deliver an Efficient Schedule. Even a database is constructed to serve most user requests, it is still possible that certain workloads are missing. However, modern auto-tuning frameworks usually leverage iterative search algorithms with on-device measurements, which usually take hours, to find an efficient schedule for an unseen workload. The unfavorably expensive querying/tuning overhead makes production deployment impractical.

Lorien is a unified and extensible infrastructure for delivering efficient deep learning workloads upon requests. Lorien allows auto-tuning deep learning frameworks to be easily plugged in as dialects, and supports large scale tuning on both cloud and edge platforms. The tuning results are managed in a NoSQL database with a unified data model that fits all auto-tuning frameworks. While the best schedules managed in the database can be used to compile deep learning models to achieve high performance, the tuning logs managed in a file system can also 1) enable more comprehensive performance analysis on different platforms, and 2) help train a performance cost model with an AutoML solution.

Please visit the official documentations for setup guideline and tutorials.

System Requirements

  • Python 3.6+

  • Amazon DynamoDB (local or aws): DynamoDB is used for storing and maintain the tuned schedules. You can choose to either of the following:

    1. Launch a local version using JVM on your machine, and specify endpoint URL (e.g. --db "endpoint_url: http://:8000") when invoking a tuning procses.

    2. Configure AWS credential on your machine to directly use AWS DynamoDB service. In this case, you do not have to specify any argument in tuning configurations.

  • AWS S3 (optional): S3 is used to store the full tuning logs (JSON files generated by AutoTVM). If you specify --commit-log-to bucket_name and configure an AWS credential on your machine, then all complete tuning logs will be uploaded to the S3 bucket for debugging or research prupose. Note that this is an optional requirement, so you can ignore the --commit-log-to argument if you do not want to keep full tuning logs.

  • AWS Batch (AWS ECR): You have to set up AWS batch computation environments, job queues, and job definitions in advance to use Lorien AWS batch worker for tuning. See this blog post for reference. You may also need to build an upload Lorien docker images to AWS ECR as the AWS batch job running container.

Docker Images

You can directly make use of pre-built Lorien docker images on Docker Hub, which includes two typs of images for CPU and CPU+CUDA platforms. The docker images have TVM deployed so you can launch a tuning process in the container after cloning Lorien. The docker image is also used for Lorien CI purpose.

Documentation

https://awslabs.github.io/lorien/

Citing Lorien

If you use Lorien in a scientific publication, please cite the following paper:

Cody Hao Yu, Xingjian Shi, Haichen Shen, Zhi Chen, Mu Li, Yida Wang, "Lorien: Efficient Deep Learning Workloads Delivery", Proceedings of the 12th ACM Symposium on Cloud Computing. 2021.

@inproceedings{yu2021lorien,
  title={Lorien: Efficient Deep Learning Workloads Delivery},
  author={Yu, Cody Hao and Shi, Xingjian and Shen, Haichen and Chen, Zhi and Li, Mu and Wang, Yida},
  booktitle={Proceedings of the Seventh ACM Symposium on Cloud Computing},
  year={2021}
}
Owner
Amazon Web Services - Labs
AWS Labs
Amazon Web Services - Labs
Repo for my Tensorflow/Keras CV experiments. Mostly revolving around the Danbooru20xx dataset

SW-CV-ModelZoo Repo for my Tensorflow/Keras CV experiments. Mostly revolving around the Danbooru20xx dataset Framework: TF/Keras 2.7 Training SQLite D

20 Dec 27, 2022
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

TorchMultimodal (Alpha Release) Introduction TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Meta Research 663 Jan 06, 2023
Learned Token Pruning for Transformers

LTP: Learned Token Pruning for Transformers Check our paper for more details. Installation We follow the same installation procedure as the original H

Sehoon Kim 52 Dec 29, 2022
Finite Element Analysis

FElupe - Finite Element Analysis FElupe is a Python 3.6+ finite element analysis package focussing on the formulation and numerical solution of nonlin

Andreas D. 20 Jan 09, 2023
A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python

Mesh-Keys A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python Have been seeing alot

Joseph 53 Dec 13, 2022
Code for the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness"

DU-VAE This is the pytorch implementation of the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness" Acknowledgement

Dazhong Shen 4 Oct 19, 2022
Intel® Neural Compressor is an open-source Python library running on Intel CPUs and GPUs

Intel® Neural Compressor targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep l

Intel Corporation 846 Jan 04, 2023
This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging To appear on KDD'21...[pdf] This project provides an unsupervised framework for mining and

Xiaotao Gu 146 Dec 22, 2022
JumpDiff: Non-parametric estimator for Jump-diffusion processes for Python

jumpdiff jumpdiff is a python library with non-parametric Nadaraya─Watson estimators to extract the parameters of jump-diffusion processes. With jumpd

Rydin 28 Dec 10, 2022
Understanding and Overcoming the Challenges of Efficient Transformer Quantization

Transformer Quantization This repository contains the implementation and experiments for the paper presented in Yelysei Bondarenko1, Markus Nagel1, Ti

83 Dec 30, 2022
Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

RSPNet Official Pytorch implementation for AAAI2021 paper "RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning" [Suppleme

35 Jun 24, 2022
Scales, Chords, and Cadences: Practical Music Theory for MIR Researchers

ISMIR-musicTheoryTutorial This repository has slides and Jupyter notebooks for the ISMIR 2021 tutorial Scales, Chords, and Cadences: Practical Music T

Johanna Devaney 58 Oct 11, 2022
Composing methods for ML training efficiency

MosaicML Composer contains a library of methods, and ways to compose them together for more efficient ML training.

MosaicML 2.8k Jan 08, 2023
A set of examples around hub for creating and processing datasets

Examples for Hub - Dataset Format for AI A repository showcasing examples of using Hub Uploading Dataset Places365 Colab Tutorials Notebook Link Getti

Activeloop 11 Dec 14, 2022
Official Pytorch Implementation of Unsupervised Image Denoising with Frequency Domain Knowledge

Unsupervised Image Denoising with Frequency Domain Knowledge (BMVC 2021 Oral) : Official Project Page This repository provides the official PyTorch im

Donggon Jang 12 Sep 26, 2022
MG-GCN: Scalable Multi-GPU GCN Training Framework

MG-GCN MG-GCN: multi-GPU GCN training framework. For more information, please read our paper. After cloning our repository, run git submodule update -

Translational Data Analytics (TDA) Lab @GaTech 6 Oct 24, 2022
Open-source Monocular Python HawkEye for Tennis

Tennis Tracking 🎾 Objectives Track the ball Detect court lines Detect the players To track the ball we used TrackNet - deep learning network for trac

ArtLabs 188 Jan 08, 2023
Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] ‘‘Exploiting Document Knowledge for Aspect-level Sentiment Classification’’

Ruidan He 146 Nov 29, 2022
AI4Good project for detecting waste in the environment

Detect waste AI4Good project for detecting waste in environment. www.detectwaste.ml. Our latest results were published in Waste Management journal in

108 Dec 25, 2022
An implementation for Neural Architecture Search with Random Labels (CVPR 2021 poster) on Pytorch.

Neural Architecture Search with Random Labels(RLNAS) Introduction This project provides an implementation for Neural Architecture Search with Random L

18 Nov 08, 2022