ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training


ActNN: Activation Compressed Training

This is the official project repository for ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training by Jianfei Chen*, Lianmin Zheng*, Zhewei Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, and Joseph E. Gonzalez.

TL;DR: ActNN is a PyTorch library for memory-efficient training. It reduces the training memory footprint by compressing the saved activations. ActNN is implemented as a collection of memory-saving layers that share an identical interface with their PyTorch counterparts.

Abstract

The increasing size of neural network models has been critical for improvements in their accuracy, but device memory is not growing at the same rate. This creates fundamental challenges for training neural networks within limited memory environments. In this work, we propose ActNN, a memory-efficient training framework that stores randomly quantized activations for back propagation. We prove the convergence of ActNN for general network architectures, and we characterize the impact of quantization on the convergence via an exact expression for the gradient variance. Using our theory, we propose novel mixed-precision quantization strategies that exploit the activation's heterogeneity across feature dimensions, samples, and layers. These techniques can be readily applied to existing dynamic graph frameworks, such as PyTorch, simply by substituting the layers. We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12×, and it enables training with a 6.6× to 14× larger batch size.

Figure: Batch size vs. training throughput on ResNet-50. Red cross marks indicate out-of-memory. The shaded yellow region denotes the batch sizes feasible with full-precision training. ActNN achieves a significantly larger maximum batch size than other state-of-the-art systems and displays a nontrivial trade-off curve.

Install

  • Requirements
torch>=1.7.1
torchvision>=0.8.2
  • Build
cd actnn
pip install -v -e .
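
To verify the build, a quick import check is enough (a minimal sketch; note that ActNN currently only supports CUDA, see the FAQ below):

import torch
import actnn

# Both imports should succeed, and a CUDA device should be visible,
# since ActNN only supports CUDA training.
assert torch.cuda.is_available()
print(actnn.QConv2d)   # one of ActNN's memory-saving layers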

Usage

mem_speed_benchmark/train.py is an example of using ActNN with models from torchvision.

Basic Usage

  • Step 1: Configure the optimization level
    ActNN provides several optimization levels to control the trade-off between memory saving and computational overhead. You can set the optimization level with:
import actnn
# available choices are ["L0", "L1", "L2", "L3", "L4", "L5"]
actnn.set_optimization_level("L3")

See set_optimization_level for more details.

  • Step 2: Convert the model to use ActNN's layers.
model = actnn.QModule(model)

Note:

  1. Convert the model before calling .cuda().
  2. Set the optimization level before invoking actnn.QModule or constructing any ActNN layers.
  3. Automatic model conversion only works with standard PyTorch layers. Please use the modules (nn.Conv2d, nn.ReLU, etc.), not the functions (F.conv2d, F.relu).
  • Step 3: Print the model to confirm that all the modules (Conv2d, ReLU, BatchNorm) have been correctly converted to ActNN layers.
print(model)    # Should be actnn.QConv2d, actnn.QBatchNorm2d, etc.
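
Putting the three steps together, a minimal end-to-end sketch could look like the following (the ResNet-50 model, train_loader, and the training loop are illustrative placeholders, not part of ActNN's API):

import actnn
import torch
import torchvision

# Step 1: set the optimization level before constructing any ActNN layers.
actnn.set_optimization_level("L3")

# Step 2: convert the model, then move it to the GPU (in this order).
model = actnn.QModule(torchvision.models.resnet50()).cuda()

# Step 3: confirm the conversion.
print(model)   # Should show actnn.QConv2d, actnn.QBatchNorm2d, etc.

# Train as usual; the ActNN layers compress saved activations internally.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
for images, labels in train_loader:   # train_loader: your usual DataLoader
    optimizer.zero_grad()
    loss = criterion(model(images.cuda()), labels.cuda())
    loss.backward()
    optimizer.step()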

Advanced Features

  • Convert the model manually.
    ActNN is implemented as a collection of memory-saving layers, including actnn.QConv1d, QConv2d, QConv3d, QConvTranspose1d, QConvTranspose2d, QConvTranspose3d, QBatchNorm1d, QBatchNorm2d, QBatchNorm3d, QLinear, QReLU, QSyncBatchNorm, QMaxPool2d. These layers have an identical interface to their PyTorch counterparts. You can construct the model manually using these layers as the building blocks, as shown in the sketch below. See ResNetBuilder and resnet_configs in image_classification/image_classification/resnet.py for an example.
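
Here is a minimal sketch of such a manually built model (the toy architecture and input size are assumptions for illustration; each Q* layer takes the same arguments as its nn.* counterpart):

import actnn
import torch.nn as nn

# A toy CNN assembled directly from ActNN's memory-saving layers,
# assuming 3-channel 32x32 inputs and 10 output classes.
model = nn.Sequential(
    actnn.QConv2d(3, 64, kernel_size=3, padding=1),   # 32x32 -> 32x32
    actnn.QBatchNorm2d(64),
    actnn.QReLU(),
    actnn.QMaxPool2d(kernel_size=2),                  # 32x32 -> 16x16
    nn.Flatten(),
    actnn.QLinear(64 * 16 * 16, 10),
)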
  • (Optional) Change the data loader
    If you want to use per-sample gradient information for adaptive quantization, you have to update the dataloader to return sample indices. See train_loader in mem_speed_benchmark/train.py for an example. In addition, you have to update the configuration:
from actnn import config, QScheme
config.use_gradient = True
QScheme.num_samples = 1300000   # the size of training set

You can find sample code in the above script.
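
For reference, a thin wrapper like the following sketch can expose the indices (the exact return convention ActNN expects is defined in mem_speed_benchmark/train.py, so treat the tuple order here as an assumption):

import torch
from actnn import config, QScheme

class IndexedDataset(torch.utils.data.Dataset):
    # Wraps a dataset so that each item also carries its sample index.
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, i):
        data, target = self.dataset[i]
        return i, data, target   # hypothetical order: index first

    def __len__(self):
        return len(self.dataset)

config.use_gradient = True
QScheme.num_samples = len(train_set)   # train_set: your base training set
train_loader = torch.utils.data.DataLoader(
    IndexedDataset(train_set), batch_size=256, shuffle=True)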

Examples

Benchmark Memory Usage and Training Speed

See mem_speed_benchmark. Please do NOT measure the memory usage with nvidia-smi: it reports the size of the memory pool reserved by PyTorch, which can be much larger than the amount of memory actually in use.
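
Instead, query PyTorch's allocator counters directly, for example:

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training iteration here ...
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 1024**2:.1f} MB")
# nvidia-smi roughly corresponds to torch.cuda.memory_reserved(), the
# size of PyTorch's cached memory pool, which is typically much larger.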

Image Classification

See image_classification

Object Detection, Semantic Segmentation, Self-Supervised Learning, ...

Here are examples of memory-efficient training for ResNet-50, built upon the OpenMMLab toolkits. We use ActNN with the default optimization level (L3). Our training runs are available at Weights & Biases.

Installation

  1. Install mmcv
export MMCV_ROOT=/path/to/clone/actnn-mmcv
git clone https://github.com/DequanWang/actnn-mmcv $MMCV_ROOT
cd $MMCV_ROOT
MMCV_WITH_OPS=1 MMCV_WITH_ORT=0 pip install -e .
  2. Install mmdet, mmseg, mmssl, ...
export MMDET_ROOT=/path/to/clone/actnn-mmdet
git clone https://github.com/DequanWang/actnn-mmdet $MMDET_ROOT
cd $MMDET_ROOT
python setup.py develop
export MMSEG_ROOT=/path/to/clone/actnn-mmseg
git clone https://github.com/DequanWang/actnn-mmseg $MMSEG_ROOT
cd $MMSEG_ROOT
python setup.py develop
export MMSSL_ROOT=/path/to/clone/actnn-mmssl
git clone https://github.com/DequanWang/actnn-mmssl $MMSSL_ROOT
cd $MMSSL_ROOT
python setup.py develop

Single GPU training

cd $MMDET_ROOT
python tools/train.py configs/actnn/faster_rcnn_r50_fpn_1x_coco_1gpu.py
# https://wandb.ai/actnn/detection/runs/ye0aax5s
# ActNN mAP 37.4 vs Official mAP 37.4
python tools/train.py configs/actnn/retinanet_r50_fpn_1x_coco_1gpu.py
# https://wandb.ai/actnn/detection/runs/1x9cwokw
# ActNN mAP 36.3 vs Official mAP 36.5
cd $MMSEG_ROOT
python tools/train.py configs/actnn/fcn_r50-d8_512x1024_80k_cityscapes_1gpu.py
# https://wandb.ai/actnn/segmentation/runs/159if8da
# ActNN mIoU 72.9 vs Official mIoU 73.6
python tools/train.py configs/actnn/fpn_r50_512x1024_80k_cityscapes_1gpu.py
# https://wandb.ai/actnn/segmentation/runs/25j9iyv3
# ActNN mIoU 74.7 vs Official mIoU 74.5

Multi-GPU training

cd $MMSSL_ROOT
bash tools/dist_train.sh configs/selfsup/actnn/moco_r50_v2_bs512_e200_imagenet_2gpu.py 2
# https://wandb.ai/actnn/mmssl/runs/lokf7ydo
# https://wandb.ai/actnn/mmssl/runs/2efmbuww
# ActNN top1 67.3 vs Official top1 67.7

For more detailed guidance, please refer to the docs of mmcv, mmdet, mmseg, mmssl.

FAQ

  1. Does ActNN support CPU training?
    Currently, ActNN only supports CUDA.

  2. Accuracy degradation / diverged training with ActNN.
    ActNN applies lossy compression to the activations. In some challenging cases, our default compression strategy might be too aggressive. If this happens, you can try a more conservative compression strategy (which consumes more memory):

    • 4-bit per-group quantization
    actnn.set_optimization_level("L2")
    • 8-bit per-group quantization
    actnn.set_optimization_level("L2")
    actnn.config.activation_compression_bits = [8]

    If none of these works, please report it to us by creating an issue.

Correspondence

Please email Jianfei Chen and Lianmin Zheng. Any questions or discussions are welcome!

Citation

If the actnn library is helpful in your research, please consider citing our paper:

@article{chen2021actnn,
  title={ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training},
  author={Chen, Jianfei and Zheng, Lianmin and Yao, Zhewei and Wang, Dequan and Stoica, Ion and Mahoney, Michael W and Gonzalez, Joseph E},
  journal={arXiv preprint arXiv:2104.14129},
  year={2021}
}