An energy estimator for eyeriss-like DNN hardware accelerator

Overview

Energy-Estimator-for-Eyeriss-like-Architecture-

An energy estimator for eyeriss-like DNN hardware accelerator

This is an energy estimator for eyeriss-like architecture utilizing Row-Stationary dataflow which is a DNN hardware accelerator created by works from Vivienne Sze’s group in MIT. You can refer to their original works in github, Y. N. Wu, V. Sze, J. S. Emer, “An Architecture-Level Energy and Area Estimator for Processing-In-Memory Accelerator Designs,” IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2020, http://eyeriss.mit.edu/, etc. Thanks to their contribution in DNN accelerator and energy efficient design.

image

Eyeriss-like architecture utilizes row-stationary dataflow in order to fully explore data reuse including convolutional reuse, ifmap reuse and filter reuse. In general, the energy breakdown in each DNN layer can be separated in terms of computation and memory access (or data transfer). image

Computation Energy : Performing MAC operations. Data Energy : The number of bits accessed at each memory level is calculated based on the dataflow and scaled by the hardware energy cost of accessing one bit at that memory level. The data energy is the summation of each memory hierarchy (DRAM, NoC, Global Buffer, RF) or each data type (ifmap, weight, partial sum). image

  1. Quantization Bitwidth Energy scaling in computation : linear for single operand scaling. Quadratic for two operands scaling. Energy scaling in data access : Linear scaling for any data type in any memory hierarchy.
  2. Pruning on filters (weights) Energy scaling in computation : Skip MAC operations according to pruning ratio. (Linear scaling) Energy scaling in data access : Linear scaling for weight access. image

Assumptions: Initial image input and weights in each layer should be first read from DRAM (external off-chip memory). Global Buffer is big enough to store any amount of datum and intermediate numbers. NoC has high-performance and high throughput with non-blocking broadcasting and inter-PE forwarding capability which supports multiple information transactions simultaneously. No data compression technique is considered in estimator design. Each PE is able to recognize information transferred among NoCs so that only those in need could receive data. Sparsity of weights and activations aren’t considered. Register File inside each PE only has the capacity to store one row of weights, one row of ifmap and one partial sum which means that we won’t take the capacity of RF into account. (A pity in this energy estimator) Ifmap and ofmap of each layer should be read from or written back into DRAM for external read operations. Once a data value is read from one memory level and then written into another memory level, the energy consumption of this transaction is always decided by the higher-cost level and only regarded as a single operation. Data transfer could happen directly between any 2 memory levels. This estimator is only applied to 2D systolic PE arrays. Partial sum and ofmap of one layer have the same bitwidth as activations. Maxpooling, Relu and LRN are not taken into account with respect to energy estimation. (little impact on total estimation) In order to make full use of data reuse (convolutional reuse and ifmap reuse), apart from row-stationary dataflow, scheduling algorithm will try to avoid reading ifmaps as much as possible. Once a channel of ifmap is kept inside the RF, the computation will be executed across the corresponding channel of entire filters in each layer.

Example analysis : Hardware Architecture : Eyeriss PE size : 12*14, 2D Dataflow : Row Stationary DNN Model : AlexNet (5 conv layers, 3 FC layers) Initial Input : single image from ImageNet Additional Attributes : Pruning and Quantization (You can revise your own pruning ratio and bitwidth of weight and activation in my source code) Everything is not hard-coded !

A pity ! (future works to do) 3D PE arrays. Memory size is considered in scheduling algorithm to accommodate more intermediate datum in low-cost level without writing back to high-cost level. Possible I/O data compression. (encoder, decoder) Possible sparsity optimization. (zero-gated MAC) Elaborate operation with specific arguments like random read, repeated write, constant read, etc. The impact of memory size, spatial distribution, location can be taken into account when we try to improve precision of our energy estimator. For example, the spatial distribution between two PEs can be characterized by Manhattan distance which can be used to scale the energy consumption of data forwarding in NoC.

If you have any questions or troubles please contact me. I'd also like to listen to your advice and opinions!

Owner
HEXIN BAO
UESTC Bachelor EE NUS Master ECE Future unknown
HEXIN BAO
This is an official pytorch implementation of Fast Fourier Convolution.

Fast Fourier Convolution (FFC) for Image Classification This is the official code of Fast Fourier Convolution for image classification on ImageNet. Ma

pkumi 199 Jan 03, 2023
Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'

YOLO-ReT This is the original implementation of the paper: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. Prakhar Ganesh, Ya

69 Oct 19, 2022
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode

NVIDIA Isaac ROS 62 Dec 14, 2022
[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.

CoCLR: Self-supervised Co-Training for Video Representation Learning This repository contains the implementation of: InfoNCE (MoCo on videos) UberNCE

Tengda Han 271 Jan 02, 2023
Models, datasets and tools for Facial keypoints detection

Template for Data Science Project This repo aims to give a robust starting point to any Data Science related project. It contains readymade tools setu

girafe.ai 1 Feb 11, 2022
Image Data Augmentation in Keras

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Grace Ugochi Nneji 3 Feb 15, 2022
A collection of scripts I developed for personal and working projects.

A collection of scripts I developed for personal and working projects Table of contents Introduction Repository diagram structure List of scripts pyth

Gianluca Bianco 109 Dec 26, 2022
Collects many various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and related datasets

The repository collects many various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and related datasets

Jun Chen 139 Dec 21, 2022
OpenMMLab 3D Human Parametric Model Toolbox and Benchmark

Introduction English | 简体中文 MMHuman3D is an open source PyTorch-based codebase for the use of 3D human parametric models in computer vision and comput

OpenMMLab 782 Jan 04, 2023
Differentiable Quantum Chemistry (only Differentiable Density Functional Theory and Hartree Fock at the moment)

DQC: Differentiable Quantum Chemistry Differentiable quantum chemistry package. Currently only support differentiable density functional theory (DFT)

75 Dec 02, 2022
Image restoration with neural networks but without learning.

Warning! The optimization may not converge on some GPUs. We've personally experienced issues on Tesla V100 and P40 GPUs. When running the code, make s

Dmitry Ulyanov 7.4k Jan 01, 2023
[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"

Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems Introduction Multi-agent control i

VITA 6 May 05, 2022
Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 08, 2022
[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

DataFree A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation" Authors: Gongfa

ZJU-VIPA 47 Jan 09, 2023
Code for "PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation" CVPR 2019 oral

Good news! We release a clean version of PVNet: clean-pvnet, including how to train the PVNet on the custom dataset. Use PVNet with a detector. The tr

ZJU3DV 722 Dec 27, 2022
Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT CheXbert is an accurate, automated dee

Stanford Machine Learning Group 51 Dec 08, 2022
《Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching》(CVPR 2020)

This contains the codes for cross-view geo-localization method described in: Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching, CVPR2020.

41 Oct 27, 2022
Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

UncertaintyAwareCycleConsistency This repository provides the building blocks and the API for the work presented in the NeurIPS'21 paper Robustness vi

EML Tübingen 19 Dec 12, 2022
Exploring Versatile Prior for Human Motion via Motion Frequency Guidance (3DV2021)

Exploring Versatile Prior for Human Motion via Motion Frequency Guidance This is the codebase for video-based human motion reconstruction in human-mot

Jiachen Xu 5 Jul 14, 2022
Using Python to Play Cyberpunk 2077

CyberPython 2077 Using Python to Play Cyberpunk 2077 This repo will contain code from the Cyberpython 2077 video series on Youtube (youtube.

Harrison 118 Oct 18, 2022