Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.

Last update: Nov 18, 2022

Overview

opt-einsum-torch

There are many implementations of Einstein summation. NumPy's numpy.einsum is typically the least efficient, as it runs in a single thread on the CPU. PyTorch's torch.einsum works on both CPU and CUDA tensors. However, because CUDA has no virtual memory to fall back on, torch.einsum runs out of CUDA memory on large tensors.

This project implements a memory-efficient einsum function with PyTorch as the backend. It also uses the opt_einsum package to optimize the contraction path so that the contraction needs as few FLOPs as possible.
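To make the idea of an optimized contraction path concrete, here is a minimal sketch that calls opt_einsum directly on small NumPy arrays. It is not this package's internal code; the array shapes and the three-operand expression are chosen only for illustration.

```python
import numpy as np
import opt_einsum as oe

# Illustrative operands (shapes are an assumption for this sketch).
a = np.ones((8, 64))
b = np.ones((64, 64))
c = np.ones((64, 4))

# Ask opt_einsum for the cheapest pairwise contraction order.
path, info = oe.contract_path('ij,jk,kl->il', a, b, c, optimize='optimal')

print(path)             # list of operand-index pairs, one per pairwise step
print(info.naive_cost)  # estimated FLOPs for a single unoptimized einsum
print(info.opt_cost)    # estimated FLOPs when following the planned path

# Evaluate the expression along the planned path.
result = oe.contract('ij,jk,kl->il', a, b, c, optimize=path)
```

Here the planner can contract the two larger operands in whichever order yields the smaller intermediate, which is what lowers the FLOP estimate relative to the naive evaluation.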

Usage

from opt_einsum_torch import EinsumPlanner
import torch

# Some huge tensors
arr1, arr2 = ..., ...
ee = EinsumPlanner(torch.device('cuda:0'), cuda_mem_limit=0.9)
result = ee.einsum('ijk,jkl->il', arr1, arr2)

The resulting tensor, result, is a PyTorch CPU tensor. You can convert it to a NumPy array by calling result.numpy().
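One way an einsum can stay within a device memory budget while returning a CPU tensor is to evaluate it in slices along an output index. The sketch below illustrates that general technique for the 'ijk,jkl->il' expression used above. It is an assumption about how such a scheme could look, not a description of what EinsumPlanner does internally; chunked_einsum and chunk_size are hypothetical names.

```python
import torch

def chunked_einsum(arr1, arr2, device, chunk_size):
    """Hypothetical helper: evaluate 'ijk,jkl->il' in slices along i.

    Only one slice of arr1 lives on the device at a time; each partial
    result is copied back into a preallocated CPU tensor.
    """
    out = torch.empty(arr1.shape[0], arr2.shape[2], dtype=arr1.dtype)
    rhs = arr2.to(device)  # assumes the second operand fits on the device
    for start in range(0, arr1.shape[0], chunk_size):
        stop = min(start + chunk_size, arr1.shape[0])
        block = arr1[start:stop].to(device)
        out[start:stop] = torch.einsum('ijk,jkl->il', block, rhs).cpu()
    return out

# Small CPU-only demonstration; pass torch.device('cuda:0') on a GPU machine.
a = torch.randn(10, 3, 4)
b = torch.randn(3, 4, 5)
result = chunked_einsum(a, b, torch.device('cpu'), chunk_size=4)
reference = torch.einsum('ijk,jkl->il', a, b)
```

Slicing along i is valid here because i appears only in the first operand and in the output, so each slice is an independent smaller einsum.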

Future work

Support for multiple GPUs.
Memory-efficient einsum kernels.
CUDA data transfer profilers.


Releases (0.1.0)

0.1.0 (Dec 30, 2021)

Initial release of the package.


Owner

Haoyan Huo
