Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)

Last update: Oct 29, 2022


Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm, whose communication volume is less than 6k and asymptotically optimal, with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer. Its convergence is proved theoretically and validated empirically.
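To make the idea of sparse gradients concrete, here is a minimal single-process sketch in Python. It assumes k denotes the number of gradient entries each worker keeps, and it uses a plain reference reduction rather than the Ok-Topk communication schedule. The function names `topk_sparsify` and `naive_sparse_allreduce` are hypothetical and do not come from this repository.

```python
import heapq
from collections import defaultdict


def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient.

    Returns a dict {index: value}; all other entries are treated as zero.
    """
    top = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return {i: grad[i] for i in top}


def naive_sparse_allreduce(sparse_grads):
    """Sum the sparse gradients of all workers (simulated in one process).

    This is a plain reference reduction, NOT the Ok-Topk algorithm: the
    result may hold up to k * P nonzeros for P workers.
    """
    total = defaultdict(float)
    for sparse in sparse_grads:
        for idx, val in sparse.items():
            total[idx] += val
    return dict(total)


# Two simulated workers, each holding a dense gradient of length 6.
worker_grads = [
    [0.1, -2.0, 0.3, 0.0, 1.5, -0.2],
    [0.0, 1.0, -3.0, 0.2, 0.1, 0.4],
]
k = 2
sparse = [topk_sparsify(g, k) for g in worker_grads]
reduced = naive_sparse_allreduce(sparse)
print(sparse)   # per-worker sparse gradients
print(reduced)  # element-wise sum over the kept entries
```

Note that the reduced result here has three nonzeros although each worker sent only two. This growth of the nonzero count across workers is the kind of cost a sparse allreduce has to keep under control.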

Set up the environment

To install the required Python modules:

conda create --name py38_oktopk python=3.8

conda activate py38_oktopk

pip3 install pip==20.2.4

pip install -r requirements.txt

MPICC="cc -shared" pip install --no-binary=mpi4py mpi4py

git clone https://github.com/NVIDIA/apex

cd apex

pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
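After the installation steps, a quick sanity check can confirm that the key modules are visible to the active interpreter. The sketch below is an assumption-laden helper, not part of the repository: the import names `mpi4py`, `torch`, and `apex` are guesses based on the commands above, and `missing_modules` is a hypothetical function name. Adjust the list to match `requirements.txt` in your checkout.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Assumed import names (not taken from the repository).
REQUIRED = ["mpi4py", "torch", "apex"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

The check only tests whether each module can be located. It does not verify that apex was built with its C++ and CUDA extensions.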

Prepare Datasets

CIFAR-10 for VGG

cd ./VGG/vgg_data

wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

tar -zxvf cifar-10-python.tar.gz

AN4 for LSTM

cd ./LSTM/audio_data

wget https://www.dropbox.com/s/l5w4up20u5pfjxf/an4.zip

unzip an4.zip

Wikipedia for BERT

cd ./BERT/bert/bert_data/

Prepare the dataset according to the README file.
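Before submitting jobs, it can help to confirm that the three dataset directories named above exist and are not empty. The following sketch assumes it is run from the repository root. The helper name `empty_or_missing` is hypothetical, and the check is deliberately coarse: it does not validate the dataset contents.

```python
from pathlib import Path

# Dataset directories taken from the steps above, relative to the repo root.
DATA_DIRS = ["VGG/vgg_data", "LSTM/audio_data", "BERT/bert/bert_data"]


def empty_or_missing(repo_root, rel_dirs):
    """Return the relative directories that do not exist or have no entries."""
    root = Path(repo_root)
    bad = []
    for rel in rel_dirs:
        d = root / rel
        if not d.is_dir() or not any(d.iterdir()):
            bad.append(rel)
    return bad


problems = empty_or_missing(".", DATA_DIRS)
if problems:
    print("Datasets not prepared in:", ", ".join(problems))
else:
    print("All dataset directories contain files.")
```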

Run jobs

We run experiments on GPU clusters with the SLURM job scheduler. To evaluate the performance of Ok-Topk, Gaussiank, gtopk, topkA, topkDSA, and dense, run the jobs as follows.

To run VGG jobs

cd ./VGG

./sbatch_vgg_jobs.sh

To run LSTM jobs

cd ./LSTM

./sbatch_lstm_jobs.sh

To run BERT jobs

cd ./BERT/bert/

./sbatch_bert_jobs.sh

Publication

The work on Ok-Topk is published in PPoPP'22.

License

See LICENSE.


Owner

Shigang Li
