Secure Distributed Training at Scale

Related tags

Deep Learningbtard
Overview

Secure Distributed Training at Scale

This repository contains the implementation of experiments from the paper

"Secure Distributed Training at Scale"

Eduard Gorbunov*, Alexander Borzunov*, Michael Diskin, Max Ryabinin

[PDF] arxiv.org

Overview

The code is organized as follows:

  • ./resnet is a setup for training ResNet18 on CIFAR-10 with simulated byzantine attackers
  • ./albert runs distributed training of ALBERT-large with byzantine attacks using cloud instances

ResNet18

This setup uses torch.distributed for parallelism.

Requirements
  • Python >= 3.7 (we recommend Anaconda python 3.8)
  • Dependencies: pip install jupyter torch>=1.6.0 torchvision>=0.7.0 tensorboard
  • A machine with at least 16GB RAM and either a GPU with >24GB memory or 3 GPUs with at least 10GB memory each.
  • We tested the code on Ubuntu Server 18.04, it should work with all major linux distros. For Windows, we recommend using Docker (e.g. via Kitematic).

Running experiments: please open ./resnet/RunExperiments.ipynb and follow the instructions in that notebook. The learning curves will be available in Tensorboard logs: tensorboard --logdir btard/resnet.

ALBERT

This setup spawns distributed nodes that collectively train ALBERT-large on wikitext103. It uses a version of the hivemind library modified so that some peers may be programmed to become Byzantine and perform various types of attacks on the training process.

Requirements
  • The experiments are optimized for 16 instances each with a single T4 GPU.

    • For your convenience, we provide a cost-optimized AWS starter notebook that can run experiments (see below)
    • While it can be simulated with a single node, doing so will require additional tuning depending on the number and type of GPUs available.
  • If running manually, please install the core library on each machine:

    • The code requires python >= 3.7 (we recommend Anaconda python 3.8)
    • Install the library: cd ./albert/hivemind/ && pip install -e .
    • If successful, it should become available as import hivemind

Running experiments: For your convenience, we provide a unified script that runs a distributed ALBERT experiment in the AWS cloud ./albert/experiments/RunExperiments.ipynb using preemptible T4 instances. The learning curves will be posted to the Wandb project specified during the notebook setup.

Expected cloud costs: a training experiment with 16 hosts takes up approximately $60 per day for g4dn.xlarge and $90 per day for g4dn.2xlarge instances. One can expect a full training experiment to converge in ≈3 days. Once the model is trained, one can restart training from intermediate checkpoints and simulate attacks. One attack episode takes up 4-5 hours depending on cloud availability.

Owner
Yandex Research
Yandex Research
Code and model benchmarks for "SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology"

NeurIPS 2020 SEVIR Code for paper: SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology Requirement

USAF - MIT Artificial Intelligence Accelerator 46 Dec 15, 2022
This is the source code of the solver used to compete in the International Timetabling Competition 2019.

ITC2019 Solver This is the source code of the solver used to compete in the International Timetabling Competition 2019. Building .NET Core (2.1 or hig

Edon Gashi 8 Jan 22, 2022
GoodNews Everyone! Context driven entity aware captioning for news images

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy! Model preview: Huge T

117 Dec 19, 2022
A simple, high level, easy-to-use open source Computer Vision library for Python.

ZoomVision : Slicing Aid Detection A simple, high level, easy-to-use open source Computer Vision library for Python. Installation Installing dependenc

Nurettin Sinanoğlu 2 Mar 04, 2022
Self-Supervised Speech Pre-training and Representation Learning Toolkit.

What's New Sep 2021: We host a challenge in AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See SUPERB official site

s3prl 1.6k Jan 08, 2023
Point Cloud Registration Network

PCRNet: Point Cloud Registration Network using PointNet Encoding Source Code Author: Vinit Sarode and Xueqian Li Paper | Website | Video | Pytorch Imp

ViNiT SaRoDe 59 Nov 19, 2022
Dynamic Environments with Deformable Objects (DEDO)

DEDO - Dynamic Environments with Deformable Objects DEDO is a lightweight and customizable suite of environments with deformable objects. It is aimed

Rika 32 Dec 22, 2022
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

Jingyun Liang 159 Dec 30, 2022
Lab Materials for MIT 6.S191: Introduction to Deep Learning

This repository contains all of the code and software labs for MIT 6.S191: Introduction to Deep Learning! All lecture slides and videos are available

Alexander Amini 5.6k Dec 26, 2022
Official project repository for 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination'

NCAE_UAD Official project repository of 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination' Abstract In this p

Jongmin Andrew Yu 2 Feb 10, 2022
A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions

A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions Kapoutsis, A.C., Chatzichristofis,

Athanasios Ch. Kapoutsis 5 Oct 15, 2022
code for "AttentiveNAS Improving Neural Architecture Search via Attentive Sampling"

code for "AttentiveNAS Improving Neural Architecture Search via Attentive Sampling"

Facebook Research 94 Oct 26, 2022
This repo contains the implementation of YOLOv2 in Keras with Tensorflow backend.

Easy training on custom dataset. Various backends (MobileNet and SqueezeNet) supported. A YOLO demo to detect raccoon run entirely in brower is accessible at https://git.io/vF7vI (not on Windows).

Huynh Ngoc Anh 1.7k Dec 24, 2022
OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.

English | 简体中文 Documentation: https://mmtracking.readthedocs.io/ Introduction MMTracking is an open source video perception toolbox based on PyTorch.

OpenMMLab 2.7k Jan 08, 2023
This repository implements WGAN_GP.

Image_WGAN_GP This repository implements WGAN_GP. Image_WGAN_GP This repository uses wgan to generate mnist and fashionmnist pictures. Firstly, you ca

Lieon 6 Dec 10, 2021
The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

Joint t-sne This is the implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets. abstract: We present Jo

IDEAS Lab 7 Dec 18, 2022
Realistic lighting in ursina!

Ursina Lighting Realistic lighting in ursina! If you want to have realistic lighting in ursina, import the UrsinaLighting.py in your project and use t

17 Jul 07, 2022
Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU A Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/abs/211

Fuhang 5 Jan 18, 2022
RIM: Reliable Influence-based Active Learning on Graphs.

RIM: Reliable Influence-based Active Learning on Graphs. This repository is the official implementation of RIM. Requirements To install requirements:

Wentao Zhang 4 Aug 29, 2022
This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

HRNet 367 Dec 27, 2022