Code used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Last update: Sep 16, 2022

Train longer, generalize better - Big batch training

This is the code repository used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training of neural networks" by Elad Hoffer, Itay Hubara and Daniel Soudry.

It is based on convNet.pytorch, with some helpful options such as:

Training on several datasets
Complete logging of trained experiment
Graph visualization of the training/validation loss and accuracy
Definition of preprocessing and optimization regime for each model

Dependencies

pytorch
torchvision to load the datasets and perform image transforms
pandas for logging to csv
bokeh for training visualization
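
As a rough illustration of what "logging to csv" with pandas can look like, the sketch below collects one row per epoch and writes the rows out as CSV. The `ResultsLog` class, its methods and the metric values are hypothetical placeholders for illustration, not the repository's actual logging code or real results.

```python
import io

import pandas as pd


class ResultsLog:
    """Hypothetical helper: accumulate per-epoch metrics and dump them as CSV."""

    def __init__(self):
        self.rows = []

    def add(self, **metrics):
        # One dictionary per epoch; the keys become CSV columns.
        self.rows.append(metrics)

    def to_csv(self, path_or_buffer):
        # index=False keeps the pandas row index out of the file.
        pd.DataFrame(self.rows).to_csv(path_or_buffer, index=False)


log = ResultsLog()
log.add(epoch=1, train_loss=2.0, val_loss=2.5)  # placeholder values
log.add(epoch=2, train_loss=1.5, val_loss=2.0)  # placeholder values

buffer = io.StringIO()  # a file path would work the same way
log.to_csv(buffer)
csv_text = buffer.getvalue()
```

A CSV file written this way can be reloaded with `pd.read_csv` and plotted with any plotting library.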

Data

Configure your dataset path in data.py.
To get the ILSVRC data, you should register on their site for access: http://www.image-net.org/

Experiment examples

python main_normal.py --dataset cifar10 --model resnet --save cifar10_resnet44_bs2048_lr_fix --epochs 100 --b 2048 --lr_bb_fix;
python main_normal.py --dataset cifar10 --model resnet --save cifar10_resnet44_bs2048_regime_adaptation --epochs 100 --b 2048 --lr_bb_fix --regime_bb_fix;
python main_gbn.py --dataset cifar10 --model resnet --save cifar10_resnet44_bs2048_ghost_bn256 --epochs 100 --b 2048 --lr_bb_fix --mini-batch-size 256;
python main_normal.py --dataset cifar100 --model resnet --save cifar100_wresnet16_4_bs1024_regime_adaptation --epochs 100 --b 1024 --lr_bb_fix --regime_bb_fix;
python main_gbn.py --model mnist_f1 --dataset mnist --save mnist_baseline_bs4096_gbn --epochs 50 --b 4096 --lr_bb_fix --no-regime_bb_fix --mini-batch-size 128;
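
The paper proposes scaling the learning rate with the square root of the batch-size ratio and training for proportionally more epochs, so that the number of weight updates stays roughly constant. The sketch below shows that arithmetic only. It assumes that `--lr_bb_fix` and `--regime_bb_fix` correspond to these two adjustments; the function names, the baseline batch size of 128, the base learning rate of 0.1 and the base of 100 epochs are illustrative assumptions, not values read from the scripts.

```python
import math

BASE_BATCH_SIZE = 128  # assumed small-batch baseline, not taken from the repository


def scale_learning_rate(base_lr, batch_size, base_batch_size=BASE_BATCH_SIZE):
    """Square-root scaling: multiply lr by sqrt(batch_size / base_batch_size)."""
    return base_lr * math.sqrt(batch_size / base_batch_size)


def adapt_epochs(base_epochs, batch_size, base_batch_size=BASE_BATCH_SIZE):
    """Regime adaptation: stretch the schedule by the batch-size ratio so the
    number of weight updates matches the small-batch run."""
    return math.ceil(base_epochs * batch_size / base_batch_size)


# Illustrative inputs: base lr 0.1 and 100 epochs at batch size 128.
large_batch_lr = scale_learning_rate(0.1, 2048)   # ratio 16, so lr is multiplied by 4
large_batch_epochs = adapt_epochs(100, 2048)      # ratio 16, so 16x more epochs
```

With regime adaptation, the epochs at which the learning rate drops would be stretched by the same ratio.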

See run_experiments.sh for more examples.
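
The main_gbn.py commands take a `--mini-batch-size` argument, which suggests Ghost Batch Normalization as described in the paper: normalization statistics are computed over small "ghost" batches instead of over the whole large batch. The following dependency-free sketch illustrates the idea on a single feature. It is a simplification under stated assumptions: the function name is hypothetical, and the learnable scale and shift and the running statistics of a real batch-norm layer are omitted.

```python
import math


def ghost_batch_norm(values, ghost_batch_size, eps=1e-5):
    """Normalize each consecutive chunk of `ghost_batch_size` values
    with that chunk's own mean and (biased) variance."""
    if len(values) % ghost_batch_size != 0:
        raise ValueError("batch size must be a multiple of the ghost batch size")
    normalized = []
    for start in range(0, len(values), ghost_batch_size):
        chunk = values[start:start + ghost_batch_size]
        mean = sum(chunk) / len(chunk)
        var = sum((v - mean) ** 2 for v in chunk) / len(chunk)
        normalized.extend((v - mean) / math.sqrt(var + eps) for v in chunk)
    return normalized


# A "large batch" of four activations split into two ghost batches of two.
activations = [1.0, 3.0, 10.0, 30.0]
normalized = ghost_batch_norm(activations, ghost_batch_size=2)
```

Each ghost batch ends up with roughly zero mean and unit variance, even though the two halves of the large batch have very different scales.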

Model configuration

A network model is defined by writing a .py file in the models folder and selecting it with the --model flag. The model function must be registered in models/__init__.py and must return a trainable network. It can also specify additional training options, such as an optimization regime (either a dictionary or a function) and input transform modifications.
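
One way to picture "registering" a model function and selecting it by name is a simple name-to-factory lookup. The sketch below is a self-contained stand-in: `MODEL_REGISTRY`, `register_model`, `build_model` and the `toy` factory are hypothetical names, and the factory returns a plain dictionary instead of a trainable network so that the example has no dependencies. How models/__init__.py actually exposes model functions may differ.

```python
MODEL_REGISTRY = {}  # hypothetical: maps a model name to its factory function


def register_model(name):
    """Decorator that records a model factory under `name`."""
    def decorator(factory):
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register_model("toy")
def toy(**kwargs):
    # Stand-in for a real factory, which would return an nn.Module.
    return {"name": "toy", "num_classes": kwargs.get("num_classes", 10)}


def build_model(name, **kwargs):
    """Look up the factory selected on the command line and call it."""
    if name not in MODEL_REGISTRY:
        raise KeyError("unknown model: " + name)
    return MODEL_REGISTRY[name](**kwargs)


network = build_model("toy", num_classes=100)
```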

For example, a model definition:

import torch.nn as nn
import torchvision.transforms as transforms


class Model(nn.Module):

    def __init__(self, num_classes=1000):
        super(Model, self).__init__()
        self.model = nn.Sequential(...)

        # Optimization regime, keyed by the epoch at which the settings apply
        self.regime = {
            0: {'optimizer': 'SGD', 'lr': 1e-2,
                'weight_decay': 5e-4, 'momentum': 0.9},
            15: {'lr': 1e-3, 'weight_decay': 0}
        }

        # Input transforms for training and evaluation
        self.input_transform = {
            'train': transforms.Compose([...]),
            'eval': transforms.Compose([...])
        }

    def forward(self, inputs):
        return self.model(inputs)


def model(**kwargs):
    return Model()
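
The `regime` dictionary in the definition above is keyed by epoch. A plausible reading, sketched below, is that the settings of every key up to the current epoch are applied in order, so later entries override earlier ones while unspecified settings carry over. The `resolve_regime` helper is hypothetical and written only to illustrate that reading; it also accepts a function of the epoch, since the regime may be given either way.

```python
def resolve_regime(regime, epoch):
    """Return the optimizer settings in effect at `epoch`.

    `regime` is either a dict keyed by starting epoch or a callable
    that maps an epoch to a settings dict.
    """
    if callable(regime):
        return dict(regime(epoch))
    settings = {}
    for start_epoch in sorted(regime):
        if start_epoch > epoch:
            break
        # Later entries override earlier ones; other keys carry over.
        settings.update(regime[start_epoch])
    return settings


regime = {
    0: {'optimizer': 'SGD', 'lr': 1e-2,
        'weight_decay': 5e-4, 'momentum': 0.9},
    15: {'lr': 1e-3, 'weight_decay': 0},
}

early = resolve_regime(regime, 3)    # only the epoch-0 entry applies
late = resolve_regime(regime, 20)    # the epoch-15 entry overrides lr and weight_decay
```

Under this reading, momentum and the optimizer type stay unchanged after epoch 15 because the second entry does not mention them.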

Owner

Elad Hoffer
