AdamW optimizer and cosine learning rate annealing with restarts

This repository contains an implementation of the AdamW optimization algorithm and the cosine learning rate scheduler described in "Decoupled Weight Decay Regularization". The AdamW implementation is straightforward and does not differ much from the existing Adam implementation for PyTorch, except that it decouples weight decay from the gradient-based update. The cosine annealing scheduler with restarts allows the model to converge to a (possibly) different local minimum on every restart, and it normalizes the weight decay hyperparameter according to the length of the restart period. Unlike the schedulers in the standard PyTorch scheduler suite, this scheduler adjusts the optimizer's learning rate on every batch update rather than on every epoch, as the paper prescribes.
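
As a rough illustration of the ideas above, here is a minimal sketch that uses plain Python scalars instead of PyTorch tensors. The function names (cosine_eta, normalized_weight_decay, adamw_step) and their signatures are assumptions made for this example and are not this repository's API. The formulas follow the paper, with the restart period standing in for the total number of epochs in the weight decay normalization.

```python
import math


def cosine_eta(t_cur, t_i, eta_min=0.0, eta_max=1.0):
    """Schedule multiplier after t_cur of t_i epochs in the current period.

    t_cur may be fractional, so it can advance after every batch.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_i))


def normalized_weight_decay(wd_norm, batch_size, epoch_size, restart_period):
    """Scale a normalized weight decay by sqrt(b / (B * T)).

    b is the batch size, B the number of samples per epoch and T the
    length of the restart period in epochs (assumed mapping).
    """
    return wd_norm * math.sqrt(batch_size / (epoch_size * restart_period))


def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0, eta=1.0):
    """One scalar AdamW update; returns the new (theta, m, v).

    The decay term weight_decay * theta is added outside the adaptive
    Adam term, so it never enters the moment estimates m and v.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    adam_term = lr * m_hat / (math.sqrt(v_hat) + eps)
    theta = theta - eta * (adam_term + weight_decay * theta)
    return theta, m, v
```

With a zero gradient the Adam term vanishes, yet the parameter still shrinks by eta * weight_decay * theta, which is the decoupling at work. Passing a fractional t_cur to cosine_eta is what lets the rate change after every batch rather than once per epoch.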

Cyclical Learning Rates

Besides the "cosine" and "arccosine" policies (arccosine has a steeper profile at the limiting points), there are "triangular", "triangular2" and "exp_range", which implement the policies proposed in "Cyclical Learning Rates for Training Neural Networks". The ratio of the increasing and decreasing phases of the triangular policy can be adjusted with the triangular_step parameter. The minimum allowed lr is set by the min_lr parameter.

The triangular schedule is enabled by passing policy="triangular".
The triangular2 schedule halves the maximum lr on each restart cycle. It is enabled by passing policy="triangular2", or by combining policy="triangular" with eta_on_restart_cb=ReduceMaxLROnRestart(ratio=0.5). The ratio parameter sets the factor by which the lr is scaled on each restart.
The exp_range schedule is enabled by passing policy="exp_range". It scales the maximum lr exponentially with the iteration count. The base of the exponent is set by the gamma parameter.
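
For intuition, the three policies can be sketched with the formulas from the cyclical learning rates paper. This is an illustrative sketch only: the function clr and the names step_size, base_lr and max_lr come from the paper's notation and are assumptions here, not parameters of this repository's scheduler, which additionally handles restarts and the triangular_step ratio.

```python
import math


def clr(iteration, step_size, base_lr, max_lr, policy="triangular", gamma=1.0):
    """Learning rate at a given iteration (hypothetical helper).

    step_size is the number of iterations in half a cycle: the rate
    rises for step_size iterations and falls for the next step_size.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x is the distance from the peak of the current cycle, in [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    if policy == "triangular":
        scale = 1.0
    elif policy == "triangular2":
        scale = 1.0 / (2 ** (cycle - 1))  # halve the amplitude every cycle
    elif policy == "exp_range":
        scale = gamma ** iteration        # decay the amplitude every iteration
    else:
        raise ValueError(f"unknown policy: {policy}")
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * scale
```

In this sketch the peak of the second cycle under "triangular2" is half the peak of the first, while "exp_range" with gamma below 1 shrinks the peak continuously.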

These schedules can be combined with shrinking or expanding restart periods and with weight decay normalization, and they can be used with AdamW and other PyTorch optimizers.
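
To see how a period multiplier shrinks or expands the restart periods, the sketch below lists the epochs at which restarts would fall. It assumes each period is the previous one multiplied by t_mult and rounded up; the helper restart_epochs is hypothetical, and the scheduler's actual rounding may differ.

```python
import math


def restart_epochs(restart_period, t_mult, n_restarts):
    """Epoch indices at which the first n_restarts restarts occur."""
    epochs = []
    epoch = 0
    period = restart_period
    for _ in range(n_restarts):
        epoch += period
        epochs.append(epoch)
        # Assumed rule: grow (t_mult > 1) or shrink (t_mult < 1) the next period.
        period = math.ceil(period * t_mult)
    return epochs
```

For example, restart_epochs(5, 2, 3) places restarts after epochs 5, 15 and 35, while t_mult=1 keeps them evenly spaced.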

Example:

    batch_size = 32
    epoch_size = 1024
    model = resnet()
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
    scheduler = CyclicLRWithRestarts(optimizer, batch_size, epoch_size, restart_period=5, t_mult=1.2, policy="cosine")
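    for epoch in range(100):
        scheduler.step()  # call once per epoch, before iterating over batches
        for batch in train_loader:  # train_loader is a placeholder for your batch iterator
            ...  # forward pass and loss computation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.batch_step()  # call once per batch update
        validate(...)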

Owner

Maksym Pyrozhok
