A tf.keras implementation of Facebook AI's MadGrad optimization algorithm

Last update: Aug 18, 2022

Overview

MADGRAD Optimization Algorithm For TensorFlow

This package implements the MadGrad algorithm proposed in "Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization" (Aaron Defazio and Samy Jelassi, 2021).

Table of Contents

About The Project
Getting Started
- Prerequisites
- Installation
Usage
Contributing
License
Contact
Citations

About The Project

The MadGrad optimization algorithm combines dual averaging of gradients with momentum-based adaptivity to attain results that match or outperform those of Adam or SGD with momentum. This project offers a TensorFlow implementation of the algorithm, along with a few usage examples and tests.
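The dual-averaging and momentum steps mentioned above can be sketched in plain NumPy. This is a minimal illustration based on the update rule in the cited paper, not the code shipped in this package. The helper name `madgrad_step` and its arguments are hypothetical, and weight decay is omitted for brevity.

```python
import numpy as np


def madgrad_step(x, x0, s, nu, grad, k, lr=0.01, momentum=0.9, eps=1e-6):
    """One MADGRAD-style update (illustrative sketch, hypothetical helper).

    x    : current parameters
    x0   : initial parameters (the dual-averaging anchor point)
    s    : running weighted sum of gradients
    nu   : running weighted sum of squared gradients
    grad : gradient evaluated at x
    k    : zero-based step index
    """
    lam = lr * np.sqrt(k + 1)          # step weight grows with sqrt(k + 1)
    s = s + lam * grad                 # dual average of the gradients
    nu = nu + lam * grad * grad        # adaptive second-moment accumulator
    z = x0 - s / (np.cbrt(nu) + eps)   # dual-averaged iterate (cube root, not square root)
    c = 1.0 - momentum                 # interpolation weight toward z
    x = (1.0 - c) * x + c * z          # momentum applied as iterate averaging
    return x, s, nu


# Example: run a few steps on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x0 = np.array([1.0, -2.0])
x, s, nu = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
for k in range(100):
    x, s, nu = madgrad_step(x, x0, s, nu, grad=x, k=k)
```

Note that the iterate `z` is always computed relative to the starting point `x0` rather than the previous iterate, which is what distinguishes dual averaging from the usual SGD-style update.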

Prerequisites

The prerequisites can be installed separately from the requirements.txt file:

pip install -r requirements.txt

Installation

This project is built with Python 3 and can be installed directly with pip:

pip install tf-madgrad

Usage

To use the optimizer in any tf.keras model, import the MadGrad optimizer from the madgrad module (installed by the tf-madgrad package) and instantiate it.

import tensorflow as tf
from madgrad import MadGrad

# Create the architecture
inp = tf.keras.layers.Input(shape=shape)
...
# Apply the output layer to the last hidden tensor (called x here)
op = tf.keras.layers.Dense(classes, activation=activation)(x)

# Instantiate the model
model = tf.keras.models.Model(inp, op)

# Pass the MadGrad optimizer to the compile function
model.compile(optimizer=MadGrad(lr=0.01), loss=loss)

# Fit the keras model as normal
model.fit(...)

This implementation also supports distributed training with tf.distribute.Strategy.

See an MNIST example here

Contributing

Any and all contributions are welcome. Please raise an issue if the optimizer gives incorrect results or crashes unexpectedly during training.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Feel free to reach out with any issues or requests related to this implementation:

Darshan Deshpande - Email | LinkedIn

Citations

@misc{defazio2021adaptivity,
      title={Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization}, 
      author={Aaron Defazio and Samy Jelassi},
      year={2021},
      eprint={2101.11075},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
