An optimizer that trains as fast as Adam and as good as SGD.

Overview

AdaBound

An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popular tasks in fields such as CV and NLP.

Based on Luo et al. (2019). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. In Proc. of ICLR 2019.

Installation

AdaBound requires Python 3.6.0 or later. We currently provide a PyTorch version; AdaBound for TensorFlow is coming soon.

Installing via pip

The preferred way to install AdaBound is via pip with a virtual environment. Just run

pip install adabound

in your Python environment and you are ready to go!

Using source code

Because AdaBound is a single Python class of just over 100 lines, an alternative is to download adabound.py directly and copy it into your project.

Usage

You can use AdaBound just like any other PyTorch optimizer.

optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)

As described in the paper, AdaBound behaves like Adam at the beginning of training and gradually transforms into SGD toward the end. The final_lr parameter is the learning rate of the SGD that AdaBound transforms into. In common cases, the default final learning rate of 0.1 achieves relatively good and stable results on unseen data. AdaBound is not very sensitive to its hyperparameters; see Appendix G of the paper for more details.
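To give a feel for how the transition from Adam to SGD happens, here is a minimal pure-Python sketch of the dynamic bounds. It assumes the bound functions from the paper, lower = final_lr * (1 - 1 / (gamma * t + 1)) and upper = final_lr * (1 + 1 / (gamma * t)), with a convergence-speed parameter gamma that is not mentioned in this README (1e-3 is assumed as its value here). The helper names adabound_bounds and clipped_step_size are hypothetical and are not part of the adabound package.

```python
def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Return the (lower, upper) bounds on the step size at a 1-indexed step.

    Assumption: these are the bound functions described in the paper;
    gamma controls how quickly both bounds converge to final_lr.
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


def clipped_step_size(adam_step_size, step, final_lr=0.1, gamma=1e-3):
    """Clip an Adam-style per-parameter step size into the current bounds."""
    lower, upper = adabound_bounds(step, final_lr, gamma)
    return min(max(adam_step_size, lower), upper)


if __name__ == "__main__":
    # Early on the interval is very wide, so the Adam step size passes through.
    # Late in training the interval collapses around final_lr (SGD-like).
    for step in (1, 1_000, 100_000, 10_000_000):
        lower, upper = adabound_bounds(step)
        print(f"step {step:>10,}: lower={lower:.6f}  upper={upper:.6f}")
```

With these assumed defaults the bounds at step 1,000 are roughly 0.05 and 0.2, and both keep tightening toward final_lr as training continues. There is no single step at which the optimizer "becomes" SGD; the switch is gradual.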

Despite its robust performance, there is no silver bullet: using AdaBound does not free you from tuning hyperparameters. The performance of a model depends on many things, including the task, the model structure, and the distribution of the data. You still need to choose hyperparameters based on your specific situation, but you will probably spend much less time on it than before!

Demos

Thanks to the awesome work by the GitHub team and the Jupyter team, the Jupyter notebook (.ipynb) files can render directly on GitHub. We provide several notebooks (like this one) for better visualization. We hope to illustrate the robust performance of AdaBound through these examples.

For the full list of demos, please refer to this page.

Citing

If you use AdaBound in your research, please cite Adaptive Gradient Methods with Dynamic Bound of Learning Rate.

@inproceedings{Luo2019AdaBound,
  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}
}

Contributors

@kayuksel

License

Apache 2.0

Comments
  • What is up with Epoch 150

    I'm wondering what is happening at epoch 150 in all visualizations? I would like to introduce that into all my models ;-)

    https://github.com/Luolc/AdaBound/blob/master/demos/cifar10/visualization.ipynb

    opened by kootenpv 8
  • AdaBoundW

    A version of AdaBound with decoupled weight decay, implemented as an additional class, as discussed in the recent issue #13.

    opened by kayuksel 3
  • Question about the code

    IIRC, because group['lr'] is never changed, final_lr will always be the same as group['final_lr']. Is this intended? https://github.com/Luolc/AdaBound/blob/6fa826003f41a57501bde3e2baab1488410fe2da/adabound/adabound.py#L110

    opened by crcrpar 2
  • Doesn't work properly with higher lr

    I'm new to deep learning. My project works well with SGD, but something goes wrong with AdaBound.

    When I start with lr=1e-3, training breaks down with the following error: invalid argument 2: non-empty 3D or 4D (batch mode) tensor expected for input, but got: [1 x 64 x 0 x 27] at /pytorch/aten/src/THCUNN/generic/SpatialAdaptiveMaxPooling.cu:24

    But it seems to work fine if I set lr to 1e-4 or lower. This confuses me a lot. Any ideas?

    python=3.6 pytorch=1.0.1 / 0.4

    opened by Ocelot7777 0
  • Can this deal with complex numbers?

    Hi authors,

    I tried to use this method on complex numbers and it failed with an error message like:

    File "optimizer.py", line 701, in step
      step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(
    RuntimeError: "clamp_scalar_cpu" not implemented for 'ComplexFloat'

    I'm wondering if it's possible to improve this for complex numbers? Thanks.

    Ni

    opened by ni-chen 0
  • When did the optimizer switch to SGD?

    I set the initial lr=0.0001 and final_lr=0.1, but I still don't know when the optimizer becomes SGD. Do I need to raise my learning rate to the final learning rate manually? Thanks!

    opened by yunbujian 0
  • Pytorch 1.6 warning

    /home/xxxx/.local/lib/python3.7/site-packages/adabound/adabound.py:94: UserWarning: This overload of add_ is deprecated:
            add_(Number alpha, Tensor other)
    Consider using one of the following signatures instead:
            add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
      exp_avg.mul_(beta1).add_(1 - beta1, grad)
    
    opened by MichaelMonashev 1
  • Learning rate changing

    Hi, thanks a lot for sharing your excellent work.

    If I want to change the learning rate as the epochs increase, how should I set the lr and final_lr parameters in AdaBound? Or is there any need to change the learning rate as the epochs increase at all?

    Looking for your reply, thanks a lot.

    opened by EddieEduardo 0
  • LSTM hyperparameters for language modeling

    Greetings,

    Thanks for your great paper. I am wondering about the hyperparameters you used for language modeling experiments. Could you provide information about that?

    Thank you!

    opened by hoangcuong2011 0
Releases(v0.0.5)
  • v0.0.5(Mar 6, 2019)

    Bug Fixes

    • Fix wrong assertion of final_lr 02e11bae10c82f6b5365f7925c8cf71252adcd52
    • Fix .gitignore in CIFAR-10 demo to include the learning curve data 54ef9aa6c133caf0d9c82198d46979cfdbbb12f6
Owner
LoLo
A fool living in the amazing world.