PyTorch implementation of the paper "Noisy Natural Gradient as Variational Inference"

Last update: Dec 02, 2022

Related tags

Deep Learning NoisyNaturalGradient

Overview

Noisy Natural Gradient as Variational Inference

PyTorch implementation of Noisy Natural Gradient as Variational Inference.

Requirements

Python 3
PyTorch
visdom

Comments

This paper is about how to optimize a Bayesian neural network whose weights follow a matrix variate Gaussian distribution.
This implementation contains the Noisy Adam optimizer, which is for the Fully Factorized Gaussian (FFG) distribution, and the Noisy KFAC optimizer, which is for the Matrix Variate Gaussian (MVG) distribution.
These optimizers only work with a Bayesian network that has the specific structure described under "Implementation detail".
Currently only the linear layer is available.
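
To make the difference between the two distributions concrete, the sketch below counts how many covariance entries each one stores for a single linear layer. It is an illustration only, not code from this repository: the function name is hypothetical, biases are ignored, and the symmetry of the covariance factors is not exploited. FFG keeps one variance per weight, MVG keeps one Kronecker factor over the inputs and one over the outputs, and a full Gaussian would keep a dense covariance over all weights.

```python
def covariance_entry_counts(in_features, out_features):
    """Count stored covariance entries for one linear layer (hypothetical helper)."""
    n_weights = in_features * out_features
    return {
        # Fully factorized Gaussian: diagonal covariance, one variance per weight.
        "FFG": n_weights,
        # Matrix variate Gaussian: an (in x in) factor plus an (out x out) factor.
        "MVG": in_features ** 2 + out_features ** 2,
        # Unrestricted Gaussian over all weights, shown for comparison only.
        "full": n_weights ** 2,
    }


# Layer shaped like the MNIST defaults: 784 inputs, 10 outputs.
counts = covariance_entry_counts(784, 10)
print(counts)
```

For a 784 x 10 layer the MVG factors hold more entries than the FFG diagonal but far fewer than a full covariance, which is consistent with noisy KFAC costing more per step than noisy Adam.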

Experimental comments

I added a learning-rate scheduler to noisy KFAC because the loss exploded during training. I guess this happens because of a slight approximation.
For MNIST training, noisy KFAC is 15-20x slower than noisy Adam, as mentioned in the paper.
I guess noisy KFAC needs more epochs to train a simple network structure such as 2 linear layers.

Usage

Currently only the MNIST dataset is supported, and only the fully connected layer is implemented.

Options

model : Fully Factorized Gaussian (FFG) or Matrix Variate Gaussian (MVG).
n : total size of the training dataset. The optimizer needs this value.
eps : parameter for the optimizer. Defaults to 1e-8.
initial_size : initial input tensor size. Defaults to 784, the size of a flattened MNIST image.
label_size : label size. Defaults to 10, the number of MNIST classes.

More details in option_parser.py
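
As a rough picture of how these options fit together, here is a minimal argparse sketch. It is not the contents of option_parser.py. The flag names come from the options list and the train commands in this README, and the defaults for eps, initial_size and label_size are the documented ones. The defaults for model, dataset, batch_size, lr and n are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed in this README."""
    parser = argparse.ArgumentParser(description="Noisy natural gradient training options")
    # Assumed defaults: not documented in the README.
    parser.add_argument("--model", choices=["FFG", "MVG"], default="FFG")
    parser.add_argument("--dataset", default="MNIST")
    parser.add_argument("--batch_size", type=int, default=100)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--n", type=int, default=60000,
                        help="total train dataset size, used by the optimizer")
    # Documented defaults.
    parser.add_argument("--eps", type=float, default=1e-8)
    parser.add_argument("--initial_size", type=int, default=784)
    parser.add_argument("--label_size", type=int, default=10)
    return parser


# Same flags as the MVG train command shown in this README.
args = build_parser().parse_args(
    ["--model=MVG", "--batch_size=100", "--lr=1e-2", "--dataset=MNIST", "--n=60000"]
)
print(args.model, args.lr, args.n)
```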

Train

$ python train.py --model=FFG --batch_size=100 --lr=1e-3 --dataset=MNIST
$ python train.py --model=MVG --batch_size=100 --lr=1e-2 --dataset=MNIST --n=60000

Visualize

To visualize intermediate results and loss plots, run python -m visdom.server and go to the URL http://localhost:8097

Test

$ python test.py --epoch=20

Training Graphs

1. MNIST

The network consists of 2 linear layers.
FFG optimized by noisy Adam : epoch 20, lr 1e-3

MVG optimized by noisy KFAC : epoch 100, lr 1e-2, decay 0.1 every 30 epochs
The learning rate still needs tuning.
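
The decay setting above can be read as a step schedule: the learning rate is multiplied by 0.1 once every 30 epochs. The sketch below assumes that reading and uses a hypothetical helper name. It is written in plain Python so the numbers are easy to check. In PyTorch, torch.optim.lr_scheduler.StepLR with step_size=30 and gamma=0.1 gives the same schedule, though whether this repository uses that class is not stated here.

```python
def step_decay_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Learning rate after `epoch` epochs under step decay (hypothetical helper)."""
    # Integer division counts how many decay steps have already happened.
    return base_lr * gamma ** (epoch // step_size)


# Learning rate at the start of each 30-epoch block for the MVG run (lr 1e-2).
for epoch in (0, 30, 60, 90):
    print(epoch, step_decay_lr(1e-2, epoch))
```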

Implementation detail

The parameter optimization procedure consists of 2 steps: calculating the gradient and applying it to the Bayesian parameters.
Before the forward pass, the network samples parameters from the means and variances.
Calling the step function usually updates the parameters, but not in this case. After calling the step function, you have to update the Bayesian parameters yourself. See ffg_model.py.
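
The procedure above can be sketched with a toy example. This is a plain-Python illustration under stated assumptions, not the code in ffg_model.py: the class names are hypothetical, there is a single scalar weight, the variance is held fixed, and an ordinary gradient step stands in for the noisy Adam or noisy KFAC update. The point it shows is the order of operations: sample, compute the gradient, call step, then copy the result back into the Bayesian parameter.

```python
import random


class ToyBayesianWeight:
    """One Gaussian weight with a mean and a variance (hypothetical stand-in)."""

    def __init__(self, mean=0.0, var=0.01):
        self.mean = mean
        self.var = var
        self.sample = mean

    def resample(self, rng):
        # Draw w = mean + sqrt(var) * eps before each forward pass.
        self.sample = self.mean + (self.var ** 0.5) * rng.gauss(0.0, 1.0)
        return self.sample


class ToyOptimizer:
    """Keeps its own copy of the mean; step() changes only that copy."""

    def __init__(self, mean, lr):
        self.mean = mean
        self.lr = lr

    def step(self, grad):
        # Plain gradient step, standing in for the real noisy update.
        self.mean -= self.lr * grad


rng = random.Random(0)
weight = ToyBayesianWeight()
optimizer = ToyOptimizer(mean=weight.mean, lr=0.1)

for _ in range(200):
    w = weight.resample(rng)      # sample the weight before the forward pass
    grad = 2.0 * (w - 3.0)        # gradient of the toy loss (w - 3)^2 at the sample
    optimizer.step(grad)          # updates the optimizer's copy only
    weight.mean = optimizer.mean  # explicit update of the Bayesian parameter

print(weight.mean)
```

If the last line of the loop is left out, weight.mean never moves from its initial value even though step is called on every iteration.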

TODOs

More benchmark cases
Support Bayesian convolution
Implement Block Tridiagonal Covariance, which models dependence between layers.

Code reference

The visualization code (visualizer.py, utils.py) is based on pytorch-CycleGAN-and-pix2pix (https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) by Jun-Yan Zhu.

Author

Tony Kim


Owner

Tony JiHyun Kim
