Pytorch Implementation of paper "Noisy Natural Gradient as Variational Inference"

Overview

Noisy Natural Gradient as Variational Inference

PyTorch implementation of Noisy Natural Gradient as Variational Inference.

Requirements

  • Python 3
  • Pytorch
  • visdom

Comments

  • This paper is about how to optimize bayesian neural network which has matrix variate gaussian distribution.
  • This implementation contains Noisy Adam optimizer which is for Fully Factorized Gaussian(FFG) distribution, and Noisy KFAC optimizer which is for Matrix Variate Gaussian(MVG) distribution.
  • These optimizers only work with bayesian network which has specific structure that I will mention below.
  • Currently only linear layer is available.

Experimental comments

  • I addded a lr scheduler to noisy KFAC because loss is exploded during training. I guess this happens because of slight approximation.
  • For MNIST training noisy KFAC is 15-20x slower than noisy Adam, as mentioned in paper.
  • I guess the noisy KFAC needs more epochs to train simple neural network structure like 2 linear layers.

Usage

Currently only MNIST dataset are currently supported, and only fully connected layer is implemented.

Options

  • model : Fully Factorized Gaussian(FFG) or Matrix Variate Gaussian(MVG)
  • n : total train dataset size. need this value for optimizer.
  • eps : parameter for optimizer. Default to 1e-8.
  • initial_size : initial input tensor size. Default to 784, size of MNIST data.
  • label_size : label size. Default to 10, size of MNIST label.

More details in option_parser.py

Train

$ python train.py --model=FFG --batch_size=100 --lr=1e-3 --dataset=MNIST
$ python train.py --model=MVG --batch_size=100 --lr=1e-2 --dataset=MNIST --n=60000

Visualize

  • To visualize intermediate results and loss plots, run python -m visdom.server and go to the URL http://localhost:8097

Test

$ python test.py --epoch=20

Training Graphs

1. MNIST

  • network is consist of 2 linear layers.
  • FFG optimized by noisy Adam : epoch 20, lr 1e-3

  • MVG optimized by noisy KFAC : epoch 100, lr 1e-2, decay 0.1 for every 30 epochs
  • Need to tune learning rate.

Implementation detail

  • Optimizing parameter procedure is consists of 2 steps, Calculating gradient and Applying to bayeisan parameters.
  • Before forward, network samples parameters with means & variances.
  • Usually calling step function updates parameters, but not this case. After calling step function, you have to update bayesian parameters. Look at the ffg_model.py

TODOs

  • More benchmark cases
  • Supports bayesian convolution
  • Implement Block Tridiagonal Covariance, which is dependent between layers.

Code reference

Visualization code(visualizer.py, utils.py) references to pytorch-CycleGAN-and-pix2pix(https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) by Jun-Yan Zhu

Author

Tony Kim

Owner
Tony JiHyun Kim
CEO/Tech Lead @PostAlpine Co., Ltd.
Tony JiHyun Kim
QueryFuzz implements a metamorphic testing approach to test Datalog engines.

Datalog is a popular query language with applications in several domains. Like any complex piece of software, Datalog engines may contain bugs. The mo

34 Sep 10, 2022
A tool to prepare websites grabbed with wget for local viewing.

makelocal A tool to prepare websites grabbed with wget for local viewing. exapmples After fetching xkcd.com with: wget -r -no-remove-listing -r -N --p

5 Apr 23, 2022
The world's largest toxicity dataset.

The Toxicity Dataset by Surge AI Saving the internet is fun. Combing through thousands of online comments to build a toxicity dataset isn't. That's wh

Surge AI 134 Dec 19, 2022
rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle.

rastrainer rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle. UI TODO Init UI. Add Block. Add l

deepbands 5 Mar 04, 2022
Predicting Price of house by considering ,house age, Distance from public transport

House-Price-Prediction Predicting Price of house by considering ,house age, Distance from public transport, No of convenient stores around house etc..

Musab Jaleel 1 Jan 08, 2022
Metric learning algorithms in Python

metric-learn: Metric Learning in Python metric-learn contains efficient Python implementations of several popular supervised and weakly-supervised met

1.3k Jan 02, 2023
Curvlearn, a Tensorflow based non-Euclidean deep learning framework.

English | 简体中文 Why Non-Euclidean Geometry Considering these simple graph structures shown below. Nodes with same color has 2-hop distance whereas 1-ho

Alibaba 123 Dec 12, 2022
modelvshuman is a Python library to benchmark the gap between human and machine vision

modelvshuman is a Python library to benchmark the gap between human and machine vision. Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with

Bethge Lab 244 Jan 03, 2023
Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

GATER This repository contains the code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”. Our implementation is

Jiacheng Ye 12 Nov 24, 2022
Barlow Twins and HSIC

Barlow Twins and HSIC Unofficial Pytorch implementation for Barlow Twins and HSIC_SSL on small datasets (CIFAR10, STL10, and Tiny ImageNet). Correspon

Yao-Hung Hubert Tsai 49 Nov 24, 2022
Meta-meta-learning with evolution and plasticity

Evolve plastic networks to be able to automatically acquire novel cognitive (meta-learning) tasks

5 Jun 28, 2022
Detectron2-FC a fast construction platform of neural network algorithm based on detectron2

What is Detectron2-FC Detectron2-FC a fast construction platform of neural network algorithm based on detectron2. We have been working hard in two dir

董晋宗 9 Jun 06, 2022
A small library for creating and manipulating custom JAX Pytree classes

Treeo A small library for creating and manipulating custom JAX Pytree classes Light-weight: has no dependencies other than jax. Compatible: Treeo Tree

Cristian Garcia 58 Nov 23, 2022
A high-performance distributed deep learning system targeting large-scale and automated distributed training.

HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop

DAIR Lab 150 Dec 21, 2022
A collection of easy-to-use, ready-to-use, interesting deep neural network models

Interesting and reproducible research works should be conserved. This repository wraps a collection of deep neural network models into a simple and un

Aria Ghora Prabono 16 Jun 16, 2022
Code of 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces Installation After cloning the repo open

37 Dec 03, 2022
a short visualisation script for pyvideo data

PyVideo Speakers A CLI that visualises repeat speakers from events listed in https://github.com/pyvideo/data Not terribly efficient, but you know. Ins

Katie McLaughlin 3 Nov 24, 2021
[ICCV2021] Learning to Track Objects from Unlabeled Videos

Unsupervised Single Object Tracking (USOT) 🌿 Learning to Track Objects from Unlabeled Videos Jilai Zheng, Chao Ma, Houwen Peng and Xiaokang Yang 2021

53 Dec 28, 2022
Complete U-net Implementation with keras

U Net Lowered with Keras Complete U-net Implementation with keras Original Paper Link : https://arxiv.org/abs/1505.04597 Special Implementations : The

Sagnik Roy 14 Oct 10, 2022
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Rubicon Purpose Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a r

Capital One 97 Jan 03, 2023