Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

Overview

PyVarInf

PyVarInf provides facilities to easily train your PyTorch neural network models using variational inference.

Bayesian Deep Learning with Variational Inference

Bayesian Deep Learning

Assume we have a dataset D = {(x1, y1), ..., (xn, yn)} where the x's are the inputs and the y's the outputs. The problem is to predict the y's from the x's. Further assume that p(D|θ) is the output of a neural network with weights θ. The network loss is defined as the negative log-likelihood

    Ln(θ) = -log p(D|θ)

Usually, when training a neural network, we try to find the parameter θ* which minimizes Ln(θ).

In Bayesian Inference, the problem is instead to study the posterior distribution of the weights given the data. Assume we have a prior α over ℝ^d. By Bayes' rule, the posterior is

    p(θ|D) = p(D|θ) α(θ) / p(D)

This can be used for model selection, or prediction with Bayesian Model Averaging.
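
As a rough illustration of Bayesian Model Averaging, the sketch below averages predictive probabilities over several weight samples. It uses plain PyTorch rather than PyVarInf; the toy linear classifier, the hand-made "posterior" samples and the helper name bma_predict are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def bma_predict(x, weight_samples):
    """Average class probabilities over a list of sampled weight matrices.

    Each entry of weight_samples stands in for one draw θ ~ p(θ|D).
    """
    probs = [F.softmax(x @ w, dim=1) for w in weight_samples]
    return torch.stack(probs).mean(dim=0)


torch.manual_seed(0)
x = torch.randn(4, 3)  # 4 inputs with 3 features each

# Hypothetical posterior draws: a shared mean plus small Gaussian noise.
mean_w = torch.randn(3, 2)
weight_samples = [mean_w + 0.1 * torch.randn(3, 2) for _ in range(20)]

avg_probs = bma_predict(x, weight_samples)  # shape (4, 2), each row sums to 1
```

The averaged prediction accounts for the uncertainty on the weights, whereas a single θ* would give one point prediction.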

Variational Inference

It is usually impossible to analytically compute the posterior distribution, especially with models as complex as neural networks. Variational Inference addresses this problem by approximating the posterior p(θ|D) with a parametric distribution q(θ|φ), where φ is a parameter. The problem is then not to learn a single parameter θ* but a probability distribution q(θ|φ) minimizing

    F(φ) = KL(q(.|φ)||α) + Eθ ~ q(θ|φ)[Ln(θ)]

F is called the variational free energy.

This idea was originally introduced for deep learning by Hinton and Van Camp [5] as a way to use neural networks for Minimum Description Length [3]. MDL aims at minimizing the number of bits used to encode the whole dataset. Variational inference introduces one of many data encoding schemes. Indeed, F can be interpreted as the total description length of the dataset D, when we first encode the model, then encode the part of the data not explained by the model:

  • LC(φ) = KL(q(.|φ)||α) is the complexity loss. It measures (in nats) the quantity of information contained in the model. It is indeed possible to encode the model in LC(φ) nats, with the bits-back code [4].
  • LE(φ) = Eθ ~ q(θ|φ)[Ln(θ)] is the error loss. It measures the necessary quantity of information for encoding the data D with the model. This code length can be achieved with a Shannon-Huffman code for instance.

Therefore F(φ) = LC(φ) + LE(φ) can be rephrased as an MDL loss function which measures the total encoding length of the data.
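
The two terms can be computed on a toy problem as follows. This is a minimal sketch in plain PyTorch, not PyVarInf's implementation: it assumes a diagonal Gaussian q, a centred isotropic Gaussian prior (so that the KL has a closed form), and a linear softmax classifier standing in for the network. The function names are made up for the example.

```python
import torch
import torch.nn.functional as F


def complexity_loss(mu, sigma, prior_sigma):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, prior_sigma^2 I)), in nats."""
    return (torch.log(prior_sigma / sigma)
            + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2)
            - 0.5).sum()


def error_loss(mu, sigma, x, y, n_samples=10):
    """Monte Carlo estimate of E_{θ~q}[Ln(θ)] for a linear softmax classifier."""
    total = 0.0
    for _ in range(n_samples):
        theta = mu + sigma * torch.randn_like(mu)  # θ ~ q(.|φ)
        log_probs = F.log_softmax(x @ theta, dim=1)
        # Ln(θ): negative log-likelihood summed over the whole dataset
        total = total + F.nll_loss(log_probs, y, reduction='sum')
    return total / n_samples


torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randint(0, 2, (8,))
mu, sigma = torch.zeros(3, 2), torch.full((3, 2), 0.5)
prior_sigma = torch.tensor(1.0)

# F(φ) = LC(φ) + LE(φ)
free_energy = complexity_loss(mu, sigma, prior_sigma) + error_loss(mu, sigma, x, y)
```

Both terms are expressed in nats, so their sum can be read directly as a total code length.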

Practical Variational Optimisation

In practice, we define φ = (µ, σ) in ℝ^d × ℝ^d, and q(.|φ) = N(µ, Σ) the multivariate Gaussian distribution where Σ = diag(σ_1^2, ..., σ_d^2), and we want to find the optimal µ* and σ*.

With this choice of a Gaussian posterior, a Monte Carlo estimate of the gradient of F w.r.t. µ and σ can be obtained with backpropagation. This makes it possible to use any gradient descent method used for non-variational optimisation [2].
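
One common way to obtain such an estimate is to write θ as a differentiable function of (µ, σ) and a noise sample, then backpropagate through it. The sketch below shows a single-sample estimate for the error loss only. It is an illustration under assumptions, not necessarily how PyVarInf proceeds internally: σ is stored through its logarithm, and a quadratic function stands in for the network loss.

```python
import torch

torch.manual_seed(0)

# Variational parameters φ = (µ, σ); σ is stored as log_sigma to stay positive
# (this parameterisation is an assumption of the sketch).
mu = torch.zeros(5, requires_grad=True)
log_sigma = torch.full((5,), -1.0, requires_grad=True)


def loss_fn(theta):
    """Stand-in for the network loss Ln(θ): a simple quadratic bowl."""
    return ((theta - 1.0) ** 2).sum()


# θ = µ + σ·ε with ε ~ N(0, I), so θ depends differentiably on (µ, σ)
# and ordinary backpropagation applies.
eps = torch.randn(5)
theta = mu + torch.exp(log_sigma) * eps
loss_fn(theta).backward()

# One-sample Monte Carlo estimates of the gradient of the error loss
grad_mu, grad_log_sigma = mu.grad, log_sigma.grad
```

Averaging over several draws of ε reduces the variance of the estimate; the resulting gradients can be passed to any standard optimiser.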

Overview of PyVarInf

The core feature of PyVarInf is the Variationalize function. Variationalize takes a model as input and outputs a variationalized version of the model with a Gaussian posterior.

Definition of a variational model

To define a variational model, first define a traditional PyTorch model, then use the Variationalize function:

import pyvarinf
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.bn1 = nn.BatchNorm2d(10)
        self.bn2 = nn.BatchNorm2d(20)

    def forward(self, x):
        x = self.bn1(F.relu(F.max_pool2d(self.conv1(x), 2)))
        x = self.bn2(F.relu(F.max_pool2d(self.conv2(x), 2)))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        # dim=1: normalise over the class dimension
        return F.log_softmax(x, dim=1)

model = Net()
var_model = pyvarinf.Variationalize(model)
var_model.cuda()

Optimisation of a variational model

The var_model can then be trained as follows:

optimizer = optim.Adam(var_model.parameters(), lr=0.01)

# train_loader is assumed to be a torch.utils.data.DataLoader over the
# training set, defined elsewhere.
def train(epoch):
    var_model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()
        output = var_model(data)
        # Error loss: mean negative log-likelihood over the minibatch
        loss_error = F.nll_loss(output, target)
        # The model is only sent once, thus the division by
        # the number of datapoints used to train (60000 here)
        loss_prior = var_model.prior_loss() / 60000
        loss = loss_error + loss_prior
        loss.backward()
        optimizer.step()

for epoch in range(1, 500):
    train(epoch)
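
The division of the prior loss by the number of datapoints can be read as follows. This is a short derivation under assumptions: the dataset has N points (N = 60000 in the loop above), Ln is a sum of per-example losses, prior_loss() returns the complexity loss LC(φ), and F.nll_loss uses its default mean reduction over a minibatch B.

```latex
\frac{F(\phi)}{N}
  = \frac{1}{N} L_C(\phi)
    + \frac{1}{N}\, \mathbb{E}_{\theta \sim q(\cdot|\phi)}
      \Big[ \sum_{i=1}^{N} -\log p(y_i \mid x_i, \theta) \Big]
  \approx \frac{1}{N} L_C(\phi)
    + \frac{1}{|B|} \sum_{i \in B} -\log p(y_i \mid x_i, \theta),
  \qquad \theta \sim q(\cdot|\phi)
```

Under these assumptions, loss_error + loss_prior is a stochastic estimate of F(φ)/N, so minimising it with minibatches targets the same φ as minimising F.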

Available priors

In PyVarInf, we have implemented four families of priors:

Gaussian prior

The Gaussian prior is N(0, Σ), with Σ the diagonal matrix diag(σ_1^2, ..., σ_d^2) defined such that 1/σ_i is the square root of the number of parameters in the layer, following the standard initialisation of neural network weights. It is the default prior, and does not have any parameters. It can be set with:

var_model.set_prior('gaussian')

Conjugate priors

The conjugate prior is used if we assume that all the weights in a given layer should be distributed as a Gaussian, but with unknown mean and variance. See [6] for more details. This prior can be set with:

var_model.set_prior('conjugate', n_mc_samples, alpha_0, beta_0, mu_0, kappa_0)

There are five parameters that have to be set:

  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • mu_0, the prior sample mean
  • kappa_0, the number of samples used to estimate the prior sample mean
  • alpha_0 and beta_0, such that the variance was estimated from 2 alpha_0 observations with sample mean mu_0 and sum of squared deviations 2 beta_0
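
To make the role of these hyperparameters concrete, the sketch below draws a layer mean and variance from a normal-inverse-gamma distribution, the conjugate family described in [6], and then samples weights from the resulting Gaussian. It only illustrates the generative assumption behind the prior and says nothing about how PyVarInf evaluates the prior loss; the function name sample_layer_prior and the chosen values are assumptions.

```python
import torch
from torch.distributions import Gamma, Normal


def sample_layer_prior(mu_0, kappa_0, alpha_0, beta_0):
    """Draw (mean, variance) from a normal-inverse-gamma distribution."""
    # If precision ~ Gamma(alpha_0, rate=beta_0), then variance = 1/precision
    # follows an inverse-gamma(alpha_0, beta_0) distribution.
    precision = Gamma(torch.tensor(alpha_0), torch.tensor(beta_0)).sample()
    variance = 1.0 / precision
    # Given the variance, the mean is Gaussian around mu_0 with variance/kappa_0.
    mean = Normal(torch.tensor(mu_0), torch.sqrt(variance / kappa_0)).sample()
    return mean, variance


torch.manual_seed(0)
mean, variance = sample_layer_prior(mu_0=0.0, kappa_0=1.0, alpha_0=0.5, beta_0=0.5)

# The weights of one layer are then modelled as i.i.d. N(mean, variance).
layer_weights = mean + torch.sqrt(variance) * torch.randn(50, 10)
```

Larger kappa_0 and alpha_0 correspond to more confidence in the prior mean and variance, in line with their reading as numbers of prior observations.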

Conjugate prior with known mean

The conjugate prior with known mean is similar to the conjugate prior. It is used if we assume that all the weights in a given layer should be distributed as a Gaussian with a known mean but unknown variance. It is useful in neural network models when we assume that the weights in a layer should have mean 0. See [6] for more details. This prior can be set with:

var_model.set_prior('conjugate_known_mean', n_mc_samples, mean, alpha_0, beta_0)

Four parameters have to be set:

  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • mean, the known mean
  • alpha_0 and beta_0 defined as above

Mixture of two Gaussians

The idea of using a mixture of two Gaussians is defined in [1]. This prior can be set with:

var_model.set_prior('mixtgauss', n_mc_samples, sigma_1, sigma_2, pi)

Four parameters have to be set:

  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • sigma_1 and sigma_2, the standard deviations of the two Gaussians
  • pi, the probability of the first Gaussian
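
With a mixture prior the KL term has no closed form, which is why a number of Monte Carlo samples is needed. The sketch below estimates KL(q||prior) by sampling from a diagonal Gaussian q and assumes, as in [1], that both mixture components are centred at zero. It is written in plain PyTorch as an illustration; the function names are hypothetical and this is not PyVarInf's code.

```python
import math
import torch


def log_mixture_prior(theta, sigma_1, sigma_2, pi):
    """Log-density of pi*N(0, sigma_1^2) + (1-pi)*N(0, sigma_2^2), summed over weights."""
    def log_normal(x, sigma):
        return -0.5 * math.log(2 * math.pi) - math.log(sigma) - x ** 2 / (2 * sigma ** 2)

    stacked = torch.stack([math.log(pi) + log_normal(theta, sigma_1),
                           math.log(1 - pi) + log_normal(theta, sigma_2)])
    # logsumexp combines the two components in a numerically stable way
    return torch.logsumexp(stacked, dim=0).sum()


def mc_prior_loss(mu, sigma, n_mc_samples, sigma_1, sigma_2, pi):
    """Monte Carlo estimate of KL(q || prior) = E_q[log q(θ) - log prior(θ)]."""
    q = torch.distributions.Normal(mu, sigma)
    total = 0.0
    for _ in range(n_mc_samples):
        theta = q.rsample()  # differentiable sample from q
        total = total + q.log_prob(theta).sum() - log_mixture_prior(theta, sigma_1, sigma_2, pi)
    return total / n_mc_samples


torch.manual_seed(0)
mu, sigma = torch.zeros(100), torch.full((100,), 0.1)
kl_estimate = mc_prior_loss(mu, sigma, n_mc_samples=10, sigma_1=1.0, sigma_2=0.01, pi=0.5)
```

Increasing n_mc_samples lowers the variance of the estimate at the cost of more computation per step.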

Requirements

This module requires Python 3. You need to have PyTorch installed for PyVarInf to work (as PyTorch is not readily available on PyPI). To install PyTorch, follow the official PyTorch installation instructions.

References

  • [1] Blundell, Charles, Cornebise, Julien, Kavukcuoglu, Koray, and Wierstra, Daan. Weight Uncertainty in Neural Networks. In International Conference on Machine Learning, pp. 1613–1622, 2015.
  • [2] Graves, Alex. Practical Variational Inference for Neural Networks. In Neural Information Processing Systems, 2011.
  • [3] Grünwald, Peter D. The Minimum Description Length principle. MIT press, 2007.
  • [4] Honkela, Antti and Valpola, Harri. Variational Learning and Bits-Back Coding: An Information-Theoretic View to Bayesian Learning. IEEE transactions on Neural Networks, 15(4), 2004.
  • [5] Hinton, Geoffrey E and Van Camp, Drew. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. In Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • [6] Murphy, Kevin P. Conjugate Bayesian analysis of the Gaussian distribution. 2007.