PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018)

Last update: Dec 07, 2022

Overview

1-bit Wide ResNet

PyTorch implementation of training 1-bit Wide ResNets from this paper:

Training wide residual networks for deployment using a single bit for each weight by Mark D. McDonnell at ICLR 2018

https://openreview.net/forum?id=rytNfI1AZ

https://arxiv.org/abs/1802.08530

The idea is very simple but surprisingly effective for training ResNets with binary weights. Here is the proposed weight parameterization as a PyTorch autograd function:

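import math
import torch

class ForwardSign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        # Take the sign of each weight and scale by the He-init constant
        # sqrt(2 / fan_in), where fan_in = in_channels * kernel_h * kernel_w.
        return math.sqrt(2. / (w.shape[1] * w.shape[2] * w.shape[3])) * w.sign()

    @staticmethod
    def backward(ctx, g):
        # Pass the incoming gradient through unchanged.
        return g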

On the forward pass, we take the sign of the weights and scale it by the He-init constant. On the backward pass, we propagate the gradient unchanged. WRN-20-10 trained with this parameterization is only slightly behind its full-precision variant. Here is what I got with this code on CIFAR-100:

network	accuracy, % (mean ± std over 5 runs)	checkpoint size (MB)
WRN-20-10	80.5 ± 0.24	205
WRN-20-10-1bit	80.0 ± 0.26	3.5
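
To make the parameterization described above more concrete, here is a minimal usage sketch. It repeats the ForwardSign function so that it runs on its own, and applies it to a convolution weight before calling F.conv2d. The tensor shapes and variable names (w, x, w_bin) are assumptions chosen for illustration and are not taken from main.py.

```python
import math
import torch
import torch.nn.functional as F


class ForwardSign(torch.autograd.Function):
    """Scaled sign on forward, identity on backward (as in the README)."""

    @staticmethod
    def forward(ctx, w):
        fan_in = w.shape[1] * w.shape[2] * w.shape[3]
        return math.sqrt(2.0 / fan_in) * w.sign()

    @staticmethod
    def backward(ctx, g):
        return g


torch.manual_seed(0)

# Hypothetical shapes: 8 output channels, 3 input channels, 3x3 kernels.
w = torch.randn(8, 3, 3, 3, requires_grad=True)  # real-valued weights that get the gradient
x = torch.randn(2, 3, 32, 32)                    # a fake batch of CIFAR-sized images

w_bin = ForwardSign.apply(w)   # every entry is +scale or -scale
w_bin.retain_grad()            # keep the gradient of the binarized weights for inspection
y = F.conv2d(x, w_bin, padding=1)
y.sum().backward()

scale = math.sqrt(2.0 / (3 * 3 * 3))
print(y.shape)        # output of the convolution with binarized weights
print(w.grad.shape)   # gradient reaches the real-valued weights, same shape as w
```

In this sketch the gradient stored in w.grad is identical to the gradient with respect to w_bin, which is what the identity backward pass implies.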

Details

Here are the differences from the original WRN code (https://github.com/szagoruyko/wide-residual-networks):

BatchNorm has no affine weight and bias parameters
First layer has 16 * width channels
Last fc layer is removed in favor of 1x1 conv + F.avg_pool2d
Downsample is done by F.avg_pool2d + torch.cat instead of strided conv
SGD with cosine annealing and warm restarts
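
As an illustration of the last item in the list above, here is a small sketch of a cosine annealing schedule with warm restarts. The function name sgdr_lr and all hyperparameter values (lr_max, lr_min, t_0, t_mult) are hypothetical and are not the settings used in main.py; the sketch only shows the shape of the schedule.

```python
import math


def sgdr_lr(epoch, lr_max=0.1, lr_min=0.0, t_0=2, t_mult=2):
    """Learning rate at `epoch` under cosine annealing with warm restarts.

    The first cycle lasts t_0 epochs; each following cycle is t_mult
    times longer. All default values here are illustrative assumptions.
    """
    t_i, t_cur = t_0, epoch
    # Find the current cycle and the position inside it.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine decay from lr_max to lr_min within the current cycle.
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t_cur / t_i))


# The rate decays inside each cycle and jumps back to lr_max at epochs 2 and 6.
for epoch in range(7):
    print(epoch, round(sgdr_lr(epoch), 4))
```

PyTorch versions newer than the 0.4.1 mentioned below provide torch.optim.lr_scheduler.CosineAnnealingWarmRestarts for the same purpose.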

I used PyTorch 0.4.1 and Python 3.6 to run the code.

Reproduce WRN-20-10 with 1-bit training on CIFAR-100:

python main.py --binarize --save ./logs/WRN-20-10-1bit_$RANDOM --width 10 --dataset CIFAR100

Convergence plot (train error shown as dashed lines):

I've also uploaded a 3.5 MB checkpoint with the binary weights packed using np.packbits, and a very short script to evaluate it:

python evaluate_packed.py --checkpoint wrn20-10-1bit-packed.pth.tar --width 10 --dataset CIFAR100

S3 URL for the checkpoint: https://s3.amazonaws.com/modelzoo-networks/wrn20-10-1bit-packed.pth.tar
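
To show why packing with np.packbits shrinks the checkpoint so much, here is a minimal sketch of packing the signs of a weight tensor into bits and restoring scaled weights from them. This is not necessarily the layout used by evaluate_packed.py; the tensor shape and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conv weight tensor: (out_channels, in_channels, kH, kW).
w = rng.standard_normal((16, 8, 3, 3)).astype(np.float32)

# Pack: store only the sign of each weight, one bit per weight.
bits = (w > 0).astype(np.uint8)       # 1 for positive, 0 otherwise
packed = np.packbits(bits.ravel())    # 8 weights per byte

# Unpack: recover -1/+1 signs, then reapply the He-init scale.
# Note: an exact zero would be restored as -1 here, unlike w.sign().
unpacked = np.unpackbits(packed)[: w.size].reshape(w.shape)
signs = unpacked.astype(np.float32) * 2.0 - 1.0
scale = np.sqrt(2.0 / (w.shape[1] * w.shape[2] * w.shape[3]))
w_restored = scale * signs

# float32 storage versus packed bits: a 32x reduction.
print(w.nbytes, packed.nbytes)
```

The shape of each tensor has to be stored alongside the packed bytes, since np.unpackbits returns a flat array padded to a multiple of 8 bits.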

Owner

Sergey Zagoruyko
