A PyTorch implementation of DenseNet.

Overview

This is a PyTorch implementation of the DenseNet-BC architecture as described in the paper Densely Connected Convolutional Networks by G. Huang, Z. Liu, K. Weinberger, and L. van der Maaten. It reaches a CIFAR-10+ error rate of 4.77% with a 100-layer DenseNet-BC with a growth rate of 12. The official implementation and links to many third-party implementations are available in the liuzhuang13/DenseNet repo on GitHub.

Why DenseNet?

As the results table in the DenseNet paper shows, DenseNet achieves competitive, state-of-the-art results on CIFAR-10, CIFAR-100, and SVHN.

Why yet another DenseNet implementation?

PyTorch is a great new framework and it's nice to have these kinds of re-implementations around so that they can be integrated with other PyTorch projects.

How do you know this implementation is correct?

Interestingly, while implementing this I had a lot of trouble getting it to converge, so I looked at every part of the code more closely than I usually would. I compared all of the model's hidden states and gradients with the official implementation to make sure my code was correct, and I even trained a VGG-style network on CIFAR-10 with the training code here. It turned out that the cause was a new, critical PyTorch bug (now fixed).

I have kept my original message about the convergence problem, along with the things I checked, in this document. I think other people may find it interesting to see my development and debugging strategies for a model that is known to converge. I also started this PyTorch forum thread, which has a few other discussion points. You may also be interested in my script that compares PyTorch gradients to Torch gradients and my script that numerically checks PyTorch gradients.

My convergence issues were due to a critical PyTorch bug related to using torch.cat with convolutions when cuDNN is enabled (which it is by default when CUDA is used). The bug caused incorrect gradients, and the workaround was to disable cuDNN (this is no longer necessary because the bug is fixed). The oversight in my debugging was that I did not think to disable cuDNN. Until then, I had assumed that the cuDNN option in frameworks is bug-free, but I have learned that this is not always the case. I might also have found the problem if I had numerically checked torch.cat layers with convolutions instead of with fully connected layers.
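The idea behind a numerical gradient check can be sketched without any framework. The snippet below is only an illustration, not the script mentioned above: `dense_toy`, its hand-written gradient, and the deliberately wrong `buggy` gradient are hypothetical stand-ins. The toy function concatenates its input with a transformed copy of itself, loosely mimicking a dense connection, and compares the analytic gradient against central differences.

```python
def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at list x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def dense_toy(x, w):
    """Concatenate x with x**2 (a toy 'dense connection'), then dot with w."""
    feats = x + [xi * xi for xi in x]  # list concatenation
    return sum(wi * fi for wi, fi in zip(w, feats))


def dense_toy_grad(x, w):
    """Hand-derived gradient: skip path (w[i]) plus squared path (2*w[n+i]*x[i])."""
    n = len(x)
    return [w[i] + 2.0 * w[n + i] * x[i] for i in range(n)]


def max_abs_diff(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))


x = [0.5, -1.2, 2.0]
w = [0.3, -0.7, 1.1, 0.9, 0.4, -0.2]

numeric = numerical_grad(lambda v: dense_toy(v, w), x)
analytic = dense_toy_grad(x, w)

# A wrong gradient that forgets the skip path through the concatenation.
buggy = [2.0 * w[len(x) + i] * x[i] for i in range(len(x))]

print("correct gradient error:", max_abs_diff(analytic, numeric))
print("buggy gradient error:  ", max_abs_diff(buggy, numeric))
```

The correct gradient should agree with the numerical estimate to within rounding error, while the gradient that drops one branch of the concatenation stands out immediately. The same comparison applied to a concatenation followed by a convolution is the kind of check that could have exposed the bug described above.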

Adam fixed the PyTorch bug that caused this in this PR, which has been merged into Torch's master branch. If you are interested in using the DenseNet code in this repository, make sure your PyTorch version contains this PR and was downloaded after 2017-02-10.

What does the PyTorch compute graph of the model look like?

You can see the compute graph here. I created it with make_graph.py, which I copied from Adam Paszke's gist. Adam says PyTorch will soon have a better way to create compute graphs.

How does this implementation perform?

By default, this repo trains a 100-layer DenseNet-BC with a growth rate of 12 on the CIFAR-10 dataset with data augmentation. Due to GPU memory limits, this is the largest model I am able to run. The paper reports a final test error of 4.51% with this architecture, and we obtain a final test error of 4.77%.
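To make the default configuration concrete, here is a small sketch of the channel arithmetic for a DenseNet-BC. It assumes the conventions from the DenseNet paper rather than anything stated in this README: three dense blocks, an initial convolution with twice the growth rate in output channels, two convolutions per bottleneck layer, and a compression factor of 0.5 in the transition layers. The function name and its return format are made up for illustration.

```python
import math


def densenet_bc_channels(depth=100, growth_rate=12, reduction=0.5, n_blocks=3):
    """Trace the number of feature maps through a DenseNet-BC (paper conventions)."""
    # depth - 4 accounts for the first conv, the two transition convs, and the
    # final classifier; each bottleneck layer contributes two convs (1x1 and 3x3).
    layers_per_block = (depth - 4) // n_blocks // 2
    channels = 2 * growth_rate  # assumption: initial conv outputs 2k channels
    trace = [("conv0", channels)]
    for block in range(1, n_blocks + 1):
        # Every layer in a dense block appends growth_rate new feature maps.
        channels += layers_per_block * growth_rate
        trace.append((f"dense{block}", channels))
        if block < n_blocks:
            # Transition layers compress the channel count by `reduction`.
            channels = int(math.floor(channels * reduction))
            trace.append((f"trans{block}", channels))
    return layers_per_block, trace


layers_per_block, trace = densenet_bc_channels()
print("bottleneck layers per block:", layers_per_block)
for name, channels in trace:
    print(f"{name}: {channels} channels")
```

Under these assumptions, the 100-layer, growth-rate-12 network has 16 bottleneck layers per block and ends with 342 feature maps before the classifier.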

Why don't people use ADAM instead of SGD for training ResNet-style models?

I also tried training a net with ADAM and found that it didn't converge as well with the default hyper-parameters compared to SGD with a reasonable learning rate schedule.
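As one example of what a reasonable schedule could look like, the DenseNet paper starts SGD at a learning rate of 0.1 and divides it by 10 at 50% and 75% of the training epochs. The sketch below encodes that rule; the function name and the zero-based epoch indexing are assumptions, and the schedule in train.py may differ.

```python
def step_lr(epoch, total_epochs=300, base_lr=0.1):
    """Step schedule from the DenseNet paper: divide by 10 at 50% and 75%.

    `epoch` is zero-based (an assumption made for this sketch).
    """
    if epoch < 0.5 * total_epochs:
        return base_lr
    if epoch < 0.75 * total_epochs:
        return base_lr / 10
    return base_lr / 100


for epoch in (0, 149, 150, 224, 225, 299):
    print(epoch, step_lr(epoch))
```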

What about the non-BC version?

I haven't tested this as thoroughly, so you should make sure it's working as expected if you plan to use and modify it. Let me know if you find anything wrong with it.

A paradigm for ML code

This repo includes a few features that I like to have in my projects and that I don't see in some other re-implementations. The training code in train.py uses argparse, so the batch size and some other hyper-parameters can easily be changed. As the model trains, progress is written out to csv files in a work directory that is also defined by the arguments. A separate script, plot.py, then plots the progress written out by the training script. The training script calls plot.py after every epoch, but, importantly, plot.py can also be run on its own so figures can be tweaked without re-running the entire experiment.
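The pattern can be sketched with the standard library alone. This is not the code in train.py: apart from the save directory and its default of work/cifar10.base, the argument names, the csv columns, and the placeholder metrics are all hypothetical, and the plotting step is left as a comment.

```python
import argparse
import csv
import os
import tempfile


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy training-loop skeleton")
    parser.add_argument("--batch-size", type=int, default=64)  # hypothetical flag
    parser.add_argument("--epochs", type=int, default=3)       # hypothetical flag
    parser.add_argument("--save", default="work/cifar10.base")
    return parser.parse_args(argv)


def log_epoch(save_dir, split, epoch, loss, error):
    """Append one row of progress to <save_dir>/<split>.csv."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, f"{split}.csv")
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([epoch, loss, error])
    return path


def read_log(path):
    """Read progress back; a separate plotting script would start from here."""
    with open(path, newline="") as f:
        return [(int(e), float(l), float(err)) for e, l, err in csv.reader(f)]


def main(argv=None):
    args = parse_args(argv)
    for epoch in range(1, args.epochs + 1):
        # Placeholder numbers standing in for a real training epoch.
        loss, error = 1.0 / epoch, 50.0 / epoch
        log_epoch(args.save, "train", epoch, loss, error)
        # A real script would invoke the plotting step here after every epoch.
    return args


# Demo run in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    main(["--save", tmp, "--epochs", "3"])
    rows = read_log(os.path.join(tmp, "train.csv"))

print(rows)
```

Because the plotting side only depends on the csv files, it can be re-run at any time against a finished or in-progress experiment.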

Help wanted: Improving memory utilization and multi-GPU support

I think there are ways to improve the memory utilization in this code, as in the official space-efficient Torch implementation. I would also be interested in multi-GPU support.

Running the code and viewing convergence

First install PyTorch (ideally in an anaconda3 distribution). ./train.py will create a model, start training it, and save progress to args.save, which is work/cifar10.base by default. The training script will call plot.py after every epoch to create plots from the saved progress.

Citations

The following is a BibTeX entry for the DenseNet paper that you should cite if you use this model.

@article{Huang2016Densely,
  author = {Huang, Gao and Liu, Zhuang and Weinberger, Kilian Q.},
  title = {Densely Connected Convolutional Networks},
  journal = {arXiv preprint arXiv:1608.06993},
  year = {2016}
}

If you use this implementation, please also consider citing this implementation and code repository with the following BibTeX or plaintext entry. The BibTeX entry requires the url LaTeX package.

@misc{amos2017densenet,
  title = {{A PyTorch Implementation of DenseNet}},
  author = {Amos, Brandon and Kolter, J. Zico},
  howpublished = {\url{https://github.com/bamos/densenet.pytorch}},
  note = {Accessed: [Insert date here]}
}

Brandon Amos, J. Zico Kolter
A PyTorch Implementation of DenseNet
https://github.com/bamos/densenet.pytorch.
Accessed: [Insert date here]

Licensing

This repository is Apache-licensed.
