Model summary in PyTorch similar to `model.summary()` in Keras

Last update: Dec 29, 2022

Overview

Keras style `model.summary()` in PyTorch

Keras has a neat API to view the visualization of the model which is very helpful while debugging your network. Here is a barebone code to try and mimic the same in PyTorch. The aim is to provide information complementary to, what is not provided by print(your_model) in PyTorch.

Usage

pip install torchsummary or
git clone https://github.com/sksq96/pytorch-summary

from torchsummary import summary
summary(your_model, input_size=(channels, H, W))

Note that the input_size is required to make a forward pass through the network.

Examples

CNN for MNIST

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # PyTorch v0.4.0
model = Net().to(device)

summary(model, (1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 10, 24, 24]             260
            Conv2d-2             [-1, 20, 8, 8]           5,020
         Dropout2d-3             [-1, 20, 8, 8]               0
            Linear-4                   [-1, 50]          16,050
            Linear-5                   [-1, 10]             510
================================================================
Total params: 21,840
Trainable params: 21,840
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 0.08
Estimated Total Size (MB): 0.15
----------------------------------------------------------------

VGG16

import torch
from torchvision import models
from torchsummary import summary

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
vgg = models.vgg16().to(device)

summary(vgg, (3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
          Dropout-34                 [-1, 4096]               0
           Linear-35                 [-1, 4096]      16,781,312
             ReLU-36                 [-1, 4096]               0
          Dropout-37                 [-1, 4096]               0
           Linear-38                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.59
Params size (MB): 527.79
Estimated Total Size (MB): 746.96
----------------------------------------------------------------

Multiple Inputs

import torch
import torch.nn as nn
from torchsummary import summary

class SimpleConv(nn.Module):
    def __init__(self):
        super(SimpleConv, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
        )

    def forward(self, x, y):
        x1 = self.features(x)
        x2 = self.features(y)
        return x1, x2
    
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleConv().to(device)

summary(model, [(1, 16, 16), (1, 28, 28)])

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 1, 16, 16]              10
              ReLU-2            [-1, 1, 16, 16]               0
            Conv2d-3            [-1, 1, 28, 28]              10
              ReLU-4            [-1, 1, 28, 28]               0
================================================================
Total params: 20
Trainable params: 20
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.77
Forward/backward pass size (MB): 0.02
Params size (MB): 0.00
Estimated Total Size (MB): 0.78
----------------------------------------------------------------

References

The idea for this package sparked from this PyTorch issue.
Thanks to @ncullen93 and @HTLife.
For Model Size Estimation @jacobkimmel (details here)

License

pytorch-summary is MIT-licensed.

Model summary in PyTorch similar to `model.summary()` in Keras

Related tags

Overview

Keras style `model.summary()` in PyTorch

Usage

Examples

CNN for MNIST

VGG16

Multiple Inputs

References

License

Owner

Shubham Chandel

A simplified framework and utilities for PyTorch

Use Jax functions in Pytorch with DLPack

A Pytorch Implementation for Compact Bilinear Pooling.

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf

Pytorch bindings for Fortran

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch.

A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch

Code snippets created for the PyTorch discussion board

Fast, general, and tested differentiable structured prediction in PyTorch

Learning Sparse Neural Networks through L0 regularization

PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

A PyTorch implementation of EfficientNet

Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute

higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual training steps.

PyTorch implementations of normalizing flow and its variants.

High-fidelity performance metrics for generative models in PyTorch

Reformer, the efficient Transformer, in Pytorch

Pretrained EfficientNet, EfficientNet-Lite, MixNet, MobileNetV3 / V2, MNASNet A1 and B1, FBNet, Single-Path NAS

This is an differentiable pytorch implementation of SIFT patch descriptor.

Model summary in PyTorch similar to `model.summary()` in Keras

Related tags

Overview

Keras style model.summary() in PyTorch

Usage

Examples

CNN for MNIST

VGG16

Multiple Inputs

References

License

Owner

Shubham Chandel

A simplified framework and utilities for PyTorch

Use Jax functions in Pytorch with DLPack

A Pytorch Implementation for Compact Bilinear Pooling.

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf

Pytorch bindings for Fortran

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch.

A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch

Code snippets created for the PyTorch discussion board

Fast, general, and tested differentiable structured prediction in PyTorch

Learning Sparse Neural Networks through L0 regularization

PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

A PyTorch implementation of EfficientNet

Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute

higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual training steps.

PyTorch implementations of normalizing flow and its variants.

High-fidelity performance metrics for generative models in PyTorch

Reformer, the efficient Transformer, in Pytorch

Pretrained EfficientNet, EfficientNet-Lite, MixNet, MobileNetV3 / V2, MNASNet A1 and B1, FBNet, Single-Path NAS

This is an differentiable pytorch implementation of SIFT patch descriptor.

Keras style `model.summary()` in PyTorch