PyTorch implementation of DeepLab v2 on COCO-Stuff / PASCAL VOC

Overview

DeepLab with PyTorch

This is an unofficial PyTorch implementation of DeepLab v2 [1] with a ResNet-101 backbone.

  • COCO-Stuff dataset [2] and PASCAL VOC dataset [3] are supported.
  • The official Caffe weights provided by the authors can be used without building the Caffe APIs.
  • DeepLab v3/v3+ models with the identical backbone are also included (not tested).
  • torch.hub is supported.

Performance

COCO-Stuff

Train set    Eval set   Code          Weight      CRF?  Pixel Accuracy  Mean Accuracy  Mean IoU  FreqW IoU
10k train †  10k val †  Official [2]  -           no    65.1            45.5           34.4      50.4
10k train †  10k val †  This repo     Download    no    65.8            45.7           34.8      51.2
10k train †  10k val †  This repo     Download    yes   67.1            46.4           35.6      52.5
164k train   164k val   This repo     Download ‡  no    66.8            51.2           39.1      51.5
164k train   164k val   This repo     Download ‡  yes   67.6            51.5           39.7      52.3

† Images and labels are pre-warped to square-shape 513x513
‡ Note for SPADE followers: The provided COCO-Stuff 164k weight has been kept intact since 2019/02/23.

PASCAL VOC 2012

Train set  Eval set  Code          Weight    CRF?  Pixel Accuracy  Mean Accuracy  Mean IoU  FreqW IoU
trainaug   val       Official [3]  -         no    -               -              76.35     -
trainaug   val       Official [3]  -         yes   -               -              77.69     -
trainaug   val       This repo     Download  no    94.64           86.50          76.65     90.41
trainaug   val       This repo     Download  yes   95.04           86.64          77.93     91.06

Setup

Requirements

Required Python packages are listed in the Anaconda configuration file configs/conda_env.yaml. Please modify the listed cudatoolkit=10.2 and python=3.6 as needed and run the following commands.

# Set up with Anaconda
conda env create -f configs/conda_env.yaml
conda activate deeplab-pytorch

Download datasets

Download pre-trained caffemodels

Caffemodels pre-trained on COCO and PASCAL VOC datasets are released by the DeepLab authors. In accordance with the papers [1,2], this repository uses the COCO-trained parameters as initial weights.

  1. Run the following script to download the pre-trained caffemodels (1GB+).
$ bash scripts/setup_caffemodels.sh
  2. Convert the caffemodels to PyTorch-compatible weights. No need to build the Caffe API!
# Generate "deeplabv1_resnet101-coco.pth" from "init.caffemodel"
$ python convert.py --dataset coco
# Generate "deeplabv2_resnet101_msc-vocaug.pth" from "train2_iter_20000.caffemodel"
$ python convert.py --dataset voc12

Training & Evaluation

To train DeepLab v2 on PASCAL VOC 2012:

python main.py train \
    --config-path configs/voc12.yaml

To evaluate the performance on a validation set:

python main.py test \
    --config-path configs/voc12.yaml \
    --model-path data/models/voc12/deeplabv2_resnet101_msc/train_aug/checkpoint_final.pth

Note: This command saves the predicted logit maps (.npy) and the scores (.json).
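
For reference, the four scores reported in the Performance tables can be derived from a class confusion matrix. The following is a minimal sketch rather than the repository's code: the function name segmentation_scores is hypothetical, rows are assumed to index ground-truth classes and columns predicted classes, and classes that never occur in the ground truth are assumed to be left out of the class-wise averages.

```python
def segmentation_scores(hist):
    """Compute segmentation scores from a square confusion matrix.

    hist[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j (an assumed layout).
    """
    n = len(hist)
    total = sum(sum(row) for row in hist)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    diag = [hist[i][i] for i in range(n)]
    gt = [sum(hist[i]) for i in range(n)]                         # row sums
    pred = [sum(hist[i][j] for i in range(n)) for j in range(n)]  # column sums

    # Assumption: only classes present in the ground truth are averaged.
    valid = [i for i in range(n) if gt[i] > 0]

    acc = {i: diag[i] / gt[i] for i in valid}
    iou = {i: diag[i] / (gt[i] + pred[i] - diag[i]) for i in valid}

    return {
        "Pixel Accuracy": sum(diag) / total,
        "Mean Accuracy": sum(acc.values()) / len(valid),
        "Mean IoU": sum(iou.values()) / len(valid),
        "Frequency Weighted IoU": sum(gt[i] / total * iou[i] for i in valid),
        "Class IoU": iou,
    }
```

The returned values are fractions in [0, 1], whereas the tables in this README list percentages.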

To re-evaluate with a CRF post-processing:

python main.py crf \
    --config-path configs/voc12.yaml

Running the above commands in sequence is equivalent to running bash scripts/train_eval.sh.

To monitor the loss, run the following command in a separate terminal.

tensorboard --logdir data/logs

Please specify the appropriate configuration files for the other datasets.

Dataset          Config file                 #Iterations  Classes
PASCAL VOC 2012  configs/voc12.yaml          20,000       20 foreground + 1 background
COCO-Stuff 10k   configs/cocostuff10k.yaml   20,000       182 thing/stuff
COCO-Stuff 164k  configs/cocostuff164k.yaml  100,000      182 thing/stuff

Note: Although the label indices range from 0 to 181 in COCO-Stuff 10k/164k, only 171 classes are supervised.

Common settings:

  • Model: DeepLab v2 with ResNet-101 backbone. Dilation rates of ASPP are (6, 12, 18, 24). Output stride is 8.
  • GPU: All the GPUs visible to the process are used. To restrict the set of GPUs, set CUDA_VISIBLE_DEVICES.
  • Multi-scale loss: The loss is defined as the sum of the losses on the responses from the multi-scale inputs (1x, 0.75x, 0.5x) and on their element-wise max across the scales. The unlabeled class is ignored in the loss computation.
  • Gradient accumulation: The mini-batch of 10 samples is not processed at once because of its high GPU memory usage. Instead, gradients of small batches of 5 samples are accumulated for 2 iterations, and the weights are updated at the end (batch_size * iter_size = 10). GPU memory usage is approx. 11.2 GB with the default setting (tested on a single Titan X). You can reduce it with a smaller batch_size.
  • Learning rate: Stochastic gradient descent (SGD) is used with a momentum of 0.9 and an initial learning rate of 2.5e-4. Polynomial learning rate decay is employed; the learning rate is multiplied by (1-iter/iter_max)**power every 10 iterations.
  • Monitoring: Moving average loss (average_loss in Caffe) can be monitored in TensorBoard.
  • Preprocessing: Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, padded if needed, and randomly cropped to 321x321.
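
The gradient accumulation and learning rate settings above can be sketched in plain Python. This is an illustration under assumptions, not the repository's implementation: the names poly_lr and accumulated_gradient are hypothetical, power=0.9 is an assumed default (the settings above do not state it), and each small-batch loss is assumed to be averaged over its samples and divided by iter_size before the gradients are summed.

```python
def poly_lr(base_lr, iteration, iter_max, power=0.9, step=10):
    """Polynomially decayed learning rate, refreshed every `step` iterations."""
    # Hold the rate constant between refreshes and never go past iter_max.
    held = min((iteration // step) * step, iter_max)
    return base_lr * (1.0 - held / iter_max) ** power


def accumulated_gradient(per_sample_grads, batch_size=5, iter_size=2):
    """Gradient of the mean loss over batch_size * iter_size samples,
    built up one small batch at a time."""
    if len(per_sample_grads) != batch_size * iter_size:
        raise ValueError("expected batch_size * iter_size gradients")
    total = 0.0
    for k in range(iter_size):
        chunk = per_sample_grads[k * batch_size:(k + 1) * batch_size]
        # Mean over the small batch, scaled by 1 / iter_size so that the
        # accumulated sum equals the mean over all samples.
        total += sum(chunk) / batch_size / iter_size
    return total
```

For example, poly_lr(2.5e-4, 0, 20000) returns the initial rate, the value stays fixed for iterations 1 to 9, and it reaches zero at iteration 20000. Under the stated scaling, accumulated_gradient gives the same result as averaging all 10 per-sample gradients in one pass.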

[Figure: processed images and labels in COCO-Stuff 164k]
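
The preprocessing listed under Common settings (random re-scaling, padding, random cropping) comes down to a little geometry. A minimal sketch follows, assuming the scale is drawn uniformly from [0.5, 1.5] (the repository may instead pick from a fixed set of factors) and using the hypothetical helper name sample_crop_geometry. It only computes sizes and offsets and does not touch pixel data.

```python
import random


def sample_crop_geometry(height, width, crop_size=321,
                         scale_range=(0.5, 1.5), rng=random):
    """Sample the sizes and offsets for one rescale-pad-crop augmentation."""
    scale = rng.uniform(*scale_range)
    scaled_h = max(1, round(height * scale))
    scaled_w = max(1, round(width * scale))

    # Pad only the sides that are shorter than the crop.
    pad_h = max(crop_size - scaled_h, 0)
    pad_w = max(crop_size - scaled_w, 0)
    padded_h, padded_w = scaled_h + pad_h, scaled_w + pad_w

    # Top-left corner of a crop that fits inside the padded image.
    top = rng.randint(0, padded_h - crop_size)
    left = rng.randint(0, padded_w - crop_size)

    return {
        "scale": scale,
        "scaled_size": (scaled_h, scaled_w),
        "padding": (pad_h, pad_w),
        "crop_box": (top, left, top + crop_size, left + crop_size),
    }
```

Passing a seeded random.Random instance as rng makes the sampled geometry reproducible, which can help when debugging augmentation.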

Inference Demo

You can use the pre-trained models, the converted models, or your own models.

To process a single image:

python demo.py single \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth \
    --image-path image.jpg

To run on a webcam:

python demo.py live \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth

To run a CRF post-processing, add --crf. To run on a CPU, add --cpu.

Misc

torch.hub

Model setup with two lines

import torch.hub
model = torch.hub.load("kazuto1011/deeplab-pytorch", "deeplabv2_resnet101", pretrained='cocostuff164k', n_classes=182)

Difference with Caffe version

  • The official code downsamples a label with 1/16 bilinear interpolation (Interp layer) for the 0.5x input only, whereas this codebase downsamples labels for both the 0.5x and 0.75x inputs with nearest interpolation (PIL.Image.resize).
  • Bilinear interpolation on images and logits is performed with align_corners=False.
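
To illustrate why nearest interpolation suits label maps, here is a small pure-Python sketch. It is a simplified stand-in for PIL.Image.resize with nearest resampling (the exact pixel-center convention in PIL may differ), and nearest_downsample is a hypothetical helper name. Every output value is copied from an input pixel, so class indices are never blended into values that correspond to no class.

```python
def nearest_downsample(label, scale):
    """Downsample a 2D label map (list of rows) by copying nearest pixels."""
    h, w = len(label), len(label[0])
    out_h = max(1, int(h * scale))
    out_w = max(1, int(w * scale))

    out = []
    for y in range(out_h):
        # Map the center of each output pixel back to a source row.
        src_y = min(h - 1, int((y + 0.5) * h / out_h))
        row = []
        for x in range(out_w):
            src_x = min(w - 1, int((x + 0.5) * w / out_w))
            row.append(label[src_y][src_x])
        out.append(row)
    return out


# Toy label map; 255 stands for an unlabeled pixel (an assumed value).
label = [
    [0, 0, 15, 15],
    [0, 0, 15, 15],
    [255, 255, 15, 15],
    [255, 255, 15, 15],
]
half = nearest_downsample(label, 0.5)
three_quarters = nearest_downsample(label, 0.75)
```

Bilinear interpolation of the same map would produce intermediate values along the class boundaries, which is why it is kept for images and logits only.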

Training batch normalization

This codebase only supports the DeepLab v2 training protocol, which freezes batch normalization layers, although the v3/v3+ protocols require training them. If your project also needs to train these layers on multiple GPUs, please install the extra library below.

pip install torch-encoding

Batch normalization layers in a model are automatically switched in libs/models/resnet.py.

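import torch.nn as nn

try:
    # Use synchronized batch norm when torch-encoding is installed
    from encoding.nn import SyncBatchNorm
    _BATCH_NORM = SyncBatchNorm
except Exception:
    # Otherwise fall back to the standard PyTorch layer
    _BATCH_NORM = nn.BatchNorm2d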

References

  1. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE TPAMI, 2018.

  2. H. Caesar, J. Uijlings, V. Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR, 2018.

  3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010.

Comments
  • ResBlock's stride

    I wonder why you set stride=2 when implementing 'layer3' as self.add_module("layer3", _ResBlock(n_blocks[1], 256, 128, 512, 2, 1)). Is there any reason for that?

    opened by JoyHuYY1412 13
  • Missing keys in state_dict

    I tried to run the cocostuff pre-trained model on a voc12 image using demo.py. After converting the COCO caffemodel into a PyTorch model, an error is raised when running demo.py at line 57, which is model.load_state_dict(state_dict).

    RuntimeError: Error in loading state_dict for MSC: Missing key(s) in state_dict : "scale.aspp.stages.c0.bias" , "scale.aspp.stages.c0.weight" ,"scale.aspp.stages.c1.bias" , "scale.aspp.stages.c1.weight" , "scale.aspp.stages.c2.bias" , "scale.aspp.stages.c2.weight" , "scale.aspp.stages.c3.bias" , "scale.aspp.stages.c3.weight".

    When I look at the information displayed while converting the coco_init caffemodel into a .pth file, I don't see anything related either. It seems there are no parameters for the scale.aspp layers. I don't know why this happens or how to solve it. Thanks!

    opened by chenyanghungry 10
  • SegmentationAug file link problem

    opened by bigpicturejh 8
  • Crashing on test

    Hi Kazuto,

    For some reason when I launch the test on coco stuff 10k, the script crashes at about 38%. This is the error message I got:

    python main.py test --config config/cocostuff10k.yaml --model-path data/models/deeplab_resnet101/cocostuff10k/checkpoint_final.pth
    Mode: test
    Device: TITAN X (Pascal)
    /hardmnt/kraken0/home/poiesi/data/research/deeplearning/deeplab-pytorch/libs/utils/metric.py:21: RuntimeWarning: invalid value encountered in true_divide
      acc_cls = np.diag(hist) / hist.sum(axis=1)
    /hardmnt/kraken0/home/poiesi/data/research/deeplearning/deeplab-pytorch/libs/utils/metric.py:23: RuntimeWarning: invalid value encountered in true_divide
      iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
    

    Do you know what it might be due to?

    opened by fabiopoiesi 6
  • cuda error caused by negative tensor value

    Hi, thanks for your nice code again! But I got a weird error when running your code; the error info is below:

    THCudaCheck FAIL file=/opt/conda/conTHC/generic/THCTensorCopy.c line=20 error=59 : device-side assert triggered
    Traceback (most recent call last):
      File "train.py", line 229, in <module>
        main()
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
        return self.main(*args, **kwargs)
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
        rv = self.invoke(ctx)
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
        return ctx.invoke(self.callback, **ctx.params)
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
        return callback(*args, **kwargs)
      File "train.py", line 183, in main
        target_ = target_.to(device)
    RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_TensorCopy.c:20
    Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoad
    Traceback (most recent call last):
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/torch/utils/data/data
        self._shutdown_workers()
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/torch/utils/data/data
        self.worker_result_queue.get()
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/queues.py", line 33
        return ForkingPickler.loads(res)
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/torch/multiprocessinge_fd
        fd = df.detach()
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/resource_sharer.py"
        with _resource_sharer.get_connection(self._id) as conn:
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/resource_sharer.py"
        c = Client(address, authkey=process.current_process().authkey)
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/connection.py", lin
        c = SocketClient(address)
      File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/connection.py", lin
        s.connect(address)
    ConnectionRefusedError: [Errno 111] Connection refused

    It may be caused by negative tensor values when ignore_label is set to -1 while preprocessing the label map, according to the issue torch/cutorch#708. After I set the ignore label to 255 (I made minor changes to your code to run it on voc12), it works fine.

    opened by zhijiew 6
  • Poorer performance on VOC dataset

    When I trained DeepLab v2 on the PASCAL VOC dataset, I followed every step you recommended and used the default settings. However, after 20000 iterations, the mIoU on the validation set is only ~70%, and ~71% after CRF, which is much lower than what you report in the repository.

    I checked all the steps and can't find any change I made that might influence the performance. Could you tell me how to achieve ~77% mIoU on VOC? Maybe I missed some key training strategies.

    opened by veizgyauzgyauz 5
  • ValueError: Expected input batch_size (182) to match target batch_size (2).

    When I run the DeepLab v2 model on cocostuff164k with batch_size=2, I get the error

    #ValueError: Expected input batch_size (182) to match target batch_size (2).

    I checked that the model's output shape is [2,182,41,41] and the labels' shape is [2,41,41]. I think the problem is in the loss function, and I do not know how to fix this issue. Could you help me? Thanks

    opened by yejg2017 5
  • VOC12

    Hi Kazuto,

    Are you planning to update your repo to add full support for the VOC12 dataset? Some parts can be configured for it, while others cannot, e.g. get_dataset().

    Thanks

    opened by fabiopoiesi 5
  • Train with COCO-Stuff 164K

    Hi. With your permission I'd love to recommend this repo on the COCO-Stuff page. Do you think you could train it on the train set of COCO-Stuff 164K and provide that model? That should also significantly boost the performance. Let me know if you have further questions.

    opened by nightrome 5
  • The training result is poor after using the weak supervision data set

    Hello, if I want to use weakly supervised data in place of the ground truth to train the segmentation network, do I need to modify the code after replacing the dataset? I tried to run this code with the pseudo-mask images (10582) obtained by weak supervision, but the result was only 0.03, and the model only recognized the background. Do I need to modify the code?

    opened by woqiaow 4
  • only 20% mIou

    Thanks for the great work! I am a student following your work. All of my steps are based on README.md; however, I got a very strange mIoU of about 20%. I downloaded the trained model from GitHub and its test result is normal, about 76.5%, which means the testing process is fine. I also changed the pre-trained model to a ResNet-101 trained on ImageNet, and the result is still about 20%. Sadly, I couldn't find any problem during training. My PyTorch version is 1.7, and I want to know whether the PyTorch version affects my results. For training, I only changed IMAGE.SIZE.TRAIN=289,257 .... The following result was obtained with IMAGE.SIZE.TRAIN=289, GPU 1 3060, pre-trained model = deeplabv2_resnet101_msc-vocaug-20000.pth.

    { "Class IoU": { "0": 0.7948632887014013, "1": 0.3304257442805647, "2": 0.11077134876825119, "3": 0.07046620852376487, "4": 0.10895940525580272, "5": 0.04745116392110705, "6": 0.31005641398703143, "7": 0.2915453936066487, "8": 0.23258285017371, "9": 0.005347773696150551, "10": 0.10445717208326169, "11": 0.10978767179703398, "12": 0.18345830368166063, "13": 0.12426651067058993, "14": 0.2462254036792113, "15": 0.415199601472948, "16": 0.050153233366977974, "17": 0.15261392666663348, "18": 0.02888480390809123, "19": 0.27664375976635347, "20": 0.13747916669502674 }, "Frequency Weighted IoU": 0.6421553231368009, "Mean Accuracy": 0.2748884341598109, "Mean IoU": 0.1967447211762962, "Pixel Accuracy": 0.7687984741459604 }

    If you could give me some advice, I will appreciate you very much! thank you! Best wishes to you!

    opened by xinyuaning 4
  • A simple question about test on VOC

    Hi, I trained the model with train_aug.txt and evaluated it with val.txt. When I want to test it with test.txt, the images in test.txt do not exist in the JPEGImages folder. How can I solve this problem?

    opened by yangxinhaosmu 0
  • Numerical Instability in metrics.py

    When I use metrics.py to evaluate a model using the same weight, I get different mIoU values for different runs.

    I am using your DeepLab implementation as a backbone in another network and also using your evaluation code. Below are 3 such runs, where metrics.py has been used to evaluate the model on the same validation set, using the same weights.

    RUN 1

    'Pixel Accuracy': 0.891,
    'Mean Accuracy': 0.755,
    'Frequency Weighted IoU': 0.810,
    'Mean IoU': 0.615,

    RUN 2

    'Pixel Accuracy': 0.896,
    'Mean Accuracy': 0.761,
    'Frequency Weighted IoU': 0.819,
    'Mean IoU': 0.622,

    RUN 3

    "Pixel Accuracy": 0.882
    "Mean Accuracy": 0.748,
    "Frequency Weighted IoU": 0.798,
    "Mean IoU": 0.609,

    This seems like an issue of numerical instability. In particular, I feel that either the _fast_hist function or the division in the scores function in utils/metric.py is the root cause.

    I would greatly appreciate it if you could provide some help here. Thank you!

    opened by DebasmitaGhose 1
  • Has anyone tested DeepLab v3 and v3+?

    Thanks to the author's clear code style, I learned the DeepLab series very quickly from scratch. Now I want to use this repository as my baseline, but I don't have much time to test DeepLab v3. Has anyone tested it before?

    opened by heartInsert 2
Owner
Kazuto Nakashima