Training PSPNet in Tensorflow. Reproduce the performance from the paper.

Last update: Jul 13, 2022

Overview

A reproduction of PSPNet training.

(Updated 2021/04/09. The authors of PSPNet have released a PyTorch implementation of PSPNet and their newer work, with support for Sync Batch Norm; see https://github.com/hszhao/semseg.)

(Updated 2019/02/26. The code structure changed substantially. For the previous version, check out v0.9: https://github.com/holyseven/PSPNet-TF-Reproduce/tree/v0.9.)

This is an implementation of PSPNet, from training to testing, written in pure TensorFlow (tested on TF1.12, Python 3).

Supported Backbones: ResNet-V1-50 and ResNet-V1-101; other ResNet-V1 variants can be added easily.
Supported Databases: ADE20K, SBD (Augmented Pascal VOC) and Cityscapes.
Supported Modes: training, validation and inference with multi-scale inputs.
More things: L2-SP regularization and a sync batch normalization implementation.

L2-SP Regularization

L2-SP regularization is a variant of L2 regularization. Where L2 pulls the weights toward the origin, L2-SP pulls them toward the pre-trained model: the penalty is (w - w0)^2, where w0 denotes the pre-trained weights. Simple but effective. More details about L2-SP can be found in the paper and the code.
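As a rough illustration of the (w - w0)^2 idea, here is a minimal pure-Python sketch. It is not the code of this repo: the function name, the dict layout and the names alpha and beta are assumptions made for the example, and the plain L2 fallback for newly added layers follows my reading of the cited paper.

```python
def l2_sp_penalty(weights, pretrained, alpha, beta):
    """Return an L2-SP style penalty for a dict of parameter lists.

    weights:    name -> list of floats (current values w)
    pretrained: name -> list of floats (pre-trained values w0);
                newly added layers are simply absent from this dict
    alpha:      strength of the pull toward w0 (hypothetical name)
    beta:       plain L2 strength for new layers (hypothetical name)
    """
    penalty = 0.0
    for name, w in weights.items():
        if name in pretrained:
            w0 = pretrained[name]
            # Pre-trained layer: penalise the distance to the starting point.
            penalty += 0.5 * alpha * sum((wi - w0i) ** 2 for wi, w0i in zip(w, w0))
        else:
            # New layer: there is no reference, so fall back to ordinary L2.
            penalty += 0.5 * beta * sum(wi ** 2 for wi in w)
    return penalty


# Toy values: one pre-trained layer and one new layer.
pretrained = {"conv1": [1.0, -2.0]}
weights = {"conv1": [1.5, -2.0], "psp_head": [3.0]}
print(l2_sp_penalty(weights, pretrained, alpha=0.1, beta=0.01))
```

When the weights have not moved away from w0, the first term is zero, so the regularizer only starts to act once fine-tuning drifts from the pre-trained solution.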

If you find L2-SP useful for your research (not limited to image segmentation), please consider citing our work:

@inproceedings{li2018explicit,
  author    = {Li, Xuhong and Grandvalet, Yves and Davoine, Franck},
  title     = {Explicit Inductive Bias for Transfer Learning with Convolutional Networks},
  booktitle = {International Conference on Machine Learning (ICML)},
  pages     = {2830--2839},
  year      = {2018}
}

Sync Batch Norm

In image segmentation, the batch size is usually limited. A small batch size makes the gradients unstable and harms performance, especially for batch normalization layers. Multi-GPU settings do not help by default, because the statistics in a batch normalization layer are computed independently within each GPU. More discussion can be found here and here.

This repo resolves the problem in pure Python and pure TensorFlow by simply using a list as input. The main idea is in model/utils_mg.py.
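To make the difference between per-GPU and synchronized statistics concrete, here is a small pure-Python sketch. It is only an illustration of the idea of passing a list with one entry per GPU; it is not the code in model/utils_mg.py, and all function names are made up for the example.

```python
def mean_var(values):
    """Mean and (biased) variance of a flat list of activations."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def synced_mean_var(per_gpu_values):
    """Statistics over all devices, combined from per-device sums.

    per_gpu_values: a list with one flat list of activations per GPU.
    """
    count = sum(len(v) for v in per_gpu_values)
    total = sum(sum(v) for v in per_gpu_values)
    total_sq = sum(sum(x * x for x in v) for v in per_gpu_values)
    mean = total / count
    var = total_sq / count - mean ** 2  # E[x^2] - E[x]^2
    return mean, var


def normalize(values, mean, var, eps=1e-5):
    """Standard batch-norm normalization, without scale and shift."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


gpu0 = [1.0, 2.0, 3.0]
gpu1 = [7.0, 8.0, 9.0]

# Unsynchronized: each GPU normalizes with its own small-batch statistics.
print(mean_var(gpu0), mean_var(gpu1))

# Synchronized: every GPU normalizes with the statistics of the whole batch.
mean, var = synced_mean_var([gpu0, gpu1])
print(mean, var)
print(normalize(gpu0, mean, var))
```

In the toy data the two GPUs see very different activations, so their local means differ a lot, while the synchronized version gives both of them the same mean and variance.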

I do not know if this is the first implementation of sync batch norm in Tensorflow, but there is already an implementation in PyTorch and some applications.

Update: There is another implementation that uses NCCL to gather statistics across GPUs; see tensorpack. However, TF1.1 does not support passing gradients through nccl_all_reduce. In addition, this code could not run on ppc64le with tf1.10, cuda9.0 and nccl1.3.5. I have no idea why, and do not want to spend a lot of time on it. Maybe nccl2 can solve this.

Results

Numerical Results

Random scaling is used for all databases.
Random rotation is used for SBD.
Scores are reported as single-scale / multi-scale (SS/MS) on the validation set.
Corrections and additions to the table are welcome.

Database	Backbones	L2	L2-SP
Cityscapes (train set: 3K)	ResNet-50	76.9/?	77.9/?
Cityscapes (train set: 3K)	ResNet-101	77.9/?	78.6/?
Cityscapes (coarse + train set: 20K + 3K)	ResNet-50
Cityscapes (coarse + train set: 20K + 3K)	ResNet-101	80.0/80.9	80.1/81.2*
SBD	ResNet-50	76.5/?	76.6/?
SBD	ResNet-101	77.5/79.2	78.5/79.9
ADE20K	ResNet-50	41.92/43.09
ADE20K	ResNet-101	42.80/?

*This model gets 80.3 on the Cityscapes test set (1525 images) without post-processing methods.
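For readers unfamiliar with the metric, the scores above appear to be mean intersection-over-union (mIoU) in percent. The sketch below shows how mIoU is usually computed from a confusion matrix. It is a generic pure-Python illustration, not the evaluation code of this repo, which may differ in details such as ignored labels.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example with 20 pixels.
confusion = [
    [8, 2],
    [1, 9],
]
print(100.0 * mean_iou(confusion))
```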

Qualitative Results on Cityscapes

Devil Details

Training and Evaluation

Download the databases with the links: ADE20K, SBD (Augmented Pascal VOC) and Cityscapes.

Prepare the Cityscapes database by generating the *labelTrainIds.png images with createTrainIdLabelImgs, and then either change the code in database/reader.py or move the undesired images to another directory.

Download pretrained models.

cd z_pretrained_weights
sh download_resnet_v1_101.sh

A script for training ResNet-50 on ADE20K, which reaches around 41.92 mIoU (with single-scale testing):

python ./run.py --network 'resnet_v1_50' --visible_gpus '0,1' --reader_method 'queue' --lrn_rate 0.01 --weight_decay_mode 0 --weight_decay_rate 0.0001 --weight_decay_rate2 0.001 --database 'ADE' --subsets_for_training 'train' --batch_size 8 --train_image_size 480 --snapshot 30000 --train_max_iter 90000 --test_image_size 480 --random_rotate 0 --fine_tune_filename './z_pretrained_weights/resnet_v1_50.ckpt'

Test and Infer

Test with multi-scale inputs (set batch_size as large as you can to speed things up).

python predict.py --visible_gpus '0' --network 'resnet_v1_101' --database 'ADE' --weights_ckpt './log/ADE/PSP-resnet_v1_101-gpu_num2-batch_size8-lrn_rate0.01-random_scale1-random_rotate1-480-60000-train-1-0.0001-0.001-0-0-1-1/snapshot/model.ckpt-60000' --test_subset 'val' --test_image_size 480 --batch_size 8 --ms 1 --mirror 1
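The general idea behind multi-scale testing with mirroring can be sketched as follows. This is a pure-Python illustration under stated assumptions, not what predict.py does internally: predict_fn is a hypothetical callable that resizes the image by the given scale, runs the network and returns per-pixel class probabilities resized back to the input size, and the scale values are placeholders.

```python
def flip_horizontal(rows):
    """Mirror a 2-D grid (a list of rows) left to right."""
    return [list(reversed(row)) for row in rows]


def multi_scale_predict(image, predict_fn, scales=(0.5, 1.0, 1.5), mirror=True):
    """Average per-pixel class probabilities over scales and mirroring."""
    runs = []
    for scale in scales:
        runs.append(predict_fn(image, scale))
        if mirror:
            # Predict on the mirrored image, then mirror the result back
            # so that it lines up with the original pixels.
            flipped = predict_fn(flip_horizontal(image), scale)
            runs.append(flip_horizontal(flipped))
    height, width = len(image), len(image[0])
    num_classes = len(runs[0][0][0])
    averaged = [[[sum(run[y][x][c] for run in runs) / len(runs)
                  for c in range(num_classes)]
                 for x in range(width)]
                for y in range(height)]
    # The final label of each pixel is the class with the highest average.
    labels = [[px.index(max(px)) for px in row] for row in averaged]
    return averaged, labels


def toy_predict(image, scale):
    """Stand-in for a network: class-1 probability grows with pixel value."""
    return [[[1.0 - p, p] for p in (min(1.0, v * scale) for v in row)]
            for row in image]


image = [[0.2, 0.8]]
averaged, labels = multi_scale_predict(image, toy_predict)
print(labels)
```

Averaging probabilities rather than labels lets a confident prediction at one scale outweigh uncertain ones at other scales.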

Infer one image (with multi-scale).

python demo_infer.py --database 'Cityscapes' --network 'resnet_v1_101' --weights_ckpt './log/Cityscapes/old/model.ckpt-50000' --test_image_size 864 --batch_size 4 --ms 1

Uncertainties for Training Details:

(Cityscapes only) Should the finely labeled data be included in the first training stage?
(Cityscapes only) Should the (base) learning rate be reduced in the second training stage?
Should the logits be resized to the original size before computing the loss?
Should new layers receive a larger learning rate?
About the weird padding behavior of tf.image.resize_images(): should align_corners=True be set?
What is the optimal decay hyperparameter for the statistics of batch normalization layers? (0.9, 0.95, 0.9997)
There may be more, but I am not sure how much these small changes affect the results ...
Discussion is welcome!
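To give a feeling for the batch normalization decay question, the sketch below tracks a statistic with the usual exponential moving average, moving = decay * moving + (1 - decay) * batch_value, for the three decay values listed above. It is a pure-Python illustration with made-up names and a constant toy input, not code from this repo.

```python
def track_statistic(batch_values, decay, initial=0.0):
    """Exponential moving average of a per-batch statistic.

    Returns the moving value after each update.
    """
    moving = initial
    history = []
    for value in batch_values:
        moving = decay * moving + (1.0 - decay) * value
        history.append(moving)
    return history


# The true statistic is 1.0 in every batch; the estimate starts at 0.0.
batch_means = [1.0] * 100
for decay in (0.9, 0.95, 0.9997):
    final = track_statistic(batch_means, decay)[-1]
    print(decay, final)
```

After n updates the estimate has covered a fraction 1 - decay^n of the gap, so a decay of 0.9997 has moved only about 3% of the way after 100 iterations, while 0.9 has essentially converged. With short training schedules, a large decay can therefore leave the inference statistics far from the training ones.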

Change Log

26 February, 2019

Code structure: on-the-fly evaluation during training.
Code structure: wrapping of the model.
Add tf.data support, but the queue-based reader is faster.
Print results by running python utils.py in the experiment_manager directory.
The default environment is Python 3 and TF1.12. OpenCV is needed for predicting and demo_infer.
The previous version is now a branch of this repo named v0.9.

External links

Pyramid Scene Parsing Network paper and official github.

Owner

Li Xuhong
