Sematic-Segmantation - Semantic Segmentation on MIT ADE20K dataset in PyTorch

Overview

Semantic Segmentation on MIT ADE20K dataset in PyTorch

This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).

ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by MIT Computer Vision team. Follow the link below to find the repository for our dataset and implementations on Caffe and Torch7: https://github.com/CSAILVision/sceneparsing

If you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu You can upload your own photo and parse it!

You can also use this colab notebook playground here to tinker with the code for segmenting an image.

You can reach the dataset here.

All pretrained models can be found at: http://sceneparsing.csail.mit.edu/model/pytorch

[From left to right: Test Image, Ground Truth, Predicted Result]

Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing

Updates

  • HRNet model is now supported.
  • We use configuration files to store most options which were in argument parser. The definitions of options are detailed in config/defaults.py.
  • We conform to Pytorch practice in data preprocessing (RGB [0, 1], substract mean, divide std).

Highlights

Syncronized Batch Normalization on PyTorch

This module computes the mean and standard-deviation across all devices during training. We empirically find that a reasonable large batch size is important for segmentation. We thank Jiayuan Mao for his kind contributions, please refer to Synchronized-BatchNorm-PyTorch for details.

The implementation is easy to use as:

  • It is pure-python, no C++ extra extension libs.
  • It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and use sqrt(max(var, eps)) instead of sqrt(var + eps).
  • It is efficient, only 20% to 30% slower than UnsyncBN.

Dynamic scales of input for training with multiple GPUs

For the task of semantic segmentation, it is good to keep aspect ratio of images during training. So we re-implement the DataParallel module, and make it support distributing data to multiple GPUs in python dict, so that each gpu can process images of different sizes. At the same time, the dataloader also operates differently.

Now the batch size of a dataloader always equals to the number of GPUs, each element will be sent to a GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which is in contradict to our goal that each worker maintains its own file list. So we use a trick that although the master process still gives dataloader an index for __getitem__ function, we just ignore such request and send a random batch dict. Also, the multiple workers forked by the dataloader all have the same seed, you will find that multiple workers will yield exactly the same data, if we use the above-mentioned trick directly. Therefore, we add one line of code which sets the defaut seed for numpy.random before activating multiple worker in dataloader.

State-of-the-Art models

  • PSPNet is scene parsing network that aggregates global representation with Pyramid Pooling Module (PPM). It is the winner model of ILSVRC'16 MIT Scene Parsing Challenge. Please refer to https://arxiv.org/abs/1612.01105 for details.
  • UPerNet is a model based on Feature Pyramid Network (FPN) and Pyramid Pooling Module (PPM). It doesn't need dilated convolution, an operator that is time-and-memory consuming. Without bells and whistles, it is comparable or even better compared with PSPNet, while requiring much shorter training time and less GPU memory. Please refer to https://arxiv.org/abs/1807.10221 for details.
  • HRNet is a recently proposed model that retains high resolution representations throughout the model, without the traditional bottleneck design. It achieves the SOTA performance on a series of pixel labeling tasks. Please refer to https://arxiv.org/abs/1904.04514 for details.

Supported models

We split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling. We have provided some pre-configured models in the config folder.

Encoder:

  • MobileNetV2dilated
  • ResNet18/ResNet18dilated
  • ResNet50/ResNet50dilated
  • ResNet101/ResNet101dilated
  • HRNetV2 (W48)

Decoder:

  • C1 (one convolution module)
  • C1_deepsup (C1 + deep supervision trick)
  • PPM (Pyramid Pooling Module, see PSPNet paper for details.)
  • PPM_deepsup (PPM + deep supervision trick)
  • UPerNet (Pyramid Pooling + FPN head, see UperNet for details.)

Performance:

IMPORTANT: The base ResNet in our repository is a customized (different from the one in torchvision). The base models will be automatically downloaded when needed.

Architecture MultiScale Testing Mean IoU Pixel Accuracy(%) Overall Score Inference Speed(fps)
MobileNetV2dilated + C1_deepsup No 34.84 75.75 54.07 17.2
Yes 33.84 76.80 55.32 10.3
MobileNetV2dilated + PPM_deepsup No 35.76 77.77 56.27 14.9
Yes 36.28 78.26 57.27 6.7
ResNet18dilated + C1_deepsup No 33.82 76.05 54.94 13.9
Yes 35.34 77.41 56.38 5.8
ResNet18dilated + PPM_deepsup No 38.00 78.64 58.32 11.7
Yes 38.81 79.29 59.05 4.2
ResNet50dilated + PPM_deepsup No 41.26 79.73 60.50 8.3
Yes 42.14 80.13 61.14 2.6
ResNet101dilated + PPM_deepsup No 42.19 80.59 61.39 6.8
Yes 42.53 80.91 61.72 2.0
UperNet50 No 40.44 79.80 60.12 8.4
Yes 41.55 80.23 60.89 2.9
UperNet101 No 42.00 80.79 61.40 7.8
Yes 42.66 81.01 61.84 2.3
HRNetV2 No 42.03 80.77 61.40 5.8
Yes 43.20 81.47 62.34 1.9

The training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory), the inference speed is benchmarked a single NVIDIA Pascal Titan Xp GPU, without visualization.

Environment

The code is developed under the following configurations.

  • Hardware: >=4 GPUs for training, >=1 GPU for testing (set [--gpus GPUS] accordingly)
  • Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0
  • Dependencies: numpy, scipy, opencv, yacs, tqdm

Quick start: Test on an image using our trained model

  1. Here is a simple demo to do inference on a single image:
chmod +x demo_test.sh
./demo_test.sh

This script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves predicted segmentation (.png) to the working directory.

  1. To test on an image or a folder of images ($PATH_IMG), you can simply do the following:
python3 -u test.py --imgs $PATH_IMG --gpu $GPU --cfg $CFG

Training

  1. Download the ADE20K scene parsing dataset:
chmod +x download_ADE20K.sh
./download_ADE20K.sh
  1. Train a model by selecting the GPUs ($GPUS) and configuration file ($CFG) to use. During training, checkpoints by default are saved in folder ckpt.
python3 train.py --gpus $GPUS --cfg $CFG 
  • To choose which gpus to use, you can either do --gpus 0-7, or --gpus 0,2,4,6.

For example, you can start with our provided configurations:

  • Train MobileNetV2dilated + C1_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Train ResNet50dilated + PPM_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Train UPerNet101
python3 train.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml
  1. You can also override options in commandline, for example python3 train.py TRAIN.num_epoch 10 .

Evaluation

  1. Evaluate a trained model on the validation set. Add VAL.visualize True in argument to output visualizations as shown in teaser.

For example:

  • Evaluate MobileNetV2dilated + C1_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Evaluate ResNet50dilated + PPM_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Evaluate UPerNet101
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml

Integration with other projects

This library can be installed via pip to easily integrate with another codebase

pip install git+https://github.com/CSAILVision/[email protected]

Now this library can easily be consumed programmatically. For example

from mit_semseg.config import cfg
from mit_semseg.dataset import TestDataset
from mit_semseg.models import ModelBuilder, SegmentationModule

Reference

If you find the code or pre-trained models useful, please cite the following papers:

Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal on Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)

@article{zhou2018semantic,
  title={Semantic understanding of scenes through the ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  journal={International Journal on Computer Vision},
  year={2018}
}

Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)

@inproceedings{zhou2017scene,
    title={Scene Parsing through ADE20K Dataset},
    author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2017}
}
Owner
Berat Eren Terzioğlu
AI & Computer Vision Engineer
Berat Eren Terzioğlu
N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting Recent progress in neural forecasting instigated significant improvements in the

Cristian Challu 82 Jan 04, 2023
A high-performance distributed deep learning system targeting large-scale and automated distributed training.

HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop

DAIR Lab 150 Dec 21, 2022
[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

Xiefan Guo 122 Dec 11, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network

Leaded Gradient Method (LGM) This repository contains the PyTorch implementation for paper Dynamics-aware Adversarial Attack of 3D Sparse Convolution

An Tao 2 Oct 18, 2022
Yoloxkeypointsegment - An anchor-free version of YOLO, with a simpler design but better performance

Introduction 关键点版本:已完成 全景分割版本:已完成 实例分割版本:已完成 YOLOX is an anchor-free version of

23 Oct 20, 2022
Projecting interval uncertainty through the discrete Fourier transform

Projecting interval uncertainty through the discrete Fourier transform This repo

1 Mar 02, 2022
This repository contains a Ruby API for utilizing TensorFlow.

tensorflow.rb Description This repository contains a Ruby API for utilizing TensorFlow. Linux CPU Linux GPU PIP Mac OS CPU Not Configured Not Configur

somatic labs 825 Dec 26, 2022
2021 National Underwater Robotics Vision Optics

2021-National-Underwater-Robotics-Vision-Optics 2021年全国水下机器人算法大赛-光学赛道-B榜精度第18名 (Kilian_Di的团队:A榜[email pro

Di Chang 9 Nov 04, 2022
An Open-Source Package for Information Retrieval.

OpenMatch An Open-Source Package for Information Retrieval. 😃 What's New Top Spot on TREC-COVID Challenge (May 2020, Round2) The twin goals of the ch

THUNLP 439 Dec 27, 2022
Official PyTorch implementation of "Preemptive Image Robustification for Protecting Users against Man-in-the-Middle Adversarial Attacks" (AAAI 2022)

Preemptive Image Robustification for Protecting Users against Man-in-the-Middle Adversarial Attacks This is the code for reproducing the results of th

2 Dec 27, 2021
Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?

Adversrial Machine Learning Benchmarks This code belongs to the papers: Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness? Det

Adversarial Machine Learning 9 Nov 27, 2022
2021搜狐校园文本匹配算法大赛 分比我们低的都是帅哥队

sohu_text_matching 2021搜狐校园文本匹配算法大赛Top2:分比我们低的都是帅哥队 本repo包含了本次大赛决赛环节提交的代码文件及答辩PPT,提交的模型文件可在百度网盘获取(链接:https://pan.baidu.com/s/1T9FtwiGFZhuC8qqwXKZSNA ,

hflserdaniel 43 Oct 01, 2022
Python implementation of a live deep learning based age/gender/expression recognizer

TUT live age estimator Python implementation of a live deep learning based age/gender/smile/celebrity twin recognizer. All components use convolutiona

Heikki Huttunen 80 Nov 21, 2022
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)

English | 简体中文 Welcome to the PaddlePaddle GitHub. PaddlePaddle, as the only independent R&D deep learning platform in China, has been officially open

19.4k Jan 04, 2023
BRepNet: A topological message passing system for solid models

BRepNet: A topological message passing system for solid models This repository contains the an implementation of BRepNet: A topological message passin

Autodesk AI Lab 42 Dec 30, 2022
TUPÃ was developed to analyze electric field properties in molecular simulations

TUPÃ: Electric field analyses for molecular simulations What is TUPÃ? TUPÃ (pronounced as tu-pan) is a python algorithm that employs MDAnalysis engine

Marcelo D. Polêto 10 Jul 17, 2022
COVINS -- A Framework for Collaborative Visual-Inertial SLAM and Multi-Agent 3D Mapping

COVINS -- A Framework for Collaborative Visual-Inertial SLAM and Multi-Agent 3D Mapping Version 1.0 COVINS is an accurate, scalable, and versatile vis

ETHZ V4RL 183 Dec 27, 2022
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
The repo of Feedback Networks, CVPR17

Feedback Networks http://feedbacknet.stanford.edu/ Paper: Feedback Networks, CVPR 2017. Amir R. Zamir*,Te-Lin Wu*, Lin Sun, William B. Shen, Bertram E

Stanford Vision and Learning Lab 87 Nov 19, 2022