CVPR2022 (Oral) - Rethinking Semantic Segmentation: A Prototype View

Last update: Dec 26, 2022

Overview

Rethinking Semantic Segmentation: A Prototype View

Rethinking Semantic Segmentation: A Prototype View,
Tianfei Zhou, Wenguan Wang, Ender Konukoglu and Luc Van Gool
CVPR 2022 (Oral) (arXiv 2203.15102)

News

[2022-04-19] Release the code based on openseg.pytorch!
[2022-03-31] Paper link updated!
[2022-03-12] Repo created. Paper and code will come soon.

Abstract

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such a parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Unlike prior methods, which learn a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean features of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest-prototype retrieval. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle an arbitrary number of classes with a constant number of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HRNet, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compelling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
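To make the nearest-prototype idea in the abstract concrete, here is a minimal NumPy sketch. It is not the code in this repository: the function names (`l2_normalize`, `build_prototypes`, `predict`) are hypothetical, cosine similarity is assumed as the matching score, and the pixels of each class are grouped into prototypes by a simple contiguous split, which is only a stand-in for whatever grouping the actual method uses.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def build_prototypes(features, labels, num_classes, k):
    """Build k non-learnable prototypes per class as mean pixel features.

    features: (N, D) pixel embeddings, labels: (N,) class ids.
    Assumption: pixels of a class are split into k contiguous chunks;
    this is a placeholder for a proper clustering step.
    Returns an array of shape (num_classes, k, D).
    """
    dim = features.shape[1]
    prototypes = np.zeros((num_classes, k, dim), dtype=features.dtype)
    for c in range(num_classes):
        class_feats = features[labels == c]
        for j, chunk in enumerate(np.array_split(class_feats, k)):
            if len(chunk) > 0:  # empty chunks keep a zero prototype
                prototypes[c, j] = chunk.mean(axis=0)
    return l2_normalize(prototypes)


def predict(features, prototypes):
    """Assign each pixel the class of its most similar prototype."""
    feats = l2_normalize(features)
    num_classes, k, dim = prototypes.shape
    # Similarity of every pixel to every prototype: (N, num_classes * k).
    sim = feats @ prototypes.reshape(num_classes * k, dim).T
    # A class is scored by its best-matching prototype: (N, num_classes).
    class_scores = sim.reshape(-1, num_classes, k).max(axis=2)
    return class_scores.argmax(axis=1)
```

Note that adding a class in this sketch only adds rows to the prototype array; nothing in `predict` is trained, which mirrors the abstract's point that the number of learnable parameters does not grow with the number of classes.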

Installation

This implementation is built on openseg.pytorch. Many thanks to its authors for their efforts.

Please follow the openseg.pytorch Getting Started guide for installation and dataset preparation.

Performance

Cityscapes

Method	Train Set	Val Set	Iters	Batch Size	mIoU	Log	CKPT	Script
HRNet	train	val	80K	8	79.0	log	ckpt	`scripts/cityscapes/hrnet/run_h_48_d_4.sh`
Ours	train	val	80K	8	80.1	log	ckpt	`scripts/cityscapes/hrnet/run_h_48_d_4_proto.sh`

More results will come soon.

Citation

@inproceedings{zhou2022rethinking,
    author    = {Zhou, Tianfei and Wang, Wenguan and Konukoglu, Ender and Van Gool, Luc},
    title     = {Rethinking Semantic Segmentation: A Prototype View},
    booktitle = {CVPR},
    year      = {2022}
}

Relevant Projects

Please also see our work [1] for a novel training paradigm with a cross-image, pixel-to-pixel contrastive loss, and [2] for a novel hierarchy-aware segmentation learning scheme for structured scene parsing.

[1] Exploring Cross-Image Pixel Contrast for Semantic Segmentation - ICCV 2021 (Oral) [arXiv][code]

[2] Deep Hierarchical Semantic Segmentation - CVPR 2022 [arXiv][code]
