SeMask: Semantically Masked Transformers for Semantic Segmentation.

Last update: Dec 30, 2022

Overview

SeMask: Semantically Masked Transformers

Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation.

Contents

1. Results
2. Setup Instructions
3. Citing SeMask

1. Results

Note: † indicates that the backbone was pretrained on ImageNet-22k at 384x384 resolution.

ADE20K

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	42.11	43.16	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	45.92	47.63	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	49.35	50.98	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	51.89	53.52	211M	config	TBD
SeMask-L MaskFormer	SeMask Swin-L†	640x640	54.75	56.15	219M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	640x640	56.41	57.52	222M	config	TBD
SeMask-L Mask2Former FAPN	SeMask Swin-L†	640x640	56.68	58.00	227M	config	TBD
SeMask-L Mask2Former MSFAPN	SeMask Swin-L†	640x640	56.54	58.22	224M	config	TBD

Cityscapes

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	768x768	74.92	76.56	34M	config	TBD
SeMask-S FPN	SeMask Swin-S	768x768	77.13	79.14	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	768x768	77.70	79.73	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	768x768	78.53	80.39	211M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	512x1024	83.97	84.98	222M	config	TBD

COCO-Stuff 10k

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	37.53	38.88	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	40.72	42.27	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	44.63	46.30	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	47.47	48.54	211M	config	TBD
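
For readers unfamiliar with the metric, the mIoU columns in the tables above report mean intersection-over-union as a percentage. The sketch below shows the common textbook definition computed from a confusion matrix. It is an illustration only, not the evaluation code used in this repo: the function names, the ignore label of 255, and the choice to skip classes that never appear are assumptions made for the example.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (target, pred) label pairs over flattened per-pixel labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumption: pixels with this label are unlabeled and skipped
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                                  # class c pixels predicted as something else
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # assumption: classes absent from both pred and target are ignored
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one unlabeled pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
miou = 100.0 * mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU: {miou:.2f}")
```

In practice the confusion matrix is accumulated over every image in the validation set before the per-class IoUs are averaged.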

2. Setup Instructions

We provide the codebase with SeMask incorporated into various models. Please check the setup instructions inside the corresponding folders:

SeMask-FPN: Setup Instructions
SeMask-MaskFormer: Setup Instructions
SeMask-Mask2Former: Setup Instructions
SeMask-FAPN: Setup Instructions

3. Citing SeMask

@article{jain2022semask,
  title={SeMask: Semantically Masking Transformer Backbones for Effective Semantic Segmentation},
  author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
  journal={arXiv preprint arXiv:...},
  year={2022}
}

Acknowledgements

The code is based heavily on the following repositories: Swin-Transformer-Semantic-Segmentation, Mask2Former, MaskFormer, and FaPN-full.
