Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 05, 2022

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance the sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success can be attributed to attention maps shared across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
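The `--lr-mode poly` flag in the command above suggests a polynomial learning-rate decay. The exact formula used by `main.py` is not stated in this README, so the following is only a minimal sketch that assumes the form commonly used in segmentation code, `lr = base_lr * (1 - epoch / epochs) ** power` with `power = 0.9`. The function name `poly_lr` and the `power` parameter are hypothetical and chosen for illustration.

```python
def poly_lr(base_lr: float, epoch: int, total_epochs: int, power: float = 0.9) -> float:
    """Polynomial decay (assumed form): base_lr * (1 - epoch / total_epochs) ** power."""
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must satisfy 0 <= epoch < total_epochs")
    return base_lr * (1.0 - epoch / total_epochs) ** power


# Values taken from the training command: --lr 0.007 --epochs 200.
# The exponent 0.9 is an assumption, not something the README specifies.
schedule = [poly_lr(0.007, epoch, 200) for epoch in range(200)]

for epoch in (0, 50, 100, 150, 199):
    print(f"epoch {epoch:3d}: lr = {schedule[epoch]:.6f}")
```

Under this assumed schedule the rate starts at the value given by `--lr` and shrinks smoothly towards zero by the final epoch, without the abrupt drops of a step schedule.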

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
