Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 05, 2022

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further suggest that this success can be attributed to attention maps shared across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
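The `--lr-mode poly` flag together with `--lr 0.007` and `--epochs 200` suggests a polynomial learning-rate decay. The sketch below illustrates one common form of that schedule so the effect of these flags is easier to picture. It is an assumption, not a copy of what `main.py` does: the function name `poly_lr` is hypothetical, and the exponent `0.9` is a widely used default that this README does not confirm.

```python
def poly_lr(base_lr, epoch, total_epochs, power=0.9):
    """Polynomial decay: base_lr * (1 - epoch / total_epochs) ** power.

    Hypothetical helper for illustration only; power=0.9 is an assumed default.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return base_lr * (1.0 - epoch / total_epochs) ** power


# Print the learning rate at a few epochs for the settings in the command above.
for epoch in (0, 50, 100, 150, 199):
    print(f"epoch {epoch:3d}: lr = {poly_lr(0.007, epoch, 200):.6f}")
```

Under this form the rate starts at the base value and falls smoothly to zero at the final epoch, so most of the training budget is spent at a moderately high learning rate.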

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
