Implementation of "Bidirectional Projection Network for Cross Dimension Scene Understanding" CVPR 2021 (Oral)

Last update: Dec 26, 2022

Overview

Bidirectional Projection Network for Cross Dimension Scene Understanding

CVPR 2021 (Oral)

Existing segmentation methods are mostly unidirectional, i.e., they use 3D information to help 2D segmentation or vice versa. However, 2D and 3D information can complement each other in both directions during segmentation. The bidirectional projection network is designed to exploit this two-way interaction.
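To make "projecting between 2D and 3D" concrete, here is a minimal NumPy sketch of the underlying geometry. It assumes a standard pinhole camera with a 3x3 intrinsic matrix and a 4x4 world-to-camera matrix per view. The function names are hypothetical and this is not the repo's implementation. It only shows how 3D points map to pixels, how 2D features can be gathered onto points (2D to 3D), and how point features can be pooled back onto pixels (3D to 2D). Occlusion handling (e.g. a depth test) is omitted.

```python
import numpy as np


def project_points(points_world, world_to_cam, intrinsics, height, width):
    """Project (N, 3) world points into an image with a pinhole camera.

    Returns integer pixel coordinates (N, 2) as (u, v) and a boolean mask of
    points that are in front of the camera and land inside the image.
    """
    n = points_world.shape[0]
    homo = np.hstack([points_world, np.ones((n, 1))])      # (N, 4) homogeneous
    cam = (world_to_cam @ homo.T).T[:, :3]                  # camera-frame coords
    z = cam[:, 2]
    in_front = z > 1e-6
    z_safe = np.where(in_front, z, 1.0)                     # avoid division by zero
    u = intrinsics[0, 0] * cam[:, 0] / z_safe + intrinsics[0, 2]
    v = intrinsics[1, 1] * cam[:, 1] / z_safe + intrinsics[1, 2]
    pix = np.stack([np.round(u), np.round(v)], axis=1).astype(int)
    inside = (pix[:, 0] >= 0) & (pix[:, 0] < width) & (pix[:, 1] >= 0) & (pix[:, 1] < height)
    return pix, in_front & inside


def feats_2d_to_3d(feat_map, pix, valid):
    """2D -> 3D: gather one feature per point from an (H, W, C) map; invalid points get zeros."""
    out = np.zeros((pix.shape[0], feat_map.shape[2]), dtype=feat_map.dtype)
    out[valid] = feat_map[pix[valid, 1], pix[valid, 0]]
    return out


def feats_3d_to_2d(point_feats, pix, valid, height, width):
    """3D -> 2D: average the features of all points that fall on each pixel."""
    channels = point_feats.shape[1]
    summed = np.zeros((height, width, channels))
    counts = np.zeros((height, width, 1))
    rows, cols = pix[valid, 1], pix[valid, 0]
    np.add.at(summed, (rows, cols), point_feats[valid])    # unbuffered, handles repeats
    np.add.at(counts, (rows, cols), 1)
    return summed / np.maximum(counts, 1)                   # empty pixels stay zero
```

Averaging with `np.add.at` is just one simple way to resolve several points landing on the same pixel; plain fancy-index assignment would leave the surviving value unspecified.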

Environment

Main

# Torch
$ pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
# MinkowskiEngine 0.4.1
$ conda install numpy openblas
$ git clone https://github.com/StanfordVL/MinkowskiEngine.git
$ cd MinkowskiEngine
$ git checkout f1a419cc5792562a06df9e1da686b7ce8f3bb5ad
$ python setup.py install
# Others
$ pip install imageio==2.8.0 opencv-python==4.2.0.32 pillow==7.0.0 pyyaml==5.3 scipy==1.4.1 sharedarray==3.2.0 tensorboardx==2.0 tqdm==4.42.1
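Because the pins above are strict (CUDA 10.0 builds of PyTorch 1.4), a quick check of the installed versions can save debugging time later. The sketch below is a hypothetical helper, not part of the repo. It uses `importlib.metadata` (Python 3.8+; the `importlib_metadata` backport offers the same functions on older interpreters) and leaves MinkowskiEngine out because it is built from a pinned commit rather than installed from a versioned package.

```python
from importlib import metadata

# Version pins copied from the pip commands above, keyed by distribution name.
PINNED = {
    "torch": "1.4.0+cu100",
    "torchvision": "0.5.0+cu100",
    "imageio": "2.8.0",
    "opencv-python": "4.2.0.32",
    "Pillow": "7.0.0",
    "PyYAML": "5.3",
    "scipy": "1.4.1",
    "SharedArray": "3.2.0",
    "tensorboardX": "2.0",
    "tqdm": "4.42.1",
}


def find_mismatches(pins, lookup=metadata.version):
    """Return {package: (expected, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```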

Others

Please refer to env.yml for details.

Prepare data

Download the ScanNet v2 dataset from the official website.
2D: The script comes from the 3DMV repo and is written in Python 2, whereas the rest of this repo uses Python 3, so run it with a Python 2 interpreter:
$ python prepare_2d_data.py --scannet_path data/scannetv2 --output_path data/scannetv2_images --export_label_images
3D: Preprocess the point clouds with dataset/preprocess_3d_scannet.py.

Config

BPNet_5cm: config/scannet/bpnet_5cm.yaml

Training

Download the ImageNet-pretrained 2D ResNets from the PyTorch website and put them into the initmodel folder. The download URLs are:

model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}
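The weights can also be fetched with a small script. This is a hypothetical helper that uses only the Python standard library. It assumes the checkpoints should keep their original file names (e.g. resnet34-333f7ec4.pth) inside initmodel, so check the path your config expects before relying on it.

```python
import os
import urllib.request
from urllib.parse import urlparse


def target_path(url, folder="initmodel"):
    """Keep the file name from the URL, e.g. initmodel/resnet34-333f7ec4.pth."""
    return os.path.join(folder, os.path.basename(urlparse(url).path))


def download_all(urls, folder="initmodel"):
    """Download every {name: url} entry into `folder`, skipping files already present."""
    os.makedirs(folder, exist_ok=True)
    for name, url in urls.items():
        dest = target_path(url, folder)
        if os.path.exists(dest):
            print(f"{name}: already present at {dest}")
            continue
        print(f"{name}: downloading {url}")
        urllib.request.urlretrieve(url, dest)
```

Pasting the `model_urls` dict above into the same file and calling `download_all(model_urls)` fetches all five checkpoints; pass a smaller dict to fetch only the depths you need.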

Start training: sh tool/train.sh EXP_NAME /PATH/TO/CONFIG NUMBER_OF_THREADS
Resume: sh tool/resume.sh EXP_NAME /PATH/TO/CONFIG(the copied one) NUMBER_OF_THREADS

NUMBER_OF_THREADS is the number of threads used per process (one process per GPU), so ideally it should be Total_threads / gpu_number_used.
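As a worked example of this formula: with 32 hardware threads and 4 GPUs, each process gets 32 / 4 = 8 threads, so the last argument would be 8. The hypothetical helper below computes the same value, flooring the division and never returning less than 1.

```python
import os


def threads_per_process(total_threads, num_gpus):
    """NUMBER_OF_THREADS = Total_threads / gpu_number_used, floored, at least 1."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return max(1, total_threads // num_gpus)


if __name__ == "__main__":
    # Example assumes 4 GPUs; os.cpu_count() may return None, hence the fallback.
    total = os.cpu_count() or 1
    print(threads_per_process(total, 4))
```

Note that `os.cpu_count()` reports the logical CPUs of the whole machine, which may exceed what a job scheduler actually allots to you.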

Testing

Test with your own trained model or our pre-trained model (voxel_size: 5cm): sh tool/test.sh EXP_NAME /PATH/TO/CONFIG(the copied one) NUMBER_OF_THREADS
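In a MinkowskiEngine pipeline, the voxel size sets the edge length of the cubes the point cloud is quantized into before it enters the sparse 3D network. As a rough, plain-NumPy illustration of 5 cm quantization (not the repo's code; MinkowskiEngine ships its own `sparse_quantize` utility), the hypothetical `voxelize` function below maps coordinates in metres to integer voxel indices and records which voxel each point falls into.

```python
import numpy as np


def voxelize(coords, voxel_size=0.05):
    """Quantize (N, 3) coordinates in metres into integer voxel indices.

    Returns the unique voxel indices (sorted lexicographically), the index of the
    first input point in each voxel, and, for every input point, the row of its
    voxel in the unique array.
    """
    voxel_idx = np.floor(coords / voxel_size).astype(np.int64)
    unique_vox, first_idx, inverse = np.unique(
        voxel_idx, axis=0, return_index=True, return_inverse=True
    )
    # Flatten defensively: some NumPy versions return a non-1-D inverse with axis set.
    return unique_vox, first_idx, inverse.reshape(-1)
```

The inverse mapping is what lets per-voxel predictions be copied back to every original point when computing point-level metrics.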

Copyright and License

You are granted the LICENSE for both academic and commercial use.

Acknowledgment

Our code is based on MinkowskiEngine. We also referred to SparseConvNet and semseg.

Citation

@inproceedings{hu-2021-bidirectional,
        author      = {Wenbo Hu and Hengshuang Zhao and Li Jiang and Jiaya Jia and Tien-Tsin Wong},
        title       = {Bidirectional Projection Network for Cross Dimension Scene Understanding},
        booktitle   = {CVPR},
        year        = {2021}
}

Owner

Hu Wenbo
