Instance-wise Occlusion and Depth Orders in Natural Scenes (CVPR 2022)

Last update: Dec 27, 2022

Overview

Official source code. Appears at CVPR 2022

This repository provides a new dataset, named InstaOrder, that can be used to understand the geometrical relationships of instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers with (1) an occlusion order that identifies the occluder and occludee, and (2) a depth order that describes ordinal relations based on relative distance from the camera. This repository also introduces a geometric order prediction network called InstaOrderNet, which outperforms state-of-the-art approaches in the paper's experiments.
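To make the two kinds of order concrete, the sketch below encodes them for a toy scene of three instances. It is an illustration only: the function names, the pair encoding, and the depth ranks are hypothetical and do not reflect the actual InstaOrder annotation format.

```python
from typing import Dict, List, Set, Tuple


def occlusion_matrix(num_instances: int,
                     occlusions: Set[Tuple[int, int]]) -> List[List[int]]:
    """Return M where M[a][b] == 1 means instance a occludes instance b."""
    matrix = [[0] * num_instances for _ in range(num_instances)]
    for occluder, occludee in occlusions:
        matrix[occluder][occludee] = 1
    return matrix


def depth_relations(ranks: List[int]) -> Dict[Tuple[int, int], str]:
    """Derive an ordinal relation for every pair (i, j) with i < j.

    ranks[i] is a hypothetical ordinal distance from the camera
    (smaller means closer); only the ordering matters, not the value.
    """
    relations = {}
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] < ranks[j]:
                relations[(i, j)] = "closer"
            elif ranks[i] > ranks[j]:
                relations[(i, j)] = "farther"
            else:
                relations[(i, j)] = "equal"
    return relations


# Toy scene: instance 0 occludes instance 1, and instance 1 occludes instance 2.
occ = occlusion_matrix(3, {(0, 1), (1, 2)})
# Instance 0 is nearest; instances 1 and 2 are at a similar distance.
dep = depth_relations([1, 2, 2])
```

The occlusion matrix is directed (occluder in the row, occludee in the column), while the depth relation is stored once per unordered pair because the reverse direction follows from it.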

Installation

This code was developed under Anaconda (Python 3.6), PyTorch 1.7.1, torchvision 0.8.2, and CUDA 10.1. Please set up the environment as follows:

# build conda environment
conda create --name order python=3.6
conda activate order

# install requirements
pip install -r requirements.txt

# install COCO API
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
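After installation, it may help to confirm that the installed package versions match the ones listed above. The following is a minimal sketch, not part of the repository: the helper name `check_versions` and the report format are assumptions, and it relies only on each package exposing a `__version__` attribute.

```python
import importlib

# Versions stated in this README; adjust if your setup differs.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, found, matches)} for each expected package."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is not installed in the active environment.
            report[name] = (wanted, None, False)
            continue
        found = getattr(module, "__version__", None)
        # Builds may carry a local suffix such as "+cu101"; compare the base part.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report
```

Calling `check_versions()` inside the `order` environment returns one tuple per package; any entry whose last element is `False` points to a missing or mismatched install. The CUDA version that PyTorch was built against can be read separately from `torch.version.cuda`.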

Visualization

See InstaOrder_vis.ipynb to visualize the InstaOrder dataset, including instance masks, occlusion order, and depth order.

Training

The experiments folder contains the training and test scripts for the experiments reported in the paper.

To train {MODEL} with {DATASET},

Download {DATASET} following the Datasets section.
Set ${base_dir} correctly in experiments/{DATASET}/{MODEL}/config.yaml
(Optional) To train InstaDepthNet, download MiDaS-v2.1 model-f6b98070.pt under ${base_dir}/data/out/InstaOrder_ckpt

Run the script file as follows:

sh experiments/{DATASET}/{MODEL}/train.sh

# Example of training InstaOrderNet^o (Table 3 in the main paper) from scratch
sh experiments/InstaOrder/InstaOrderNet_o/train.sh
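The `experiments/{DATASET}/{MODEL}/train.sh` pattern above can also be resolved programmatically, for example to launch several runs from one driver script. The sketch below is an assumption-laden convenience wrapper, not part of the repository: `script_path` and `run_script` are hypothetical names, and `root` is assumed to be the repository checkout directory.

```python
import subprocess
from pathlib import Path


def script_path(dataset: str, model: str, mode: str = "train") -> Path:
    """Build the repository-relative path experiments/{DATASET}/{MODEL}/{mode}.sh."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    return Path("experiments") / dataset / model / (mode + ".sh")


def run_script(dataset: str, model: str, mode: str = "train", root: str = "."):
    """Run the script with sh from the repository root, failing early if it is absent."""
    rel = script_path(dataset, model, mode)
    if not (Path(root) / rel).is_file():
        raise FileNotFoundError(str(Path(root) / rel))
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(["sh", str(rel)], cwd=root, check=True)
```

For instance, `run_script("InstaOrder", "InstaOrderNet_o")` is equivalent to the example command above when called from the repository root; passing `mode="test"` targets the matching `test.sh` instead.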

Inference

Download the pretrained models InstaOrder_ckpt.zip (3.5G) and unzip the files into the structure below. Pretrained models are named {DATASET}_{MODEL}.pth.tar

${base_dir}
|--data
|    |--out
|    |    |--InstaOrder_ckpt
|    |    |    |--COCOA_InstaOrderNet_o.pth.tar
|    |    |    |--COCOA_OrderNet.pth.tar
|    |    |    |--COCOA_pcnet_m.pth.tar
|    |    |    |--InstaOrder_InstaDepthNet_d.pth.tar
|    |    |    |--InstaOrder_InstaDepthNet_od.pth.tar
|    |    |    |--InstaOrder_InstaOrderNet_d.pth.tar
|    |    |    |--InstaOrder_InstaOrderNet_o.pth.tar
|    |    |    |--InstaOrder_InstaOrderNet_od.pth.tar
|    |    |    |--InstaOrder_OrderNet.pth.tar
|    |    |    |--InstaOrder_OrderNet_ext.pth.tar  
|    |    |    |--InstaOrder_pcnet_m.pth.tar
|    |    |    |--KINS_InstaOrderNet_o.pth.tar
|    |    |    |--KINS_OrderNet.pth.tar
|    |    |    |--KINS_pcnet_m.pth.tar
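The `{DATASET}_{MODEL}.pth.tar` naming and the directory tree above can be checked before running a test script. The following is a small sketch under the assumption that checkpoints live exactly where the tree shows; `checkpoint_path` and `missing_checkpoints` are hypothetical helper names, not functions from the repository.

```python
from pathlib import Path

# Location of the unzipped checkpoints relative to ${base_dir}, as in the tree above.
CKPT_SUBDIR = Path("data") / "out" / "InstaOrder_ckpt"


def checkpoint_path(base_dir, dataset: str, model: str) -> Path:
    """Return ${base_dir}/data/out/InstaOrder_ckpt/{DATASET}_{MODEL}.pth.tar."""
    return Path(base_dir) / CKPT_SUBDIR / "{}_{}.pth.tar".format(dataset, model)


def missing_checkpoints(base_dir, pairs):
    """Return the expected checkpoint paths that do not exist on disk.

    pairs is an iterable of (dataset, model) tuples, e.g. ("KINS", "OrderNet").
    """
    paths = [checkpoint_path(base_dir, d, m) for d, m in pairs]
    return [p for p in paths if not p.is_file()]
```

An empty list from `missing_checkpoints(base_dir, [("InstaOrder", "InstaOrderNet_o")])` suggests the archive was unpacked into the expected place. A path returned by `checkpoint_path` could then be opened with `torch.load(path, map_location="cpu")` for inspection; the contents of the checkpoint are not documented here.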

(Optional) To test InstaDepthNet, download MiDaS-v2.1 model-f6b98070.pt under ${base_dir}/data/out/InstaOrder_ckpt
Set ${base_dir} correctly in experiments/{DATASET}/{MODEL}/config.yaml

To test {MODEL} with {DATASET}, run the script file as follows:

sh experiments/{DATASET}/{MODEL}/test.sh

# Example of reproducing the accuracy of InstaOrderNet^o (Table 3 in the main paper)
sh experiments/InstaOrder/InstaOrderNet_o/test.sh

Datasets

InstaOrder dataset

To use InstaOrder, download the files and arrange them in the structure below:

${base_dir}
|--data
|    |--COCO
|    |    |--train2017/
|    |    |--val2017/
|    |    |--annotations/
|    |    |    |--instances_train2017.json
|    |    |    |--instances_val2017.json
|    |    |    |--InstaOrder_train2017.json
|    |    |    |--InstaOrder_val2017.json

COCOA dataset

To use COCOA, download the files and arrange them in the structure below:

${base_dir}
|--data
|    |--COCO
|    |    |--train2014/
|    |    |--val2014/
|    |    |--annotations/
|    |    |    |--COCO_amodal_train2014.json 
|    |    |    |--COCO_amodal_val2014.json

KINS dataset

To use KINS, download the KINS dataset files and arrange them in the structure below:

${base_dir}
|--data
|    |--KINS
|    |    |--training/
|    |    |--testing/
|    |    |--instances_val.json
|    |    |--instances_train.json

DIW dataset

To use DIW, download the DIW dataset files and arrange them in the structure below:

${base_dir}
|--data
|    |--DIW
|    |    |--DIW_test/
|    |    |--DIW_Annotations
|    |    |    |--DIW_test.csv
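Since training and testing depend on these directory layouts, a quick check can catch misplaced files early. The sketch below copies the expected entries from the four trees above into a table and reports which ones are absent; `LAYOUTS` and `missing_entries` are hypothetical names, and the check only tests for existence, not for file contents.

```python
from pathlib import Path

# Entries relative to ${base_dir}/data, copied from the trees in this section.
LAYOUTS = {
    "InstaOrder": [
        "COCO/train2017",
        "COCO/val2017",
        "COCO/annotations/instances_train2017.json",
        "COCO/annotations/instances_val2017.json",
        "COCO/annotations/InstaOrder_train2017.json",
        "COCO/annotations/InstaOrder_val2017.json",
    ],
    "COCOA": [
        "COCO/train2014",
        "COCO/val2014",
        "COCO/annotations/COCO_amodal_train2014.json",
        "COCO/annotations/COCO_amodal_val2014.json",
    ],
    "KINS": [
        "KINS/training",
        "KINS/testing",
        "KINS/instances_val.json",
        "KINS/instances_train.json",
    ],
    "DIW": [
        "DIW/DIW_test",
        "DIW/DIW_Annotations/DIW_test.csv",
    ],
}


def missing_entries(base_dir, dataset: str):
    """Return the expected files or folders for a dataset that are not on disk."""
    data_dir = Path(base_dir) / "data"
    return [rel for rel in LAYOUTS[dataset] if not (data_dir / rel).exists()]
```

For example, `missing_entries("/path/to/base_dir", "KINS")` returns an empty list when all four KINS entries are in place, and otherwise lists the relative paths that still need to be downloaded or moved.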

Citing InstaOrder

If you find this code or data useful in your research, please cite our paper:

@inproceedings{lee2022instaorder,
  title={{Instance-wise Occlusion and Depth Orders in Natural Scenes}},
  author={Hyunmin Lee and Jaesik Park},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Acknowledgement

We referred to and borrowed implementations from Xiaohang Zhan.
