(NeurIPS 2020) Wasserstein Distances for Stereo Disparity Estimation

Overview

Wasserstein Distances for Stereo Disparity Estimation

Accepted in NeurIPS 2020 as Spotlight. [Project Page]

Wasserstein Distances for Stereo Disparity Estimation

by Divyansh Garg, Yan Wang, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger and Wei-Lun Chao

Figure

Citation

@inproceedings{div2020wstereo,
  title={Wasserstein Distances for Stereo Disparity Estimation},
  author={Garg, Divyansh and Wang, Yan and Hariharan, Bharath and Campbell, Mark and Weinberger, Kilian and Chao, Wei-Lun},
  booktitle={NeurIPS},
  year={2020}
}

Introduction

Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values. This leads to inaccurate results when the true depth or disparity does not match any of these values. The fact that this distribution is usually learned indirectly through a regression loss causes further problems in ambiguous regions around object boundaries. We address these issues using a new neural network architecture that is capable of outputting arbitrary depth values, and a new loss function that is derived from the Wasserstein distance between the true and the predicted distributions. We validate our approach on a variety of tasks, including stereo disparity and depth estimation, and the downstream 3D object detection. Our approach drastically reduces the error in ambiguous regions, especially around object boundaries that greatly affect the localization of objects in 3D, achieving the state-of-the-art in 3D object detection for autonomous driving.

Contents

Our Wasserstein loss modification W_loss can be easily plugged in existing stereo depth models to improve the training and obtain better results.

We release the code for CDN-PSMNet and CDN-SDN models.

Requirements

  1. Python 3.7
  2. Pytorch 1.2.0+
  3. CUDA
  4. pip install -r ./requirements.txt
  5. SceneFlow
  6. KITTI

Pretrained Models

TO BE ADDED.

Datasets

You have to download the SceneFlow and KITTI datasets. The structures of the datasets are shown in below.

SceneFlow Dataset Structure

SceneFlow
    | monkaa
        | frames_cleanpass
        | disparity
    | driving
        | frames_cleanpass
        | disparity
    | flyingthings3d
        | frames_cleanpass 
        | disparity

KITTI Object Detection Dataset Structure

KITTI
    | training
        | calib
        | image_2
        | image_3
        | velodyne
    | testing
        | calib
        | image_2
        | image_3

Generate soft-links of SceneFlow Datasets. The results will be saved in ./sceneflow folder. Please change to fakepath path-to-SceneFlow to the SceneFlow dataset location before running the script.

python sceneflow.py --path path-to-SceneFlow --force

Convert the KITTI velodyne ground truths to depth maps. Please change to fakepath path-to-KITTI to the SceneFlow dataset location before running the script.

python ./src/preprocess/generate_depth_map.py --data_path path-to-KITTI/ --split_file ./split/trainval.txt

Optionally download KITTI2015 datasets for evaluating stereo disparity models.

Training and Inference

We have provided all pretrained models Pretrained Models. If you only want to generate the predictions, you can directly go to step 3.

The default setting requires four gpus to train. You can use smaller batch sizes which are btrain and bval, if you don't have enough gpus.

We provide code for both stereo disparity and stereo depth models.

1 Train CDN-SDN from Scratch on SceneFlow Dataset

python ./src/main_depth.py -c src/configs/sceneflow_w1.config

The checkpoints are saved in ./results/stack_sceneflow_w1/.

Follow same procedure to train stereo disparity model, but use src/main_disp.py and change to a disparity config.

2 Train CDN-SDN on KITTI Dataset

python ./src/main_depth.py -c src/configs/kitti_w1.config \
    --pretrain ./results/sceneflow_w1/checkpoint.pth.tar --dataset  path-to-KITTI/training/

Before running, please change the fakepath path-to-KITTI/ to the correct one. --pretrain is the path to the pretrained model on SceneFlow. The training results are saved in ./results/kitti_w1_train.

If you are working on evaluating CDN on KITTI testing set, you might want to train CDN on training+validation sets. The training results will be saved in ./results/sdn_kitti_trainval.

python ./src/main_depth.py -c src/configs/kitti_w1.config \
    --pretrain ./results/sceneflow_w1/checkpoint.pth.tar \
    --dataset  path-to-KITTI/training/ --split_train ./split/trainval.txt \
    --save_path ./results/sdn_kitti_trainval

The disparity models can also be trained on KITTI2015 datasets using src/kitti2015_w1_disp.config.

3 Generate Predictions

Please change the fakepath path-to-KITTI. Moreover, if you use the our provided checkpoint, please modify the value of --resume to the checkpoint location.

  • a. Using the model trained on KITTI training set, and generating predictions on training + validation sets.
python ./src/main_depth.py -c src/configs/kitti_w1.config \
    --resume ./results/sdn_kitti_train/checkpoint.pth.tar --datapath  path-to-KITTI/training/ \
    --data_list ./split/trainval.txt --generate_depth_map --data_tag trainval

The results will be saved in ./results/sdn_kitti_train/depth_maps_trainval/.

  • b. Using the model trained on KITTI training + validation set, and generating predictions on testing sets. You will use them when you want to submit your results to the leaderboard.

The results will be saved in ./results/sdn_kitti_trainval_set/depth_maps_trainval/.

# testing sets
python ./src/main_depth.py -c src/configs/kitti_w1.config \
    --resume ./results/sdn_kitti_trainval/checkpoint.pth.tar --datapath  path-to-KITTI/testing/ \
    --data_list=./split/test.txt --generate_depth_map --data_tag test

The results will be saved in ./results/sdn_kitti_trainval/depth_maps_test/.

4 Train 3D Detection with Pseudo-LiDAR

For training 3D object detection models, follow step 4 and after in the Pseudo-LiDAR_V2 repo https://github.com/mileyan/Pseudo_Lidar_V2.

Results

Results on the Stereo Disparity

Figure

3D Object Detection Results on KITTI leader board

Figure

Questions

Please feel free to email us if you have any questions.

Divyansh Garg [email protected] Yan Wang [email protected] Wei-Lun Chao [email protected]

Owner
Divyansh Garg
Making robots intelligent
Divyansh Garg
Speckle-free Holography with Partially Coherent Light Sources and Camera-in-the-loop Calibration

Speckle-free Holography with Partially Coherent Light Sources and Camera-in-the-loop Calibration Project Page | Paper Yifan Peng*, Suyeon Choi*, Jongh

Stanford Computational Imaging Lab 19 Dec 11, 2022
CoRe: Contrastive Recurrent State-Space Models

CoRe: Contrastive Recurrent State-Space Models This code implements the CoRe model and reproduces experimental results found in Robust Robotic Control

Apple 21 Aug 11, 2022
Datasets for new state-of-the-art challenge in disentanglement learning

High resolution disentanglement datasets This repository contains the Falcor3D and Isaac3D datasets, which present a state-of-the-art challenge for co

NVIDIA Research Projects 37 May 26, 2022
ScaleNet: A Shallow Architecture for Scale Estimation

ScaleNet: A Shallow Architecture for Scale Estimation Repository for the code of ScaleNet paper: "ScaleNet: A Shallow Architecture for Scale Estimatio

Axel Barroso 34 Nov 09, 2022
A script depending on VASP output for calculating Fermi-Softness.

Fermi softness calculation for Vienna Ab initio Simulation Package (VASP) Update 1.1.0: Big update: Rewrote the code. Use Bader atomic division instea

qslin 11 Nov 08, 2022
Differentiable scientific computing library

xitorch: differentiable scientific computing library xitorch is a PyTorch-based library of differentiable functions and functionals that can be widely

98 Dec 26, 2022
CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching(CVPR2021)

CFNet(CVPR 2021) This is the implementation of the paper CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching, CVPR 2021, Zhelun Shen, Yuch

106 Dec 28, 2022
Synthesizing Long-Term 3D Human Motion and Interaction in 3D in CVPR2021

Long-term-Motion-in-3D-Scenes This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D". Please ch

Jiashun Wang 76 Dec 13, 2022
This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields Project Page | Paper | Supplementary | Video | Slides | Blog | Talk If

1.1k Dec 30, 2022
System-oriented IR evaluations are limited to rather abstract understandings of real user behavior

Validating Simulations of User Query Variants This repository contains the scripts of the experiments and evaluations, simulated queries, as well as t

IR Group at Technische Hochschule Köln 2 Nov 23, 2022
PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi PIKA is a lightweight speech processing toolkit based on Pytorch and (Py)

336 Nov 25, 2022
GoodNews Everyone! Context driven entity aware captioning for news images

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy! Model preview: Huge T

117 Dec 19, 2022
Towards Multi-Camera 3D Human Pose Estimation in Wild Environment

PanopticStudio Toolbox This repository has a toolbox to download, process, and visualize the Panoptic Studio (Panoptic) data. Note: Sep-21-2020: Curre

335 Jan 09, 2023
Normalization Calibration (NorCal) for Long-Tailed Object Detection and Instance Segmentation

NorCal Normalization Calibration (NorCal) for Long-Tailed Object Detection and Instance Segmentation On Model Calibration for Long-Tailed Object Detec

Tai-Yu (Daniel) Pan 24 Dec 25, 2022
Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research

Megaverse Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research. The efficient design of the engine enables ph

Aleksei Petrenko 191 Dec 23, 2022
Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Hiroshechka Y 33 Dec 26, 2022
A simple program for training and testing vit

Vit This is a simple program for training and testing vit. Key requirements: torch, torchvision and timm. Dataset I put 5 categories of the cub classi

xiezhenyu 2 Oct 11, 2022
SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolutional Networks

SalFBNet This repository includes Pytorch implementation for the following paper: SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolu

12 Aug 12, 2022
Iterative Normalization: Beyond Standardization towards Efficient Whitening

IterNorm Code for reproducing the results in the following paper: Iterative Normalization: Beyond Standardization towards Efficient Whitening Lei Huan

Lei Huang 21 Dec 27, 2022
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022