SMD-Nets: Stereo Mixture Density Networks


This repository contains a PyTorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021) by Fabio Tosi, Yiyi Liao, Carolin Schmitt and Andreas Geiger.

Contributions:

  • A novel learning framework for stereo matching that exploits compactly parameterized bimodal mixture densities as output representation and can be trained using a simple likelihood-based loss function. Our simple formulation lets us avoid bleeding artifacts at depth discontinuities and provides a measure for aleatoric uncertainty.

  • A continuous function formulation aimed at estimating disparities at arbitrary spatial resolution with constant memory footprint.

  • A new large-scale synthetic binocular stereo dataset with ground truth disparities at 3840×2160 resolution, comprising photo-realistic renderings of indoor and outdoor environments.
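The bimodal output representation from the first contribution can be illustrated with a short sketch. It assumes two Laplacian components (the qualitative results in this README refer to a bimodal Laplacian mixture), a mixing weight `pi`, and a mode-selection rule that picks the component location with the higher mixture density. The function names are hypothetical and are not taken from this repository.

```python
import math


def laplace_pdf(d, mu, b):
    """Density of a Laplace distribution with location mu and scale b."""
    return math.exp(-abs(d - mu) / b) / (2.0 * b)


def bimodal_pdf(d, pi, mu1, b1, mu2, b2):
    """Two-component Laplacian mixture with weights pi and 1 - pi."""
    return pi * laplace_pdf(d, mu1, b1) + (1.0 - pi) * laplace_pdf(d, mu2, b2)


def nll_loss(d_gt, pi, mu1, b1, mu2, b2, eps=1e-12):
    """Negative log-likelihood of the ground-truth disparity d_gt."""
    return -math.log(bimodal_pdf(d_gt, pi, mu1, b1, mu2, b2) + eps)


def select_mode(pi, mu1, b1, mu2, b2):
    """Return the component location with the higher mixture density (assumed rule)."""
    p1 = bimodal_pdf(mu1, pi, mu1, b1, mu2, b2)
    p2 = bimodal_pdf(mu2, pi, mu1, b1, mu2, b2)
    return mu1 if p1 >= p2 else mu2


# A pixel near a depth edge: foreground at disparity 50, background at 10.
params = dict(pi=0.8, mu1=10.0, b1=1.0, mu2=50.0, b2=1.0)
prediction = select_mode(**params)
mixture_mean = params["pi"] * params["mu1"] + (1 - params["pi"]) * params["mu2"]
```

In this toy case the selected mode lies on one of the two surfaces, whereas the mixture mean falls between them, which is the kind of value that shows up as a bleeding artifact at a depth discontinuity.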

For more details, please check:

[Paper] [Supplementary] [Poster] [Video] [Blog]

If you find this code useful in your research, please cite:

@INPROCEEDINGS{Tosi2021CVPR,
  author = {Fabio Tosi and Yiyi Liao and Carolin Schmitt and Andreas Geiger},
  title = {SMD-Nets: Stereo Mixture Density Networks},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
} 

Requirements

This code was tested with Python 3.8, PyTorch 1.8, CUDA 11.2 and Ubuntu 20.04.
All our experiments were performed on a single NVIDIA Titan V100 GPU.
Requirements can be installed using the following script:

pip install -r requirements

Datasets

We create our synthetic dataset, UnrealStereo4K, using the popular game engine Unreal Engine combined with the open-source plugin UnrealCV.

UnrealStereo4K

Our photo-realistic synthetic passive binocular UnrealStereo4K dataset consists of images of 8 static scenes, including indoor and outdoor environments. We rendered stereo pairs at 3840×2160 resolution for each scene with pixel-accurate ground truth disparity maps (aligned with both the left and the right images!) and ground truth poses.

You can automatically download the entire synthetic binocular stereo dataset using the download_data.sh script in the scripts folder. Alternatively, you can download each scene individually:

UnrealStereo4K_00000.zip [74 GB]
UnrealStereo4K_00001.zip [73 GB]
UnrealStereo4K_00002.zip [74 GB]
UnrealStereo4K_00003.zip [73 GB]
UnrealStereo4K_00004.zip [72 GB]
UnrealStereo4K_00005.zip [74 GB]
UnrealStereo4K_00006.zip [67 GB]
UnrealStereo4K_00007.zip [76 GB]
UnrealStereo4K_00008.zip [16 GB] - Contains only 200 stereo pairs, used as an out-of-domain test set

Warning: All the RGB images are PNG files at 8 Mpx. Decoding them makes data loading expensive and notably slows down training. We therefore suggest converting the images offline to raw binary files (remember to edit the filenames accordingly). You can use the following code to convert the stereo images (Image0 and Image1 folders) to a raw format:

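import cv2

img_path = "/path/to/the/image.png"  # placeholder: one PNG from Image0 or Image1
img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)  # load with the alpha channel intact
img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)  # drop the alpha channel
# Dump the uncompressed pixel buffer next to the source file, as .raw
with open(img_path.replace("png", "raw"), "wb") as out:
    img.tofile(out)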
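A raw file written this way has no header, so the reader must supply the image shape and data type itself. The sketch below shows one way to load such a file back with NumPy. It assumes 8-bit, 3-channel images at the 3840×2160 resolution stated above; `load_raw` is a hypothetical helper, not a function from this repository.

```python
import numpy as np


def load_raw(path, height=2160, width=3840, channels=3, dtype=np.uint8):
    """Read a headerless raw image that was written with ndarray.tofile()."""
    data = np.fromfile(path, dtype=dtype)
    expected = height * width * channels
    if data.size != expected:
        # A size mismatch usually means the wrong shape or dtype was assumed.
        raise ValueError(
            f"{path}: expected {expected} values, found {data.size}"
        )
    return data.reshape(height, width, channels)
```

The size check is there because a wrong shape would otherwise either fail with a less helpful reshape error or, for compatible sizes, silently produce a scrambled image.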

Training

All training and testing scripts are provided in the scripts folder.
As an example, use the following command to train SMD-Nets on our UnrealStereo4K dataset.

python apps/train.py --dataroot $dataroot \
                     --checkpoints_path $checkpoints_path \
                     --training_file $training_file \
                     --testing_file $testing_file \
                     --results_path $results_path \
                     --mode $mode \
                     --name $name \
                     --batch_size $batch_size \
                     --num_epoch $num_epoch \
                     --learning_rate $learning_rate \
                     --gamma $gamma \
                     --crop_height $crop_height \
                     --crop_width $crop_width \
                     --num_sample_inout $num_sample_inout \
                     --aspect_ratio $aspect_ratio \
                     --sampling $sampling \
                     --output_representation $output_representation \
                     --backbone $backbone

For a detailed description of the training options, please take a look at lib/options.py.

In order to monitor and visualize the training process, you can start a tensorboard session with:

tensorboard --logdir checkpoints

Evaluation

Use the following command to evaluate the trained SMD-Nets on our UnrealStereo4K dataset.

python apps/test.py --dataroot $dataroot \
                    --testing_file $testing_file \
                    --results_path $results_path \
                    --mode $mode \
                    --batch_size 1 \
                    --superes_factor $superes_factor \
                    --aspect_ratio $aspect_ratio \
                    --output_representation $output_representation \
                    --load_checkpoint_path $checkpoints_path \
                    --backbone $backbone 

Warning! The soft edge error (SEE) on the KITTI dataset requires instance segmentation maps from the KITTI 2015 dataset.

Stereo Ultra High-Resolution: to estimate a disparity map at arbitrary spatial resolution from a low-resolution stereo pair at test time, set the superes_factor parameter to a different value (e.g., 2, 4, 8, ..., 32). Below is a comparison of our model using the PSMNet backbone at 128 Mpx resolution (top) and the original PSMNet at 0.5 Mpx resolution (bottom), both taking stereo pairs at 0.5 Mpx resolution as input.
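To make the idea of querying at arbitrary resolution concrete, here is a minimal NumPy sketch of how continuous query locations for a given super-resolution factor could be generated and how a low-resolution feature map could be bilinearly sampled at them. It is an illustration only: the pixel-center alignment convention, the clamping at the borders, and the names `query_grid` and `bilinear_sample` are assumptions, not the code behind `--superes_factor`.

```python
import numpy as np


def query_grid(h, w, factor):
    """Continuous (x, y) query locations, in low-res pixel coordinates,
    for an output of size (h * factor, w * factor)."""
    ys = (np.arange(h * factor) + 0.5) / factor - 0.5
    xs = (np.arange(w * factor) + 0.5) / factor - 0.5
    # Clamp so that every query stays inside the feature map.
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    x, y = np.meshgrid(xs, ys)
    return x, y


def bilinear_sample(feat, x, y):
    """Bilinearly interpolate feat (H, W, C) at continuous coordinates x, y."""
    h, w = feat.shape[:2]
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


feat = np.arange(4, dtype=np.float64).reshape(2, 2, 1)
x, y = query_grid(2, 2, factor=2)
upsampled = bilinear_sample(feat, x, y)  # shape (4, 4, 1)
```

Because each query point is processed independently, the points can be evaluated in fixed-size chunks, which is what allows the output resolution to grow without a matching growth in peak memory.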

Pretrained models

You can download pre-trained models on our UnrealStereo4K dataset from the following links:

Qualitative results

Disparity Visualization. Some qualitative results of the proposed SMD-Nets using PSMNet as stereo backbone. From left to right: the input image from the UnrealStereo4K test set, the predicted disparity and the corresponding error map. Please zoom in to better perceive details near depth boundaries.

Point Cloud Visualization. Below, instead, we show point cloud visualizations on UnrealStereo4K for both the passive binocular stereo and the active depth datasets, adopting HSMNet as backbone. From left to right, the reference image, the results obtained using a standard disparity regression (i.e., disparity point estimate), a unimodal Laplacian distribution and our bimodal Laplacian mixture distribution. Note that our bimodal representation notably alleviates bleeding artifacts near object boundaries compared to both disparity regression and the unimodal formulation.
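Point clouds like these are obtained by back-projecting each pixel with its disparity. The sketch below uses the standard rectified-stereo relation depth = focal length × baseline / disparity with a pinhole camera. The focal length, baseline and principal point are made-up illustrative values, not the UnrealStereo4K calibration, and `disparity_to_point` is a hypothetical helper.

```python
def disparity_to_point(u, v, disparity, focal, baseline, cx, cy):
    """Back-project pixel (u, v) with the given disparity to a 3D point.

    focal and disparity are in pixels; baseline sets the output unit.
    """
    z = focal * baseline / disparity
    x = (u - cx) * z / focal
    y = (v - cy) * z / focal
    return x, y, z


# Illustrative camera (assumed values): 1000 px focal length, 0.1 m baseline.
cam = dict(focal=1000.0, baseline=0.1, cx=1920.0, cy=1080.0)
foreground = disparity_to_point(1920.0, 1080.0, 50.0, **cam)  # z = 2 m
background = disparity_to_point(1920.0, 1080.0, 10.0, **cam)  # z = 10 m
blended = disparity_to_point(1920.0, 1080.0, 30.0, **cam)     # between the two
```

A disparity that blends foreground and background lands at a depth between the two surfaces, so in a point cloud it appears as a point floating in empty space near the object boundary.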

Contacts

For questions, please send an email to [email protected]

Acknowledgements

We thank the authors who shared the code of their works. In particular:

  • Jia-Ren Chang for providing the code of PSMNet.
  • Gengshan Yang for providing the code of HSMNet.
  • Clement Godard for providing the code of Monodepth (extended to Stereodepth).
  • Shunsuke Saito for providing the code of PIFu.