Apply our monocular depth boosting to your own network!

Overview

MergeNet - Boost Your Own Depth

Boost custom or edited monocular depth maps using MergeNet

Input Original result After manual editing of base
patchselection patchselection patchselection

You can find our Google Colaboratory notebook here. Open In Colab

In this repository, we present a stand-alone implementation of our merging operator we use in our recent work:

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

S. Mahdi H. Miangoleh*, Sebastian Dille*, Long Mai, Sylvain Paris, Yağız Aksoy. Video, Main pdf, Supplementary pdf, Project Page. Github repo.

If you are an artist:

Although we are presenting few simple examples here, both low-resolution and high-resolution depth maps can be freely edited using any program before merging with our method.

Feel free to experiment and share your results with us!

If you are a researcher developing a new (CNN-based) Monocular Depth Estimation method:

This repository is a full implementation of our double-estimation framework. Double estimation uses a base-resolution result and a high-resolution result. The optimum high-resolution for a given image, R20 resolution, depends on the receptive field size of your network (the training resolution is a good approximation) and the image content. The code for R20 computation is also provided here.

To demonstrate the high-resolution performance of your network, you can simply generate the base and high-res estimates on any dataset and use this repository to apply our double estimation method to your own work.

Our Github repo for the main project also includes the implementation of our detail-focused monocular depth performance metric D^3R.

Mix'n'match depths from different networks or use your own custom-edited ones.

In the image below, we show that choosing a different base estimate can improve the depth for the city:

Input Base and details from [MiDaS][1] Base from [LeRes][2] and details from [MiDaS][1]
patchselection patchselection patchselection

To get the optimal result for a given scene, you may want to try multiple methods in both low- and high-resolutions and pick your favourite for each case.

Input Base from [MiDaS v3 / DPT][3] Base from [MiDaS v3 / DPT][3] and details from [MiDaS v2][1]
patchselection patchselection patchselection

Moreover, you can simply edit the base image before merging using any image editing tool for more creative control:

Input Base and details from [MiDaS][1] With edited base from [MiDaS][1]
patchselection patchselection patchselection

How does it work?

merge

This repository lets you combine two input depth maps with certain characteristics.

Low-res base depth

The network uses the base estimate as the main structure of the scene. Typically this is the default-resolution result of a monocular depth estimation network at around 300x300 resolution.

This base estimate is a good candidate for editing due to its low-resolution nature.

Monocular depth estimation methods with geometric consistency optimizations can be used as the base estimation to merge details onto a consistent base.

High-res depth with details

The merging operation transfers the details from this high-resolution depth map onto the structure provided by the low-resolution base pair.

The high-resolution input does not need structural consistency and is typically generated by feeding the input image at a much higher resolution than the training resolution of a given monocular depth estimation network.

You can compute the optimal high-resolution estimation size for a given image using our R20 resolution calculator, also provided in this repository. You can also simply use 2x or 3x resolution to simply add more details.

For more information on this project:

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

S. Mahdi H. Miangoleh*, Sebastian Dille*, Long Mai, Sylvain Paris, Yağız Aksoy. Main pdf, Supplementary pdf, Project Page. Github repo.

video

Citation

This implementation is provided for academic use only. Please cite our paper if you use this code or any of the models.

@INPROCEEDINGS{Miangoleh2021Boosting,
author={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
journal={Proc. CVPR},
year={2021},
}

Credits

The "Merge model" code skeleton (./pix2pix folder) was adapted from the [pytorch-CycleGAN-and-pix2pix][4] repository.
[1]: https://github.com/intel-isl/MiDaS/tree/v2
[2]: https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS
[3]: https://github.com/isl-org/DPT
[4]: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix \

Owner
Computational Photography Lab @ SFU
Computational Photography Lab at Simon Fraser University, lead by @yaksoy
Computational Photography Lab @ SFU
PyTorch implementations of algorithms for density estimation

pytorch-flows A PyTorch implementations of Masked Autoregressive Flow and some other invertible transformations from Glow: Generative Flow with Invert

Ilya Kostrikov 546 Dec 05, 2022
MADT: Offline Pre-trained Multi-Agent Decision Transformer

MADT: Offline Pre-trained Multi-Agent Decision Transformer A link to our paper can be found on Arxiv. Overview Official codebase for Offline Pre-train

Linghui Meng 51 Dec 21, 2022
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Unsupervised Phone and Word Segmentation using Vector-Quantized Neural Networks Overview Unsupervised phone and word segmentation on speech data is pe

Herman Kamper 13 Dec 11, 2022
Tiny Object Detection in Aerial Images.

AI-TOD AI-TOD is a dataset for tiny object detection in aerial images. [Paper] [Dataset] Description AI-TOD comes with 700,621 object instances for ei

jwwangchn 116 Dec 30, 2022
Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks

Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks - Official Project Page This repository contains the code develope

Amirsina Torfi 1.7k Dec 18, 2022
How to Learn a Domain Adaptive Event Simulator? ACM MM, 2021

LETGAN How to Learn a Domain Adaptive Event Simulator? ACM MM 2021 Running Environment: pytorch=1.4, 1 NVIDIA-1080TI. More details can be found in pap

CVTEAM 4 Sep 20, 2022
CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces

CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces This is a repository for the following pape

17 Oct 13, 2022
BookMyShowPC - Movie Ticket Reservation App made with Tkinter

Book My Show PC What is this? Movie Ticket Reservation App made with Tkinter. Tk

The Nithin Balaji 3 Dec 09, 2022
Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch

Memory Efficient Attention This is unofficial implementation of Self-attention Does Not Need O(n^2) Memory for Jax and PyTorch. Implementation is almo

Amin Rezaei 126 Dec 27, 2022
Official code for "Stereo Waterdrop Removal with Row-wise Dilated Attention (IROS2021)"

Stereo-Waterdrop-Removal-with-Row-wise-Dilated-Attention This repository includes official codes for "Stereo Waterdrop Removal with Row-wise Dilated A

29 Oct 01, 2022
Stochastic Normalizing Flows

Stochastic Normalizing Flows We introduce stochasticity in Boltzmann-generating flows. Normalizing flows are exact-probability generative models that

AI4Science group, FU Berlin (Frank Noé and co-workers) 50 Dec 16, 2022
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition The official code of ABINet (CVPR 2021, Oral).

334 Dec 31, 2022
Python Wrapper for Embree

pyembree Python Wrapper for Embree Installation You can install pyembree (and embree) via the conda-forge package. $ conda install -c conda-forge pyem

Anthony Scopatz 67 Dec 24, 2022
Setup and customize deep learning environment in seconds.

Deepo is a series of Docker images that allows you to quickly set up your deep learning research environment supports almost all commonly used deep le

Ming 6.3k Jan 06, 2023
Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021

PLOME:Pre-training with Misspelled Knowledge for Chinese Spelling Correction (ACL2021) This repository provides the code and data of the work in ACL20

197 Nov 26, 2022
This is a TensorFlow implementation for C2-Rec

This is a TensorFlow implementation for C2-Rec We refer to the repo SASRec. Requirements requirement.txt Datasets This repo includes Amazon Beauty dat

7 Nov 14, 2022
Source code for "Understanding Knowledge Integration in Language Models with Graph Convolutions"

Graph Convolution Simulator (GCS) Source code for "Understanding Knowledge Integration in Language Models with Graph Convolutions" Requirements: PyTor

yifan 10 Oct 18, 2022
This git repo contains the implementation of my ML project on Heart Disease Prediction

Introduction This git repo contains the implementation of my ML project on Heart Disease Prediction. This is a real-world machine learning model/proje

Aryan Dutta 1 Feb 02, 2022
Visualizer using audio and semantic analysis to explore BigGAN (Brock et al., 2018) latent space.

BigGAN Audio Visualizer Description This visualizer explores BigGAN (Brock et al., 2018) latent space by using pitch/tempo of an audio file to generat

Rush Kapoor 2 Nov 21, 2022