[ICCV 2021] Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Overview

EPCDepth

EPCDepth is a self-supervised monocular depth estimation model whose supervision comes from the other image of a stereo pair. Details are described in our paper:

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Rui Peng, Ronggang Wang, Yawen Lai, Luyang Tang, Yangang Cai

ICCV 2021 (arxiv)

EPCDepth produces the most accurate and sharpest results. In the last example, the person in the second red box should be farther away than the road sign, because the road sign occludes the person. Only our model correctly captures this occlusion cue.

⚙ Setup

1. Recommended environment

  • PyTorch 1.1
  • Python 3.6

2. KITTI data

You can download the raw KITTI dataset (about 175GB) by running:

wget -i dataset/kitti_archives_to_download.txt -P <your kitti path>/
cd <your kitti path>
unzip "*.zip"
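The download-and-unzip step can also be scripted in Python. The sketch below is a minimal, standard-library-only equivalent of the commands above; the helper names (read_url_list, download_all, extract_all) are hypothetical and not part of this repository, and skipping files that already exist is a convenience the wget command itself does not provide.

```python
"""Sketch: download the KITTI archives listed in a URL file and unzip them.

Hypothetical helpers mirroring `wget -i <list> -P <dir>` and `unzip "*.zip"`.
"""
import pathlib
import urllib.request
import zipfile


def read_url_list(list_file):
    """Return the non-empty, stripped lines of a wget-style URL list."""
    text = pathlib.Path(list_file).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def download_all(urls, out_dir):
    """Download each URL into out_dir, skipping files that already exist."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out / url.rsplit("/", 1)[-1]
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))


def extract_all(kitti_dir):
    """Unzip every *.zip archive found directly in kitti_dir into kitti_dir."""
    root = pathlib.Path(kitti_dir)
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
```

Typical use would be `download_all(read_url_list("dataset/kitti_archives_to_download.txt"), kitti_dir)` followed by `extract_all(kitti_dir)`. A partially downloaded file is not detected by the existence check, so delete it before rerunning.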

Then, we recommend converting the PNG images to JPEG with this command:

find <your kitti path>/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

Alternatively, you can skip this conversion step by manually changing the image suffix from .jpg to .png in dataset/kitti_dataset.py. Our pre-trained models were trained on JPEG images, so test performance on PNG images will be slightly lower.
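If GNU parallel is not available, the conversion can be driven from Python instead. This sketch assumes ImageMagick's `convert` binary is on PATH and reuses the same flags as the one-liner (quality 92, 4:2:0 chroma subsampling). The function names are hypothetical, and the `runner` argument exists only so the logic can be exercised without ImageMagick installed.

```python
"""Sketch: convert KITTI PNGs to JPEGs with ImageMagick, in parallel."""
import pathlib
import subprocess
from concurrent.futures import ThreadPoolExecutor


def convert_cmd(png_path):
    """Build the same `convert` invocation as the shell one-liner."""
    png = pathlib.Path(png_path)
    jpg = png.with_suffix(".jpg")
    return ["convert", "-quality", "92",
            "-sampling-factor", "2x2,1x1,1x1", str(png), str(jpg)]


def convert_one(png_path, runner=subprocess.run):
    """Convert one PNG and delete it only if conversion succeeded (like `&& rm {}`)."""
    result = runner(convert_cmd(png_path))
    if result.returncode == 0:
        pathlib.Path(png_path).unlink()
    return result.returncode


def convert_tree(kitti_dir, workers=8, runner=subprocess.run):
    """Convert every *.png below kitti_dir; return the number of failed conversions."""
    pngs = sorted(pathlib.Path(kitti_dir).rglob("*.png"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda p: convert_one(p, runner), pngs))
    return sum(1 for code in codes if code != 0)
```

Threads are sufficient here because each worker just waits on an external process; the actual image encoding happens inside ImageMagick.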

3. Prepare depth hint

Once you have downloaded the KITTI dataset as in the previous step, you need to prepare the depth hint by running:

python precompute_depth_hints.py --data_path <your kitti path>

The generated depth hints will be saved to <your kitti path>/depth_hints. Here too, make sure the image suffix (.jpg or .png) matches your data.
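A depth hint is derived from a stereo disparity estimate: for a rectified pair, depth = fx * B / disparity, where fx is the focal length in pixels and B is the stereo baseline (about 0.54 m on KITTI). The sketch below shows only this conversion, under assumed parameters. precompute_depth_hints.py follows DepthHints and computes its disparities with its own matcher and settings, so this is an illustration of the geometry rather than that script's code.

```python
"""Sketch: convert a stereo disparity map (pixels) into a metric depth hint."""
import numpy as np


def disparity_to_depth(disp, fx, baseline, min_disp=1e-3):
    """Return depth = fx * baseline / disp; pixels with disp <= min_disp become 0 (invalid)."""
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.zeros_like(disp)
    valid = disp > min_disp
    depth[valid] = fx * baseline / disp[valid]
    return depth
```

For example, with an assumed fx = 720 px and B = 0.54 m, a disparity of 38.88 px corresponds to a depth of 10 m.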

📊 Evaluation

1. Download models

Download our pre-trained model and put it in <your model path>.

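| Pre-trained model | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| model18_lr | √ | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| model18_lr | √ | 192x640 | resnet18 (pt) | d2 | 0.1 | 0.712 | 4.462 | 0.886 |
| model18 | √ | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| model18 | √ | 320x1024 | resnet18 (pt) | d2 | 0.0920 | 0.655 | 4.268 | 0.898 |
| model50 | √ | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |
| model50 | √ | 320x1024 | resnet50 (pt) | d2 | 0.0905 | 0.629 | 4.187 | 0.900 |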

Note: pt means the backbone is pre-trained on ImageNet, and PP denotes post-processing (the --post_process flag). The low-resolution (192x640) results differ slightly from those reported in the paper.
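The columns above are the standard depth metrics of Eigen et al. The sketch below computes them as they are commonly implemented (for example in Monodepth2). It assumes gt and pred are 1-D arrays of metric depths that have already been restricted to valid ground-truth pixels and are strictly positive; it is not this repository's evaluation code.

```python
"""Sketch: standard monocular depth metrics (Abs Rel, Sq Rel, RMSE, delta < 1.25)."""
import numpy as np


def depth_metrics(gt, pred):
    """Compute metrics over already-masked, strictly positive depth arrays."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    # delta = max(gt / pred, pred / gt); accuracy counts pixels with delta < 1.25.
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": float(np.mean(np.abs(gt - pred) / gt)),
        "sq_rel": float(np.mean((gt - pred) ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean((gt - pred) ** 2))),
        "a1": float(np.mean(thresh < 1.25)),
    }
```

For gt = [10, 20] and pred = [13, 20], Abs Rel is 0.15, Sq Rel is 0.45, RMSE is sqrt(4.5), and δ < 1.25 is 0.5, since 13 / 10 = 1.3 exceeds the threshold.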

2. KITTI evaluation

To recreate the results from our paper, run the following command, which also saves the estimated disparity maps to <your disparity save path>:

python main.py \
    --val --data_path <your kitti path> --resume <your model path>/model18.pth.tar \
    --use_full_scale --post_process --output_scale 0 --disps_path <your disparity save path>

The disparities are saved as a NumPy array of shape (N, H, W).
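With --post_process, each saved disparity map blends predictions for an image and its horizontal flip. We assume this follows the flip-based post-processing introduced by Monodepth (Godard et al.) and implemented in Monodepth2. The function below reproduces that published formulation on (N, H, W) arrays; its name is hypothetical, and it is not necessarily identical to this repository's code.

```python
"""Sketch: Monodepth-style flip post-processing on (N, H, W) disparity arrays."""
import numpy as np


def flip_post_process(disp, disp_flipped_back):
    """Blend predictions for images and their horizontally flipped copies.

    disp:              (N, H, W) disparities predicted on the original images.
    disp_flipped_back: (N, H, W) disparities predicted on the flipped images,
                       already flipped back to the original orientation.
    """
    _, h, w = disp.shape
    mean_disp = 0.5 * (disp + disp_flipped_back)
    # Normalized horizontal coordinate in [0, 1] for every pixel.
    x, _ = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    # Ramp equal to 1 for the leftmost 5% of columns, falling to 0 at 10%.
    left_mask = (1.0 - np.clip(20 * (x - 0.05), 0, 1))[None, ...]
    right_mask = left_mask[:, :, ::-1]
    # Near the left border use the flipped prediction, near the right border
    # the original one, and average the two everywhere else.
    return (right_mask * disp + left_mask * disp_flipped_back
            + (1.0 - left_mask - right_mask) * mean_disp)
```

If a batch of N images and their flips is predicted together as `pred` of shape (2N, H, W), the call would be `flip_post_process(pred[:N], pred[N:, :, ::-1])`.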

3. NYUv2 evaluation

We validate the generalization ability of our model on the NYU-Depth-V2 dataset using the model trained on KITTI. Download the test data nyu_test.tar.gz and unzip it to <your nyuv2 testing data path>. All evaluation code is in the nyuv2Testing folder. Run:

python nyuv2_testing.py \
    --data_path <your nyuv2 testing data path> \
    --resume <your model path>/model50.pth.tar --post_process \
    --save_dir <your nyuv2 disparity save path>

By default, only visualizations (PNG format) of the predicted disparities and the ground truth are saved to <your nyuv2 disparity save path>.
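Because a model trained on outdoor KITTI scenes has no reason to match indoor depth scale, cross-dataset NYUv2 evaluations commonly align each prediction to its ground truth by median scaling before computing metrics. We have not confirmed that nyuv2_testing.py does exactly this; the hypothetical helper below only illustrates the common practice.

```python
"""Sketch: per-image median scaling of a predicted depth map to ground truth."""
import numpy as np


def median_scale(pred_depth, gt_depth, valid=None):
    """Scale pred so its median over valid pixels matches the ground-truth median."""
    pred = np.asarray(pred_depth, dtype=np.float64)
    gt = np.asarray(gt_depth, dtype=np.float64)
    if valid is None:
        valid = gt > 0  # treat non-positive ground truth as missing
    ratio = np.median(gt[valid]) / np.median(pred[valid])
    return pred * ratio, ratio
```

The returned ratio is worth logging per image: a widely varying ratio indicates that the model's scale transfers poorly across domains.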

📦 KITTI Results

You can download our precomputed disparity predictions from the following links:

| Disparity | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| disps18_lr | √ | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| disps18 | √ | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| disps50 | √ | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |
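When scoring these disparities yourself, note that KITTI Eigen-split evaluation is usually restricted to ground-truth pixels with depth between 1e-3 m and 80 m that fall inside the Garg crop. The crop fractions below are the ones used in Monodepth2's evaluation script; we assume, but have not verified, that this repository uses the same protocol. The helper name is hypothetical.

```python
"""Sketch: Eigen-split evaluation mask (depth range plus Garg crop)."""
import numpy as np

MIN_DEPTH, MAX_DEPTH = 1e-3, 80.0


def eigen_eval_mask(gt_depth):
    """Boolean mask of ground-truth pixels that count toward the metrics."""
    h, w = gt_depth.shape
    mask = (gt_depth > MIN_DEPTH) & (gt_depth < MAX_DEPTH)
    # Garg crop as fractions of image height/width, truncated to integers.
    top, bottom, left, right = np.array(
        [0.40810811 * h, 0.99189189 * h, 0.03594771 * w, 0.96405229 * w]
    ).astype(np.int32)
    crop = np.zeros_like(mask)
    crop[top:bottom, left:right] = True
    return mask & crop
```

Predicted depths are typically also clipped to the same [MIN_DEPTH, MAX_DEPTH] range before the metrics are computed on the masked pixels.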

🖼 Visualization

To visualize the disparity map saved in the KITTI evaluation (or other disparities in numpy data format), run:

python main.py --vis --disps_path <your disparity save path>/disps50.npy

The visualized depth map will be saved to <your disparity save path>/disps_vis in png format.
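A common way to render disparities (used, for example, in Monodepth2's demo script with matplotlib's magma colormap) is to normalize each map between its minimum and its 95th percentile, so a few very near pixels do not wash out the rest. To avoid extra dependencies, the sketch below stops at an 8-bit grayscale array. The function names and the 95th-percentile choice are assumptions for illustration, not necessarily what `main.py --vis` does.

```python
"""Sketch: normalize (N, H, W) disparities to 8-bit images for viewing."""
import numpy as np


def disparity_to_uint8(disp, pct=95.0):
    """Map one (H, W) disparity to 0..255, clipping values above the pct-th percentile."""
    disp = np.asarray(disp, dtype=np.float64)
    vmin = disp.min()
    vmax = np.percentile(disp, pct)
    if vmax <= vmin:  # constant map: nothing to normalize
        return np.zeros(disp.shape, dtype=np.uint8)
    scaled = np.clip((disp - vmin) / (vmax - vmin), 0.0, 1.0)
    return (scaled * 255.0).round().astype(np.uint8)


def visualize_all(disps):
    """Apply disparity_to_uint8 to every map of an (N, H, W) array."""
    disps = np.asarray(disps)
    if disps.ndim != 3:
        raise ValueError(f"expected (N, H, W), got shape {disps.shape}")
    return np.stack([disparity_to_uint8(d) for d in disps])
```

For example, `visualize_all(np.load("disps50.npy"))` returns one uint8 image per frame, which any image library can then write to disk as PNG.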

⏳ Training

To train the model from scratch, run:

python main.py \
    --data_path <your kitti path> --model_dir <checkpoint save dir> \
    --logs_dir <tensorboard save dir> --pretrained --post_process \
    --use_depth_hint --use_spp_distillation --use_data_graft \
    --use_full_scale --gpu_ids 0
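The --use_depth_hint flag enables supervision from the hints precomputed earlier. In our reading of the DepthHints formulation (Watson et al., 2019), a hint is used at a pixel only when warping with the hint gives a lower photometric reprojection loss than warping with the prediction; there, a log-L1 term log(1 + |d - h|) is added. The sketch below expresses that rule in NumPy. The inputs reproj_pred and reproj_hint (per-pixel reprojection losses) are assumed to be computed elsewhere, and how the term is averaged and weighted in this repository is not specified here.

```python
"""Sketch: per-pixel DepthHints-style supervision term (log-L1, hint-gated)."""
import numpy as np


def depth_hint_loss(pred_depth, hint_depth, reproj_pred, reproj_hint):
    """Return (per-pixel loss, mask); the loss is log(1 + |d - h|) where the hint wins."""
    pred = np.asarray(pred_depth, dtype=np.float64)
    hint = np.asarray(hint_depth, dtype=np.float64)
    # Use the hint only where it is valid and reprojects better than the prediction.
    use_hint = (np.asarray(reproj_hint) < np.asarray(reproj_pred)) & (hint > 0)
    per_pixel = np.log1p(np.abs(pred - hint)) * use_hint
    return per_pixel, use_hint
```

Gating by reprojection loss is the key design choice: the hint acts as a fallback that can pull the network out of poor local minima, without overriding it where its own prediction already explains the stereo pair better.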

🔧 Suggestions

  1. The magnitude of performance improvement is: Data Grafting > Full-Scale > Self-Distillation. We noticed that the improvement from self-distillation becomes insignificant when the model capacity is large. There is therefore room to explore more accurate methods for extracting self-distillation labels and better self-distillation strategies in the future.
  2. In our experience, self-supervised monocular depth estimation models with larger backbones converge less stably. You can verify your ideas on a small backbone first, and then adjust the learning rate appropriately when training on a large backbone.
  3. We found that a pure RSU encoder performs better than the traditional ResNet encoder, but unfortunately no RSU encoder pre-trained on ImageNet is available. We therefore believe that pre-training an RSU encoder on ImageNet and using it to replace the ResNet encoder of this model could yield a large performance improvement.

⚖ Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{epcdepth,
    title = {Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation},
    author = {Peng, Rui and Wang, Ronggang and Lai, Yawen and Tang, Luyang and Cai, Yangang},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
    year = {2021}
}

πŸ‘©β€ Acknowledgements

Our depth hint module refers to DepthHints, the NYUv2 pre-processing refers to P2Net, and the RSU block refers to U2Net.
