Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Overview

Realtime Unsupervised Depth Estimation from an Image

This is the caffe implementation of our paper "Unsupervised CNN for single view depth estimation: Geometry to the rescue" published in ECCV 2016 with minor modifications. In this variant, we train the network end-to-end instead of in coarse to fine manner with deeper network (Resnet 50) and TVL1 loss instead of HS loss.

With the implementation we share the sample Resnet50by2 model trained on KITTI training set:

https://github.com/Ravi-Garg/Unsupervised_Depth_Estimation/blob/master/model/train_iter_40000.caffemodel

Shared model is a small variant of the 50 layer residual network from scratch on KITTI. Our model is <25 MB and predicts depths on 160x608 resolution images at over 30Hz on Nvidia Geforce GTX980 (50Hz on TITAN X). It can be used with caffe without any modification and we provide a simple matlab wrapper for testing.

Click on the image to watch preview of the results on youtube:

Screenshot

If you use our model or the code for your research please cite:

@inproceedings{garg2016unsupervised,
  title={Unsupervised CNN for single view depth estimation: Geometry to the rescue},
  author={Garg, Ravi and Kumar, BG Vijay and Carneiro, Gustavo and Reid, Ian},
  booktitle={European Conference on Computer Vision},
  pages={740--756},
  year={2016},
  organization={Springer}
}

Training Procedure

This model was trained on 23200 raw stereo pairs of KITTI taken from city, residential and road sequences. Images from other sequences of KITTI were left untouched. A subset of 697 images from 28 sequences froms the testset, leaving the remaining 33 sequences from these categories which can be used for training.

To use the same training data use the splits spacified in the file 'train_test_split.mat'.

Our model is trained end-to-end from scratch with adam solver (momentum1 = 0.9 , momentom2 = 0.999, learning rate =10e-3 ) for 40,000 iterations on 4 gpus with batchsize 14 per GPU. This model is a pre-release further tuning of hyperparameters should improve results. Only left-right flips as described in the paper were used to train the provided network. Other agumentations described in the paper and runtime shuffle were not used but should also lead to performance imrovement.

Here is the training loss recorded per 20 iterations:

loss per 20 iterations

Note: We have resized the KITTI images to 160x608 for training - which changes the aspect ratio of the images. Thus for proper evaluation on KITTI the images needs to be resized to this resolution and predicted disparities should be scaled by a factor of 608/width_of_input_image before computing depth. For ease in citing the results for further publications, we share the performance measures.

Our model gives following results on KITTI test-set without any post processing:

RMSE(linear): 4.400866

RMSE(log) : 0.233548

RMSE(log10) : 0.101441

Abs rel diff: 0.137796

Sq rel diff : 0.824861

accuracy THr 1.25 : 0.809765

accuracy THr 1.25 sq: 0.935108

accuracy THr 1.25 cube: 0.974739


The test-set consists of 697 images which was used in https://www.cs.nyu.edu/~deigen/depth/kitti_depth_predictions.mat Depth Predictions were first clipped to depth values between 0 and 50 meters and evaluated only in the region spacified in the given mask.

#Network Architecture

Architecture of our networks closely follow Residual networks scheme. We start from resnet 50 by 2 architecture and have replaced strided convolutions with 2x2 MAX pooling layers like VGG. The first 7x7 convolution with stride 2 is replaced with the 7x7 convolution with no stride and the max-pooled output at ½ resolution is passed through an extra 3x3 convolutional (128 features)->relu->2x2 pooling block. Rest of the network followes resnet50 with half the parameters every layer.

For dense prediction we have followed the skip-connections as specified in FCN and our ECCV paper. We have introduced a learnable scale layer with weight decay 0.01 before every 1x1 convolution of FCN skip-connections which allows us to merge mid-level features more efficiently by:

  • Adaptively selecting the mid-level features which are more correlated to depth of the scene.
  • Making 1x1 convolutions for projections more stable for end to end training.

Further analysis and visualizations of learned features will be released shortly on the arxiv: https://arxiv.org/pdf/1603.04992v2.pdf

Using the code

To train and finetune networks on your own data, you need to compile caffe with additional:

  • “AbsLoss” layer for L1 loss minimization,

  • “Warping” layer for image warpping given flow

  • and modified "filler.hpp" to compute image gradient with convolutions which we share here.

License

For academic usage, the code is released under the permissive BSD license. For any commercial purpose, please contact the authors.

Contact

Please report any known issues on this thread of to [email protected]

Owner
Ravi Garg
Ravi Garg
Code To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment.

COLIEE 2021 - task 2: Legal Case Entailment This repository contains the code to reproduce NeuralMind's submissions to COLIEE 2021 presented in the pa

NeuralMind 13 Dec 16, 2022
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach

This repository holds the implementation for paper Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach Download our preproc

Qitian Wu 42 Dec 27, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 896 Jan 01, 2023
Wordplay, an artificial Intelligence based crossword puzzle solver.

Wordplay, AI based crossword puzzle solver A crossword is a word puzzle that usually takes the form of a square or a rectangular grid of white- and bl

Vaibhaw 4 Nov 16, 2022
All course materials for the Zero to Mastery Machine Learning and Data Science course.

Zero to Mastery Machine Learning Welcome! This repository contains all of the code, notebooks, images and other materials related to the Zero to Maste

Daniel Bourke 1.6k Jan 08, 2023
Codes for our IJCAI21 paper: Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization

DDAMS This is the pytorch code for our IJCAI 2021 paper Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization [Arxiv Pr

xcfeng 55 Dec 27, 2022
Machine learning framework for both deep learning and traditional algorithms

NeoML is an end-to-end machine learning framework that allows you to build, train, and deploy ML models. This framework is used by ABBYY engineers for

NeoML 704 Dec 27, 2022
Simple Pixelbot for Diablo 2 Resurrected written in python and opencv.

Simple Pixelbot for Diablo 2 Resurrected written in python and opencv. Obviously only use it in offline mode as it is against the TOS of Blizzard to use it in online mode!

468 Jan 03, 2023
Source code for Transformer-based Multi-task Learning for Disaster Tweet Categorisation (UCD's participation in TREC-IS 2020A, 2020B and 2021A).

Source code for "UCD participation in TREC-IS 2020A, 2020B and 2021A". *** update at: 2021/05/25 This repo so far relates to the following work: Trans

Congcong Wang 4 Oct 19, 2021
Bayesian Meta-Learning Through Variational Gaussian Processes

vmgp This is the repository of Vivek Myers and Nikhil Sardana for our CS 330 final project, Bayesian Meta-Learning Through Variational Gaussian Proces

Vivek Myers 2 Nov 17, 2022
This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.

TSForecasting This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the tim

Rakshitha Godahewa 80 Dec 30, 2022
A minimal implementation of face-detection models using flask, gunicorn, nginx, docker, and docker-compose

Face-Detection-flask-gunicorn-nginx-docker This is a simple implementation of dockerized face-detection restful-API implemented with flask, Nginx, and

Pooya-Mohammadi 30 Dec 17, 2022
Converts given image (png, jpg, etc) to amogus gif.

Image to Amogus Converter Converts given image (.png, .jpg, etc) to an amogus gif! Usage Place image in the /target/ folder (or anywhere realistically

Hank Magan 1 Nov 24, 2021
Code of the paper "Shaping Visual Representations with Attributes for Few-Shot Learning (ASL)".

Shaping Visual Representations with Attributes for Few-Shot Learning This code implements the Shaping Visual Representations with Attributes for Few-S

chx_nju 9 Sep 01, 2022
Tutorials, assignments, and competitions for MIT Deep Learning related courses.

MIT Deep Learning This repository is a collection of tutorials for MIT Deep Learning courses. More added as courses progress. Tutorial: Deep Learning

Lex Fridman 9.5k Jan 07, 2023
PromptDet: Expand Your Detector Vocabulary with Uncurated Images

PromptDet: Expand Your Detector Vocabulary with Uncurated Images Paper Website Introduction The goal of this work is to establish a scalable pipeline

103 Dec 20, 2022
[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation This is the official implementation for the method described in Ch

Jiaxing Yan 27 Dec 30, 2022
Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad to your characters in Modo.

Applicator Kit for Modo Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad with a TrueDepth camera to

Andrew Buttigieg 3 Aug 24, 2021
The official PyTorch implementation of paper BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition

BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition Boyan Zhou, Quan Cui, Xiu-Shen Wei*, Zhao-Min Chen This repo

Megvii-Nanjing 616 Dec 21, 2022
This repo is official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021).

Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices" Introduction This repo is official PyTorch implementatio

Choi Sang Bum 203 Jan 05, 2023