[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Overview

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

This is the official implementation for the method described in

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Jiaxing Yan, Hong Zhao, Penghui Bu and YuSheng Jin.

3DV 2021 (arXiv pdf)

Quantitative_results

Qualitative_result

Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=1.7.0 torchvision=0.8.1 -c pytorch
pip install tensorboardX==2.1
pip install opencv-python==3.4.7.28
pip install albumentations==0.5.2   # we use albumentations for faster image preprocessing

This project uses Python 3.7.8, cuda 11.4, the experiments were conducted using a single NVIDIA RTX 3090 GPU and CPU environment - Intel Core i9-9900KF.

We recommend using a conda environment to avoid dependency conflicts.

Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path images/test_image.jpg --model_name MS_1024x320

On its first run either of these commands will download the MS_1024x320 pretrained model (272MB) into the models/ folder. We provide the following options for --model_name:

--model_name Training modality Resolution Abs_Rel Sq_Rel $\delta<1.25$
M_640x192 Mono 640 x 192 0.105 0.769 0.892
M_1024x320 Mono 1024 x 320 0.102 0.734 0.898
M_1280x384 Mono 1280 x 384 0.102 0.715 0.900
MS_640x192 Mono + Stereo 640 x 192 0.102 0.752 0.894
MS_1024x320 Mono + Stereo 1024 x 320 0.096 0.694 0.908

KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.

Training

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo

Note: For high resolution input, e.g. 1024x320 and 1280x384, we employ a lightweight setup, ResNet18 and 640x192, for pose encoder at training for memory savings. The following example command trains a model named M_1024x320:

python train.py --model_name M_1024x320 --num_layers 50 --height 320 --width 1024 --num_layers_pose 18 --height_pose 192 --width_pose 640
#             encoder     resolution                                     
# DepthNet   resnet50      1024x320
# PoseNet    resnet18       640x192

Finetuning a pretrained model

Add the following to the training command to load an existing model for finetuning:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19

Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the weights of a model named MS_1024x320:

python evaluate_depth.py --load_weights_folder ./log/MS_1024x320 --eval_mono --data_path ./kitti_data --eval_split eigen

Precomputed results

You can download our precomputed disparity predictions from the following links:

Training modality Input size .npy filesize Eigen disparities
Mono 640 x 192 326M Download 🔗
Mono 1024 x 320 871M Download 🔗
Mono 1280 x 384 1.27G Download 🔗
Mono + Stereo 640 x 192 326M Download 🔗
Mono + Stereo 1024 x 320 871M Download 🔗

References

Monodepth2 - https://github.com/nianticlabs/monodepth2

Owner
Jiaxing Yan
1.Machine Vision 2.DeepLearning 3.C/C++ 4.Python
Jiaxing Yan
KaziText is a tool for modelling common human errors.

KaziText KaziText is a tool for modelling common human errors. It estimates probabilities of individual error types (so called aspects) from grammatic

ÚFAL 3 Nov 24, 2022
Sound Event Detection with FilterAugment

Sound Event Detection with FilterAugment Official implementation of Heavily Augmented Sound Event Detection utilizing Weak Predictions (DCASE2021 Chal

43 Aug 28, 2022
Bayesian Optimization using GPflow

Note: This package is for use with GPFlow 1. For Bayesian optimization using GPFlow 2 please see Trieste, a joint effort with Secondmind. GPflowOpt GP

GPflow 257 Dec 26, 2022
This repository implements WGAN_GP.

Image_WGAN_GP This repository implements WGAN_GP. Image_WGAN_GP This repository uses wgan to generate mnist and fashionmnist pictures. Firstly, you ca

Lieon 6 Dec 10, 2021
Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation.

DuoRec Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation. Usage Download datasets fr

Qrh 46 Dec 19, 2022
Face2webtoon - Despite its importance, there are few previous works applying I2I translation to webtoon.

Despite its importance, there are few previous works applying I2I translation to webtoon. I collected dataset from naver webtoon 연애혁명 and tried to transfer human faces to webtoon domain.

이상윤 64 Oct 19, 2022
We envision models that are pre-trained on a vast range of domain-relevant tasks to become key for molecule property prediction

We envision models that are pre-trained on a vast range of domain-relevant tasks to become key for molecule property prediction. This repository aims to give easy access to state-of-the-art pre-train

GMUM 90 Jan 08, 2023
Repository of our paper 'Refer-it-in-RGBD' in CVPR 2021

Refer-it-in-RGBD This is the repository of our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images' in CVPR 2021 Pape

Haolin Liu 34 Nov 07, 2022
Machine Learning Time-Series Platform

cesium: Open-Source Platform for Time Series Inference Summary cesium is an open source library that allows users to: extract features from raw time s

632 Dec 26, 2022
Hands-On Machine Learning for Algorithmic Trading, published by Packt

Hands-On Machine Learning for Algorithmic Trading Hands-On Machine Learning for Algorithmic Trading, published by Packt This is the code repository fo

Packt 981 Dec 29, 2022
Deep motion generator collections

GenMotion GenMotion (/gen’motion/) is a Python library for making skeletal animations. It enables easy dataset loading and experiment sharing for synt

23 May 24, 2022
Anime Face Detector using mmdet and mmpose

Anime Face Detector This is an anime face detector using mmdetection and mmpose. (To avoid copyright issues, I use generated images by the TADNE model

198 Jan 07, 2023
Implementation of "Selection via Proxy: Efficient Data Selection for Deep Learning" from ICLR 2020.

Selection via Proxy: Efficient Data Selection for Deep Learning This repository contains a refactored implementation of "Selection via Proxy: Efficien

Stanford Future Data Systems 70 Nov 16, 2022
CVPR 2021: "The Spatially-Correlative Loss for Various Image Translation Tasks"

Spatially-Correlative Loss arXiv | website We provide the Pytorch implementation of "The Spatially-Correlative Loss for Various Image Translation Task

Chuanxia Zheng 89 Jan 04, 2023
Funnels: Exact maximum likelihood with dimensionality reduction.

Funnels This repository contains the code needed to reproduce the experiments from the paper: Funnels: Exact maximum likelihood with dimensionality re

2 Apr 21, 2022
Multiple-Object Tracking with Transformer

TransTrack: Multiple-Object Tracking with Transformer Introduction TransTrack: Multiple-Object Tracking with Transformer Models Training data Training

Peize Sun 537 Jan 04, 2023
Rede Neural Convolucional feita durante o processo seletivo do Laboratório de Inteligência Artificial da FACOM (UFMS)

Primeira_Rede_Neural_Convolucional Rede Neural Convolucional feita durante o processo seletivo do Laboratório de Inteligência Artificial da FACOM (UFM

Roney_Felipe 1 Jan 13, 2022
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022
HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021)

Code for HDR Video Reconstruction HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021) Guanying Chen, Cha

Guanying Chen 64 Nov 19, 2022
Multiview Dataset Toolkit

Multiview Dataset Toolkit Using multi-view cameras is a natural way to obtain a complete point cloud. However, there is to date only one multi-view 3D

11 Dec 22, 2022