[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Last update: Dec 30, 2022

This is the official implementation for the method described in

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Jiaxing Yan, Hong Zhao, Penghui Bu and YuSheng Jin.

3DV 2021 (arXiv pdf)

Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=1.7.0 torchvision=0.8.1 -c pytorch
pip install tensorboardX==2.1
pip install opencv-python==3.4.7.28
pip install albumentations==0.5.2   # we use albumentations for faster image preprocessing

This project was developed with Python 3.7.8 and CUDA 11.4. The experiments were conducted on a single NVIDIA RTX 3090 GPU with an Intel Core i9-9900KF CPU.

We recommend using a conda environment to avoid dependency conflicts.

Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path images/test_image.jpg --model_name MS_1024x320

On its first run, this command will download the MS_1024x320 pretrained model (272MB) into the models/ folder. We provide the following options for --model_name:

`--model_name`	Training modality	Resolution	Abs_Rel	Sq_Rel	$\delta<1.25$
`M_640x192`	Mono	640 x 192	0.105	0.769	0.892
`M_1024x320`	Mono	1024 x 320	0.102	0.734	0.898
`M_1280x384`	Mono	1280 x 384	0.102	0.715	0.900
`MS_640x192`	Mono + Stereo	640 x 192	0.102	0.752	0.894
`MS_1024x320`	Mono + Stereo	1024 x 320	0.096	0.694	0.908
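
The prediction script reports a scaled disparity rather than a depth. As a minimal sketch of how a sigmoid network output in [0, 1] is commonly mapped to a scaled disparity and its reciprocal depth: the function below mirrors the convention used in Monodepth2 (listed under References). Whether this repository uses the same mapping, and the bounds of 0.1 and 100, are assumptions and are not confirmed by this README.

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Map a sigmoid output in [0, 1] to (scaled disparity, depth).

    Assumed convention (borrowed from Monodepth2): the raw output is
    linearly rescaled to the range [1 / max_depth, 1 / min_depth] and
    depth is the reciprocal of that scaled disparity.
    Works on plain floats and, elementwise, on NumPy arrays.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


if __name__ == "__main__":
    # Larger raw outputs correspond to larger disparity, i.e. closer points.
    for raw in (0.0, 0.5, 1.0):
        scaled, depth = disp_to_depth(raw)
        print(f"raw={raw:.1f}  scaled_disp={scaled:.4f}  depth={depth:.4f}")
```

Note that the resulting depth is only defined up to scale for monocular models; it is not a metric distance unless a scale factor is applied.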

KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip the archives with:

cd kitti_data
unzip "*.zip"
cd ..

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.
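
As a rough sketch of how a file list in splits/ might be read: the helper below assumes the Monodepth2 line layout `<folder> <frame_index> <side>`, for example `2011_09_26/2011_09_26_drive_0022_sync 473 r`, where the last two fields may be absent. The names `SplitEntry` and `parse_split_line` are hypothetical, and the layout itself is an assumption not confirmed by this README.

```python
from typing import NamedTuple, Optional


class SplitEntry(NamedTuple):
    folder: str            # sequence folder relative to the data path
    frame_index: int       # frame number within the sequence
    side: Optional[str]    # "l" or "r" camera, or None if not given


def parse_split_line(line: str) -> SplitEntry:
    """Parse one line of a split file (assumed Monodepth2-style layout)."""
    parts = line.split()
    if not parts:
        raise ValueError("empty split line")
    folder = parts[0]
    if len(parts) == 3:
        # Full form: folder, frame index and camera side.
        return SplitEntry(folder, int(parts[1]), parts[2])
    # Short form: only the folder is given; default to frame 0, no side.
    return SplitEntry(folder, 0, None)


if __name__ == "__main__":
    print(parse_split_line("2011_09_26/2011_09_26_drive_0022_sync 473 r"))
```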

Training

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo

Note: For high-resolution input, e.g. 1024x320 and 1280x384, we use a lightweight setup for the pose encoder during training (ResNet18 at 640x192) to save memory. The following example command trains a model named M_1024x320:

python train.py --model_name M_1024x320 --num_layers 50 --height 320 --width 1024 --num_layers_pose 18 --height_pose 192 --width_pose 640
#             encoder     resolution                                     
# DepthNet   resnet50      1024x320
# PoseNet    resnet18       640x192

Finetuning a pretrained model

To load an existing model for finetuning, pass --load_weights_folder to the training command:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19

Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the weights of a model named MS_1024x320:

python evaluate_depth.py --load_weights_folder ./log/MS_1024x320 --eval_mono --data_path ./kitti_data --eval_split eigen
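
The Abs_Rel, Sq_Rel and $\delta<1.25$ columns in the model table follow the standard depth-evaluation definitions. Below is a minimal NumPy sketch of those three metrics, together with the per-image median scaling that is conventionally applied to monocular predictions before comparison. The function names are illustrative, and this is not the repository's evaluate_depth.py.

```python
import numpy as np


def median_scale(pred, gt):
    """Rescale a prediction so its median matches the ground-truth median."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return pred * (np.median(gt) / np.median(pred))


def depth_errors(gt, pred):
    """Return (abs_rel, sq_rel, a1) for positive depth arrays of equal shape."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    abs_rel = np.mean(np.abs(gt - pred) / gt)      # mean |gt - pred| / gt
    sq_rel = np.mean((gt - pred) ** 2 / gt)        # mean (gt - pred)^2 / gt
    ratio = np.maximum(gt / pred, pred / gt)
    a1 = np.mean(ratio < 1.25)                     # fraction with ratio < 1.25
    return float(abs_rel), float(sq_rel), float(a1)


if __name__ == "__main__":
    gt = np.array([2.0, 4.0, 8.0])
    pred = np.array([1.0, 2.0, 4.0])   # correct up to an unknown scale
    print(depth_errors(gt, median_scale(pred, gt)))
```

In practice only pixels with valid ground truth (and within the evaluated depth range) would be passed to such a function.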

Precomputed results

You can download our precomputed disparity predictions from the following links:

Training modality	Input size	`.npy` filesize	Eigen disparities
Mono	640 x 192	326M	Download 🔗
Mono	1024 x 320	871M	Download 🔗
Mono	1280 x 384	1.27G	Download 🔗
Mono + Stereo	640 x 192	326M	Download 🔗
Mono + Stereo	1024 x 320	871M	Download 🔗
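
The downloaded files are NumPy arrays. Below is a small sketch for inspecting one without reading it fully into memory. It assumes the array stacks one disparity map per test image along the first axis, which is an assumption about the layout; the file name in the usage line is a placeholder, not a name confirmed by this README.

```python
import numpy as np


def summarize_disparities(path):
    """Report shape, dtype and value range of a saved .npy disparity file."""
    # Memory-map the file: the larger downloads exceed 1 GB.
    disps = np.load(path, mmap_mode="r")
    return {
        "shape": tuple(disps.shape),
        "dtype": str(disps.dtype),
        "min": float(disps.min()),
        "max": float(disps.max()),
    }


if __name__ == "__main__":
    import sys

    # Usage: python inspect_disps.py <path to downloaded .npy file>
    if len(sys.argv) == 2:
        print(summarize_disparities(sys.argv[1]))
```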

References

Monodepth2 - https://github.com/nianticlabs/monodepth2
