Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)

Last update: Dec 22, 2022

Overview

3D visualization of estimated depth and scene flow (overlaid on the input image) from temporally consecutive images.
The model was trained on KITTI in a self-supervised manner and tested on DAVIS.

This repository is the official PyTorch implementation of the paper:

   Self-Supervised Multi-Frame Monocular Scene Flow
   Junhwa Hur and Stefan Roth
   CVPR, 2021
   arXiv

Contact: junhwa.hur[at]gmail.com

Installation

The code has been tested with Anaconda (Python 3.8), PyTorch 1.8.1, and CUDA 10.1 (other PyTorch and CUDA versions are also compatible).
Please run the provided conda environment setup file:

conda env create -f environment.yml
conda activate multi-mono-sf
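
After activating the environment, it can be useful to confirm which Python, PyTorch, and CUDA versions are actually in use. The following is a minimal sketch and not part of the repository; the helper name `describe_environment` is hypothetical. It uses only `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and reports `None` if PyTorch is not installed.

```python
import importlib.util
import sys


def describe_environment():
    """Return a dict with the Python, PyTorch, and CUDA versions that were found.

    Hypothetical helper for illustration only; it is not shipped with the code.
    """
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": None,            # stays None if PyTorch is not installed
        "cuda": None,             # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a CUDA device is usable right now
    }
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With the tested setup, this would be expected to report Python 3.8, a PyTorch version starting with 1.8.1, and CUDA 10.1.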

(Optional) Using the CUDA implementation of the correlation layer accelerates training (~50% faster):

./install_correlation.sh

After installing it, set the flag --correlation_cuda_enabled=True in the training/evaluation script files.

Dataset

Please download the following two datasets for the experiments:

KITTI Raw Data (synced+rectified data; please refer to MonoDepth2 for a more convenient way to download all the data).
KITTI Scene Flow 2015 and its Multi-view extension, merged into the same folder.

To save space, we convert the KITTI Raw png images to jpeg, following the convention from MonoDepth:

find (data_folder)/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'

We also converted the images in KITTI Scene Flow 2015. Please convert the png images in image_2 and image_3 into jpg and save them into the separate folders image_2_jpg and image_3_jpg.
To save further space, you can delete the velodyne point data in KITTI Raw Data, as it is not needed.
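
No command is given above for the KITTI Scene Flow 2015 conversion, so here is a hedged sketch of one way to do it. It is not the authors' script. The function names, the `quality` default of 92, and the use of Pillow (assumed to be installed) are all assumptions, as is the layout in which `split_dir` (for example the `training` folder) directly contains `image_2` and `image_3`. The original png files are left in place.

```python
from pathlib import Path


def jpg_target(png_path):
    """Map .../image_2/000000_10.png to .../image_2_jpg/000000_10.jpg."""
    png_path = Path(png_path)
    # Sibling folder of the camera folder, with "_jpg" appended to its name.
    out_dir = png_path.parent.with_name(png_path.parent.name + "_jpg")
    return out_dir / (png_path.stem + ".jpg")


def convert_scene_flow_images(split_dir, cams=("image_2", "image_3"), quality=92):
    """Convert every png under split_dir/<cam>/ to jpg in split_dir/<cam>_jpg/.

    Returns the number of images written. The quality value is an assumption,
    not a setting taken from the paper or the repository.
    """
    from PIL import Image  # Pillow; imported lazily so jpg_target works without it

    split_dir = Path(split_dir)
    count = 0
    for cam in cams:
        for png in sorted((split_dir / cam).glob("*.png")):
            target = jpg_target(png)
            target.parent.mkdir(parents=True, exist_ok=True)
            with Image.open(png) as img:
                img.convert("RGB").save(target, "JPEG", quality=quality)
            count += 1
    return count
```

A call such as `convert_scene_flow_images("(data_folder)/training")` would then create `image_2_jpg` and `image_3_jpg` next to the original folders, where `(data_folder)` stands for your local KITTI Scene Flow 2015 directory.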

Training and Inference

The scripts folder contains training/inference scripts.

For self-supervised training, you can simply run the following script files:

Script	Training	Dataset
`./train_selfsup.sh`	Self-supervised	KITTI Split

Fine-tuning is done in two stages: (i) first finding the stopping point using a train/valid split, and then (ii) fine-tuning on all data for the number of iterations found in the first stage.

Script	Training	Dataset
`./ft_1st_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015
`./ft_2nd_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015

In the script files, please configure the following paths for your experiments:

DATA_HOME : the directory where the training or test data is located on your local system.
EXPERIMENTS_HOME : your own experiment directory where checkpoints and log files will be saved.
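
Before launching a long training run, the two paths above can be sanity-checked. The sketch below is only an illustration and is not part of the repository; `check_paths` is a hypothetical helper. It assumes that DATA_HOME must be an existing, non-empty directory and that EXPERIMENTS_HOME may be created if it is missing (note that the function does create it as a side effect).

```python
from pathlib import Path


def check_paths(data_home, experiments_home):
    """Return a list of problems found; an empty list means both paths look usable."""
    problems = []

    data = Path(data_home)
    if not data.is_dir():
        problems.append(f"DATA_HOME is not a directory: {data}")
    elif not any(data.iterdir()):
        problems.append(f"DATA_HOME is empty: {data}")

    exp = Path(experiments_home)
    try:
        # Create the experiment directory if needed; checkpoints and logs go here.
        exp.mkdir(parents=True, exist_ok=True)
    except OSError as err:
        problems.append(f"EXPERIMENTS_HOME cannot be created: {exp} ({err})")

    return problems
```

Pass the same values that you set for DATA_HOME and EXPERIMENTS_HOME in the script files, and fix any reported problem before starting a run.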

To test pretrained models, you can simply run the following script files:

Script	Training	Dataset
`./eval_selfsup_train.sh`	self-supervised	KITTI 2015 Train
`./eval_ft_test.sh`	fine-tuned	KITTI 2015 Test
`./eval_davis.sh`	self-supervised	DAVIS (one scene)
`./eval_davis_all.sh`	self-supervised	DAVIS (all scenes)

To save visualizations of the outputs, set --save_vis=True in the script.
To save output images for submission to the KITTI Scene Flow 2015 benchmark, set --save_out=True in the script.

Pretrained Models

The checkpoints folder contains the checkpoints of the pretrained models.

Acknowledgement

Please cite our paper if you use our source code.

@inproceedings{Hur:2021:SSM,  
  Author = {Junhwa Hur and Stefan Roth},  
  Booktitle = {CVPR},  
  Title = {Self-Supervised Multi-Frame Monocular Scene Flow},  
  Year = {2021}  
}

Portions of the source code (e.g., training pipeline, runtime, argument parser, and logger) are from Jochen Gast.

Owner

Visual Inference Lab @TU Darmstadt
