Volumetric Correspondence Networks for Optical Flow, NeurIPS 2019.

Overview

VCN: Volumetric correspondence networks for optical flow

[project website]

Requirements

Pre-trained models

To test on any two images

Running visualize.ipynb gives you the following flow visualizations with color and vectors. Note: the sintel model "./weights/sintel-ft-trainval/finetune_67999.tar" is trained on multiple datasets and generalizes better than the KITTI model.

KITTI

This correspondens to the entry on the leaderboard (Fl-all=6.30%).

Evaluate on KITTI-15 benchmark

To run + visualize on KITTI-15 test set,

modelname=kitti-ft-trainval
i=149999
CUDA_VISIBLE_DEVICES=0 python submission.py --dataset 2015test --datapath dataset/kitti_scene/testing/   --outdir ./weights/$modelname/ --loadmodel ./weights/$modelname/finetune_$i.tar  --maxdisp 512 --fac 2
python eval_tmp.py --path ./weights/$modelname/ --vis yes --dataset 2015test
Evaluate on KITTI-val

To see the details of the train-val split, please scroll down to "note on train-val" and run dataloader/kitti15list_val.py, dataloader/kitti15list_train.py, dataloader/sitnellist_train.py, and dataloader/sintellist_val.py.

To evaluate on the 40 validation images of KITTI-15 (0,5,...195), (also assuming the data is at /ssd/kitti_scene)

modelname=kitti-ft-trainval
i=149999
CUDA_VISIBLE_DEVICES=0 python submission.py --dataset 2015 --datapath /ssd/kitti_scene/training/   --outdir ./weights/$modelname/ --loadmodel ./weights/$modelname/finetune_$i.tar  --maxdisp 512 --fac 2
python eval_tmp.py --path ./weights/$modelname/ --vis no --dataset 2015

To evaluate + visualize on KITTI-15 validation set,

python eval_tmp.py --path ./weights/$modelname/ --vis yes --dataset 2015

Evaluation error on 40 validation images : Fl-err = 3.9, EPE = 1.144

Sintel

This correspondens to the entry on the leaderboard (EPE-all-final = 4.404, EPE-all-clean = 2.808).

Evaluate on Sintel-val

To evaluate on Sintel validation set,

modelname=sintel-ft-trainval
i=67999
CUDA_VISIBLE_DEVICES=0 python submission.py --dataset sintel --datapath /ssd/rob_flow/training/   --outdir ./weights/$modelname/ --loadmodel ./weights/$modelname/finetune_$i.tar  --maxdisp 448 --fac 1.4
python eval_tmp.py --path ./weights/$modelname/ --vis no --dataset sintel

Evaluation error on sintel validation images: Fl-err = 7.9, EPE = 2.351

Train the model

We follow the same stage-wise training procedure as prior work: Chairs->Things->KITTI or Chairs->Things->Sintel, but uses much lesser iterations. If you plan to train the model and reproduce the numbers, please check out our supplementary material for the differences in hyper-parameters with FlowNet2 and PWCNet.

Pretrain on flying chairs and flying things

Make sure you have downloaded flying chairs and flying things subset, and placed them under the same folder, say /ssd/.

To first train on flying chairs for 140k iterations with a batchsize of 8, run (assuming you have two gpus)

CUDA_VISIBLE_DEVICES=0,1 python main.py --maxdisp 256 --fac 1 --database /ssd/ --logname chairs-0 --savemodel /data/ptmodel/  --epochs 1000 --stage chairs --ngpus 2

Then we want to fine-tune on flying things for 80k iterations with a batchsize of 8, resume from your pre-trained model or use our pretrained model

CUDA_VISIBLE_DEVICES=0,1 python main.py --maxdisp 256 --fac 1 --database /ssd/ --logname things-0 --savemodel /data/ptmodel/  --epochs 1000 --stage things --ngpus 2 --loadmodel ./weights/charis/finetune_141999.tar --retrain false

Note that to resume the number of iterations, put the iteration to start from in iter_counts-(your suffix).txt. In this example, I'll put 141999 in iter_counts-0.txt. Be aware that the program reads/writes to iter_counts-(suffix).txt at training time, so you may want to use different suffix when multiple training programs are running at the same time.

Finetune on KITTI / Sintel

Please first download the kitti 2012/2015 flow dataset if you want to fine-tune on kitti. Download rob_devkit if you want to fine-tune on sintel.

To fine-tune on KITTI with a batchsize of 16, run

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --maxdisp 512 --fac 2 --database /ssd/ --logname kitti-trainval-0 --savemodel /data/ptmodel/  --epochs 1000 --stage 2015trainval --ngpus 4 --loadmodel ./weights/things/finetune_211999.tar --retrain true

To fine-tune on Sintel with a batchsize of 16, run

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --maxdisp 448 --fac 1.4 --database /ssd/ --logname sintel-trainval-0 --savemodel /data/ptmodel/  --epochs 1000 --stage sinteltrainval --ngpus 4 --loadmodel ./weights/things/finetune_239999.tar --retrain true

Note on train-val

  • To tune hyper-parameters, we use a train-val split for kitti and sintel, which is not covered by the above procedure.
  • For kitti we use every 5th image in the training set (0,5,10,...195) for validation, and the rest for training; while for Sintel, we manually select several sequences for validation.
  • If you plan to use our split, put "--stage 2015train" or "--stage sinteltrain" for training.
  • The numbers in Tab.3 of the paper is on the whole train-val set (all the data with ground-truth).
  • You might find run.sh helpful to run evaluation on KITTI/Sintel.

Measure FLOPS

python flops.py

gives

PWCNet: flops(G)/params(M):90.8/9.37

VCN: flops(G)/params(M):96.5/6.23

Note on inference time

The current implementation runs at 180ms/pair on KITTI-sized images at inference time. A rough breakdown of running time is: feature extraction - 4.9%, feature correlation - 8.7%, separable 4D convolutions - 56%, trun. soft-argmin (soft winner-take-all) - 20% and hypotheses fusion - 9.5%. A detailed breakdown is shown below in the form "name-level percentage".

Note that separable 4D convolutions use less FLOPS than 2D convolutions (i.e., feature extraction module + hypotheses fusion module, 47.8 v.s. 53.3 Gflops) but take 4X more time (56% v.s. 14.4%). One reason might be that pytorch (also other packages) is more friendly to networks with more feature channels than those with large spatial size given the same Flops. This might be fixed at the conv kernel / hardware level.

Besides, the truncated soft-argmin is implemented with 3D max pooling, which is inefficient and takes more time than expected.

Acknowledgement

Thanks ClementPinard, Lyken17, NVlabs and many others for open-sourcing their code.

Citation

@inproceedings{yang2019vcn,
  title={Volumetric Correspondence Networks for Optical Flow},
  author={Yang, Gengshan and Ramanan, Deva},
  booktitle={NeurIPS},
  year={2019}
}
Multi-Stage Episodic Control for Strategic Exploration in Text Games

XTX: eXploit - Then - eXplore Requirements First clone this repo using git clone https://github.com/princeton-nlp/XTX.git Please create two conda envi

Princeton Natural Language Processing 9 May 24, 2022
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss This is official implement of "

程星 87 Dec 24, 2022
A library that can print Python objects in human readable format

objprint A library that can print Python objects in human readable format Install pip install objprint Usage op Use op() (or objprint()) to print obj

319 Dec 25, 2022
Python3 / PyTorch implementation of the following paper: Fine-grained Semantics-aware Representation Enhancement for Self-supervisedMonocular Depth Estimation. ICCV 2021 (oral)

FSRE-Depth This is a Python3 / PyTorch implementation of FSRE-Depth, as described in the following paper: Fine-grained Semantics-aware Representation

77 Dec 28, 2022
For auto aligning, cropping, and scaling HR and LR images for training image based neural networks

ImgAlign For auto aligning, cropping, and scaling HR and LR images for training image based neural networks Usage Make sure OpenCV is installed, 'pip

15 Dec 04, 2022
A simple code to perform canny edge contrast detection on images.

CECED-Canny-Edge-Contrast-Enhanced-Detection A simple code to perform canny edge contrast detection on images. A simple code to process images using c

Happy N. Monday 3 Feb 15, 2022
Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

ademxapp Visual applications by the University of Adelaide In designing our Model A, we did not over-optimize its structure for efficiency unless it w

Zifeng Wu 338 Dec 12, 2022
A fast Evolution Strategy implementation in Python

Evostra: Evolution Strategy for Python Evolution Strategy (ES) is an optimization technique based on ideas of adaptation and evolution. You can learn

Mika 251 Dec 08, 2022
Keras-1D-ACGAN-Data-Augmentation

Keras-1D-ACGAN-Data-Augmentation What is the ACGAN(Auxiliary Classifier GANs) ? Related Paper : [Abstract : Synthesizing high resolution photorealisti

Jae-Hoon Shim 7 Dec 23, 2022
Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Philipp Erler 329 Jan 06, 2023
Ensembling Off-the-shelf Models for GAN Training

Vision-aided GAN video (3m) | website | paper Can the collective knowledge from a large bank of pretrained vision models be leveraged to improve GAN t

345 Dec 28, 2022
GANfolk: Using AI to create portraits of fictional people to sell as NFTs

GANfolk are AI-generated renderings of fictional people. Each image in the collection was created by a pair of Generative Adversarial Networks (GANs) with names and backstories also created with AI.

Robert A. Gonsalves 32 Dec 02, 2022
Semi-Supervised Learning, Object Detection, ICCV2021

End-to-End Semi-Supervised Object Detection with Soft Teacher By Mengde Xu*, Zheng Zhang*, Han Hu, Jianfeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai,

Microsoft 789 Dec 27, 2022
Benchmark tools for Compressive LiDAR-to-map registration

Benchmark tools for Compressive LiDAR-to-map registration This repo contains the released version of code and datasets used for our IROS 2021 paper: "

Allie 9 Nov 24, 2022
Part-aware Measurement for Robust Multi-View Multi-Human 3D Pose Estimation and Tracking

Part-aware Measurement for Robust Multi-View Multi-Human 3D Pose Estimation and Tracking Part-Aware Measurement for Robust Multi-View Multi-Human 3D P

19 Oct 27, 2022
Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution"

RealBasicVSR [Paper] This is the official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution, arXiv". This repository contain

Kelvin C.K. Chan 566 Dec 28, 2022
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

Ju He 307 Jan 03, 2023
Official Python implementation of the FuzionCoin protocol

PyFuzc Official Python implementation of the FuzionCoin protocol WARNING: Under construction. Use at your own risk. Some functions may not work. Setup

FuzionCoin 3 Jul 07, 2022
Github Traffic Insights as Prometheus metrics.

github-traffic Github Traffic collects your repository's traffic data and exposes it as Prometheus metrics. Grafana dashboard that displays the metric

Grafana Labs 34 Oct 27, 2022
Easy and Efficient Object Detector

EOD Easy and Efficient Object Detector EOD (Easy and Efficient Object Detection) is a general object detection model production framework. It aim on p

381 Jan 01, 2023