Open source code for the paper of Neural Sparse Voxel Fields.

Related tags

Deep LearningNSVF
Overview

Neural Sparse Voxel Fields (NSVF)

Project Page | Video | Paper | Data

Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is a challenging problem because it requires the difficult step of capturing detailed appearance and geometry models. Neural rendering is an emerging field that employs deep neural networks to implicitly learn scene representations encapsulating both geometry and appearance from 2D observations with or without a coarse geometry. However, existing approaches in this field often show blurry renderings or suffer from slow rendering process. We propose Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering.

Here is the official repo for the paper:

We also provide our unofficial implementation for:

Table of contents



Requirements and Installation

This code is implemented in PyTorch using fairseq framework.

The code has been tested on the following system:

  • Python 3.7
  • PyTorch 1.4.0
  • Nvidia apex library (optional)
  • Nvidia GPU (Tesla V100 32GB) CUDA 10.1

Only learning and rendering on GPUs are supported.

To install, first clone this repo and install all dependencies:

pip install -r requirements.txt

Then, run

pip install --editable ./

Or if you want to install the code locally, run:

python setup.py build_ext --inplace

Dataset

You can download the pre-processed synthetic and real datasets used in our paper. Please also cite the original papers if you use any of them in your work.

Dataset Download Link Notes on Dataset Split
Synthetic-NSVF download (.zip) 0_* (training) 1_* (validation) 2_* (testing)
Synthetic-NeRF download (.zip) 0_* (training) 1_* (validation) 2_* (testing)
BlendedMVS download (.zip) 0_* (training) 1_* (testing)
Tanks&Temples download (.zip) 0_* (training) 1_* (testing)

Prepare your own dataset

To prepare a new dataset of a single scene for training and testing, please follow the data structure:

<dataset_name>
|-- bbox.txt         # bounding-box file
|-- intrinsics.txt   # 4x4 camera intrinsics
|-- rgb
    |-- 0.png        # target image for each view
    |-- 1.png
    ...
|-- pose
    |-- 0.txt        # camera pose for each view (4x4 matrices)
    |-- 1.txt
    ...
[optional]
|-- test_traj.txt    # camera pose for free-view rendering demonstration (4N x 4)

where the bbox.txt file contains a line describing the initial bounding box and voxel size:

x_min y_min z_min x_max y_max z_max initial_voxel_size

Note that the file names of target images and those of the corresponding camera pose files are not required to be exactly the same. However, the orders of these two kinds of files (sorted by string) must match. The datasets are split with view indices. For example, "train (0..100), valid (100..200) and test (200..400)" mean the first 100 views for training, 100-199th views for validation, and 200-399th views for testing.

Train a new model

Given the dataset of a single scene ({DATASET}), we use the following command for training an NSVF model to synthesize novel views at 800x800 pixels, with a batch size of 4 images per GPU and 2048 rays per image. By default, the code will automatically detect all available GPUs.

In the following example, we use a pre-defined architecture nsvf_base with specific arguments:

  • By setting --no-sampling-at-reader, the model only samples pixels in the projected image region of sparse voxels for training.
  • By default, we set the ray-marching step size to be the ratio 1/8 (0.125) of the voxel size which is typically described in the bbox.txt file.
  • It is optional to turn on --use-octree. It will build a sparse voxel octree to speed-up the ray-voxel intersection especially when the number of voxels is larger than 10000.
  • By setting --pruning-every-steps as 2500, the model performs self-pruning at every 2500 steps.
  • By setting --half-voxel-size-at and --reduce-step-size-at as 5000,25000,75000, the voxel size and step size are halved at 5k, 25k and 75k, respectively.

Note that, although above parameter settings are used for most of the experiments in the paper, it is possible to tune these parameters to achieve better quality. Besides the above parameters, other parameters can also use default settings.

Besides the architecture nsvf_base, you may check other architectures or define your own architectures in the file fairnr/models/nsvf.py.

python -u train.py ${DATASET} \
    --user-dir fairnr \
    --task single_object_rendering \
    --train-views "0..100" --view-resolution "800x800" \
    --max-sentences 1 --view-per-batch 4 --pixel-per-view 2048 \
    --no-preload \
    --sampling-on-mask 1.0 --no-sampling-at-reader \
    --valid-views "100..200" --valid-view-resolution "400x400" \
    --valid-view-per-batch 1 \
    --transparent-background "1.0,1.0,1.0" --background-stop-gradient \
    --arch nsvf_base \
    --initial-boundingbox ${DATASET}/bbox.txt \
    --use-octree \
    --raymarching-stepsize-ratio 0.125 \
    --discrete-regularization \
    --color-weight 128.0 --alpha-weight 1.0 \
    --optimizer "adam" --adam-betas "(0.9, 0.999)" \
    --lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000 \
    --criterion "srn_loss" --clip-norm 0.0 \
    --num-workers 0 \
    --seed 2 \
    --save-interval-updates 500 --max-update 150000 \
    --virtual-epoch-steps 5000 --save-interval 1 \
    --half-voxel-size-at  "5000,25000,75000" \
    --reduce-step-size-at "5000,25000,75000" \
    --pruning-every-steps 2500 \
    --keep-interval-updates 5 --keep-last-epochs 5 \
    --log-format simple --log-interval 1 \
    --save-dir ${SAVE} \
    --tensorboard-logdir ${SAVE}/tensorboard \
    | tee -a $SAVE/train.log

The checkpoints are saved in {SAVE}. You can launch tensorboard to check training progress:

tensorboard --logdir=${SAVE}/tensorboard --port=10000

There are more examples of training scripts to reproduce the results of our paper under examples.

Evaluation

Once the model is trained, the following command is used to evaluate rendering quality on the test views given the {MODEL_PATH}.

python validate.py ${DATASET} \
    --user-dir fairnr \
    --valid-views "200..400" \
    --valid-view-resolution "800x800" \
    --no-preload \
    --task single_object_rendering \
    --max-sentences 1 \
    --valid-view-per-batch 1 \
    --path ${MODEL_PATH} \
    --model-overrides '{"chunk_size":512,"raymarching_tolerance":0.01,"tensorboard_logdir":"","eval_lpips":True}' \

Note that we override the raymarching_tolerance to 0.01 to enable early termination for rendering speed-up.

Free Viewpoint Rendering

Free-viewpoint rendering can be achieved once a model is trained and a rendering trajectory is specified. For example, the following command is for rendering with a circle trajectory (angular speed 3 degree/frame, 15 frames per GPU). This outputs per-view rendered images and merge the images into a .mp4 video in ${SAVE}/output as follows:

By default, the code can detect all available GPUs.

python render.py ${DATASET} \
    --user-dir fairnr \
    --task single_object_rendering \
    --path ${MODEL_PATH} \
    --model-overrides '{"chunk_size":512,"raymarching_tolerance":0.01}' \
    --render-beam 1 --render-angular-speed 3 --render-num-frames 15 \
    --render-save-fps 24 \
    --render-resolution "800x800" \
    --render-path-style "circle" \
    --render-path-args "{'radius': 3, 'h': 2, 'axis': 'z', 't0': -2, 'r':-1}" \
    --render-output ${SAVE}/output \
    --render-output-types "color" "depth" "voxel" "normal" --render-combine-output \
    --log-format "simple"

Our code also supports rendering for given camera poses. For instance, the following command is for rendering with the camera poses defined in the 200-399th files under folder ${DATASET}/pose:

python render.py ${DATASET} \
    --user-dir fairnr \
    --task single_object_rendering \
    --path ${MODEL_PATH} \
    --model-overrides '{"chunk_size":512,"raymarching_tolerance":0.01}' \
    --render-save-fps 24 \
    --render-resolution "800x800" \
    --render-camera-poses ${DATASET}/pose \
    --render-views "200..400" \
    --render-output ${SAVE}/output \
    --render-output-types "color" "depth" "voxel" "normal" --render-combine-output \
    --log-format "simple"

The code also supports rendering with camera poses defined in a .txt file. Please refer to this example.

Extract the Geometry

We also support running marching cubes to extract the iso-surfaces as triangle meshes from a trained NSVF model and saved as {SAVE}/{NAME}.ply.

python extract.py \
    --user-dir fairnr \
    --path ${MODEL_PATH} \
    --output ${SAVE} \
    --name ${NAME} \
    --format 'mc_mesh' \
    --mc-threshold 0.5 \
    --mc-num-samples-per-halfvoxel 5

It is also possible to export the learned sparse voxels by setting --format 'voxel_mesh'. The output .ply file can be opened with any 3D viewers such as MeshLab.

License

NSVF is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please cite as

@article{liu2020neural,
  title={Neural Sparse Voxel Fields},
  author={Liu, Lingjie and Gu, Jiatao and Lin, Kyaw Zaw and Chua, Tat-Seng and Theobalt, Christian},
  journal={NeurIPS},
  year={2020}
}
Owner
Meta Research
Meta Research
MLPs for Vision and Langauge Modeling (Coming Soon)

MLP Architectures for Vision-and-Language Modeling: An Empirical Study MLP Architectures for Vision-and-Language Modeling: An Empirical Study (Code wi

Yixin Nie 27 May 09, 2022
PAWS đŸŸ Predicting View-Assignments with Support Samples

This repo provides a PyTorch implementation of PAWS (predicting view assignments with support samples), as described in the paper Semi-Supervised Learning of Visual Features by Non-Parametrically Pre

Facebook Research 437 Dec 23, 2022
A project to build an AI voice assistant using Python . The Voice assistant interacts with the humans to perform basic tasks.

AI_Personal_Voice_Assistant_Using_Python A project to build an AI voice assistant using Python . The Voice assistant interacts with the humans to perf

Chumui Tripura 1 Oct 30, 2021
PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data.

Anti-Backdoor Learning PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data. Check the unlearning effect

Yige-Li 51 Dec 07, 2022
Reinfore learning tool box, contains trpo, a3c algorithm for continous action space

RL_toolbox all the algorithm is running on pycharm IDE, or the package loss error may exist. implemented algorithm: trpo a3c a3c:for continous action

yupei.wu 44 Oct 10, 2022
A pre-trained language model for social media text in Spanish

RoBERTuito A pre-trained language model for social media text in Spanish READ THE FULL PAPER Github Repository RoBERTuito is a pre-trained language mo

25 Dec 29, 2022
Official PyTorch code for "BAM: Bottleneck Attention Module (BMVC2018)" and "CBAM: Convolutional Block Attention Module (ECCV2018)"

BAM and CBAM Official PyTorch code for "BAM: Bottleneck Attention Module (BMVC2018)" and "CBAM: Convolutional Block Attention Module (ECCV2018)" Updat

Jongchan Park 1.7k Jan 01, 2023
Plug-n-Play Reinforcement Learning in Python with OpenAI Gym and JAX

coax is built on top of JAX, but it doesn't have an explicit dependence on the jax python package. The reason is that your version of jaxlib will depend on your CUDA version.

128 Dec 27, 2022
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

TalkNet 2 [WIP] TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Predictio

Rishikesh (à€‹à€·à€żà€•à„‡à€¶) 69 Dec 17, 2022
On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021))

PTvsBT On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021) Citation Please cite a

Sunbow Liu 10 Nov 25, 2022
Unifying Global-Local Representations in Salient Object Detection with Transformer

GLSTR (Global-Local Saliency Transformer) This is the official implementation of paper "Unifying Global-Local Representations in Salient Object Detect

11 Aug 24, 2022
Scalable and Elastic Deep Reinforcement Learning Using PyTorch. Please star. đŸ”„

ElegantRL “氏雅”: Scalable and Elastic Deep Reinforcement Learning ElegantRL is developed for researchers and practitioners with the following advantage

AI4Finance Foundation 2.5k Jan 05, 2023
ZeroVL - The official implementation of ZeroVL

This repository contains source code necessary to reproduce the results presente

31 Nov 04, 2022
A novel Engagement Detection with Multi-Task Training (ED-MTT) system

A novel Engagement Detection with Multi-Task Training (ED-MTT) system which minimizes MSE and triplet loss together to determine the engagement level of students in an e-learning environment.

Onur Çopur 12 Nov 11, 2022
Code and data (Incidents Dataset) for ECCV 2020 Paper "Detecting natural disasters, damage, and incidents in the wild".

Incidents Dataset See the following pages for more details: Project page: IncidentsDataset.csail.mit.edu. ECCV 2020 Paper "Detecting natural disasters

Ethan Weber 67 Dec 27, 2022
An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

CV Lab @ Yonsei University 35 Oct 26, 2022
FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis

LST-TTS Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis. Submitted to ICASSP 2022. Audi

Li-Wei Chen 64 Dec 30, 2022
An end-to-end library for editing and rendering motion of 3D characters with deep learning [SIGGRAPH 2020]

Deep-motion-editing This library provides fundamental and advanced functions to work with 3D character animation in deep learning with Pytorch. The co

1.2k Dec 29, 2022
TUPÃ was developed to analyze electric field properties in molecular simulations

TUPÃ: Electric field analyses for molecular simulations What is TUPÃ? TUPÃ (pronounced as tu-pan) is a python algorithm that employs MDAnalysis engine

Marcelo D. PolĂȘto 10 Jul 17, 2022
Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning

Human-Level Control through Deep Reinforcement Learning Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning. This imp

Devsisters Corp. 2.4k Dec 26, 2022