Layered Neural Atlases for Consistent Video Editing

Overview

Project Page | Paper

This repository contains an implementation for the SIGGRAPH Asia 2021 paper Layered Neural Atlases for Consistent Video Editing.

The paper introduces the first approach for neural video unwrapping, using an end-to-end optimized, interpretable, and semantic atlas-based representation that facilitates easy and intuitive editing in the atlas domain.

Installation Requirements

The code is compatible with Python 3.7 and PyTorch 1.6.

You can create an anaconda environment called neural_atlases with the required dependencies by running:

conda create --name neural_atlases python=3.7
conda activate neural_atlases
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy scikit-image tqdm opencv -c pytorch
pip install imageio-ffmpeg gdown
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html
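
A quick sanity check (this snippet is not part of the repository) can confirm that the expected versions were installed and that the GPU is visible:

import torch, torchvision, detectron2

print(torch.__version__)          # expected: 1.6.0
print(torchvision.__version__)    # expected: 0.7.0
print(torch.cuda.is_available())  # should be True with cudatoolkit 10.1
print(detectron2.__version__)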

Data convention

The code expects three folders for each input video, e.g. for a 50-frame video named "blackswan":

  1. data/blackswan: A folder of video frames containing image files in the following convention: blackswan/00000.jpg, blackswan/00001.jpg, ..., blackswan/00049.jpg (as in the DAVIS dataset).
  2. data/blackswan_flow: A folder with forward and backward optical flow files in the following convention: blackswan_flow/00000.jpg_00001.jpg.npy, blackswan_flow/00001.jpg_00000.jpg.npy, ..., blackswan_flow/00049.jpg_00048.jpg.npy.
  3. data/blackswan_maskrcnn: A folder with rough masks (created by Mask-RCNN or any other method) containing files in the following convention: blackswan_maskrcnn/00000.jpg, blackswan_maskrcnn/00001.jpg, ..., blackswan_maskrcnn/00049.jpg.
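
The sketch below (a hypothetical helper, not part of the repository) shows how this layout can be sanity-checked for a 50-frame video; it only asserts that every expected file exists:

import os

def check_video_folders(root, name, n_frames):
    # Hypothetical helper: verify that the three folders described above
    # exist and follow the expected naming convention.
    frames = [f"{i:05d}.jpg" for i in range(n_frames)]
    for f in frames:
        assert os.path.isfile(os.path.join(root, name, f)), f"missing frame {f}"
        assert os.path.isfile(os.path.join(root, name + "_maskrcnn", f)), f"missing mask {f}"
    flow_dir = os.path.join(root, name + "_flow")
    for a, b in zip(frames, frames[1:]):
        assert os.path.isfile(os.path.join(flow_dir, f"{a}_{b}.npy")), f"missing forward flow {a}_{b}.npy"
        assert os.path.isfile(os.path.join(flow_dir, f"{b}_{a}.npy")), f"missing backward flow {b}_{a}.npy"

check_video_folders("data", "blackswan", 50)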

To download a few example DAVIS sequences, run:

gdown https://drive.google.com/uc?id=1WipZR9LaANTNJh764ukznXXAANJ5TChe
unzip data.zip

Mask extraction

Given only the video frames folder data/blackswan, the Mask-RCNN masks can be extracted (creating the required folder data/blackswan_maskrcnn) by running:

python preprocess_mask_rcnn.py --vid-path data/blackswan --class_name bird

where --class_name determines the COCO class name of the sought foreground object. It is also possible to select the first instance retrieved by Mask-RCNN by passing --class_name anything. This is useful for cases where Mask-RCNN produces correct masks with wrong class labels, as in the "libby" video:

python preprocess_mask_rcnn.py --vid-path data/libby --class_name anything
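
preprocess_mask_rcnn.py is the supported way to produce these masks; the following is only a rough sketch of the equivalent detectron2 calls for a single frame (the model config, score threshold, and output handling here are illustrative assumptions, not the script's actual logic):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # illustrative threshold
predictor = DefaultPredictor(cfg)

image = cv2.imread("data/blackswan/00000.jpg")  # BGR, as DefaultPredictor expects
instances = predictor(image)["instances"]
# Keep the highest-scoring instance, roughly what --class_name anything selects
# (assumes at least one instance was detected in the frame).
mask = instances.pred_masks[0].cpu().numpy().astype("uint8") * 255
cv2.imwrite("data/blackswan_maskrcnn/00000.jpg", mask)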

Optical flow extraction

Similarly, the optical flow folder can be extracted using RAFT. To link RAFT into the current project, run:

git submodule update --init
cd thirdparty/RAFT/
./download_models.sh
cd ../..

For extracting the optical flows (and creating the required folder data/blackswan_flow) run:

python preprocess_optical_flow.py --vid-path data/blackswan --max_long_edge 768
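
Each .npy file stores a dense flow field; assuming the usual H x W x 2 layout of (u, v) pixel displacements, a forward/backward pair can be inspected like this (a minimal sketch, not a repository script):

import numpy as np

fwd = np.load("data/blackswan_flow/00000.jpg_00001.jpg.npy")  # flow from frame 0 to 1
bwd = np.load("data/blackswan_flow/00001.jpg_00000.jpg.npy")  # flow from frame 1 to 0
print(fwd.shape, bwd.shape)  # assumed: (H, W, 2) each

h, w = fwd.shape[:2]
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
# Where each pixel of frame 0 lands in frame 1 under the forward flow:
x1, y1 = xs + fwd[..., 0], ys + fwd[..., 1]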

Pretrained models

To download a sample set of our pretrained models together with sample edits, run:

gdown https://drive.google.com/uc?id=10voSCdMGM5HTIYfT0bPW029W9y6Xij4D
unzip pretrained_models.zip

Training

For training a model on a video, run:

python train.py config/config.json

where the video frames folder is determined by the config parameter "data_folder". Note that training time can be reduced by evaluating less frequently: the parameter "evaluate_every" controls the evaluation interval (e.g. change it to 10000). The other configurable parameters are documented inside the file train.py.
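
For example, evaluate_every can be changed programmatically before training (a minimal sketch; only data_folder and evaluate_every are named above, and any other keys in config.json are assumed to stay as they are):

import json

with open("config/config.json") as f:
    config = json.load(f)

config["data_folder"] = "data/blackswan"  # folder of video frames to train on
config["evaluate_every"] = 10000          # evaluate less frequently to speed up training

with open("config/config.json", "w") as f:
    json.dump(config, f, indent=4)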

Evaluation

During training, the model is evaluated periodically. To run evaluation only, on an already-trained model folder, run:

python only_evaluate.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=evaluation_outputs

where trained_model_folder is the path to a folder that contains the config.json and checkpoint files of the trained model.

Editing

To apply an edit, run the script only_edit.py. Examples using the supplied pretrained models for "blackswan" and "boat":

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_foreground.png --edit_background_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_background.png
python only_edit.py --trained_model_folder=pretrained_models/checkpoints/boat --video_name=boat --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/boat/edit_boat_foreground.png --edit_background_path=pretrained_models/edit_inputs/boat/edit_boat_backgound.png

where --edit_foreground_path and --edit_background_path specify paths to 1000x1000 RGBA images containing the atlas edits.
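
An edit image is a 1000x1000 RGBA canvas that is transparent everywhere except where the atlas should change. A minimal sketch for creating one with Pillow (Pillow is assumed available, e.g. as a torchvision dependency; the drawn content is of course arbitrary):

from PIL import Image, ImageDraw

# Fully transparent 1000x1000 RGBA canvas; only non-transparent pixels
# act as edits when mapped back onto the video.
edit = Image.new("RGBA", (1000, 1000), (0, 0, 0, 0))
draw = ImageDraw.Draw(edit)
draw.ellipse((450, 450, 550, 550), fill=(255, 0, 0, 255))  # an illustrative red dot
edit.save("my_edit_foreground.png")

The saved image can then be passed via --edit_foreground_path or --edit_background_path.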

To apply an edit that was drawn on a single frame (e.g. for the pretrained "libby" model):

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/libby --video_name=libby --data_folder=data --output_folder=editing_outputs  --use_edit_frame --edit_frame_index=7 --edit_frame_path=pretrained_models/edit_inputs/libby/edit_frame_.png

Citation

If you find our work useful in your research, please consider citing:

@article{kasten2021layered,
  title={Layered Neural Atlases for Consistent Video Editing},
  author={Kasten, Yoni and Ofri, Dolev and Wang, Oliver and Dekel, Tali},
  journal={arXiv preprint arXiv:2109.11418},
  year={2021}
}