Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

Overview

DIGAN (ICLR 2022)

Official PyTorch implementation of "Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks" by Sihyun Yu*, Jihoon Tack*, Sangwoo Mo*, Hyunsu Kim, Junho Kim, Jung-Woo Ha, Jinwoo Shin.

TL;DR: We make video generation scalable leveraging implicit neural representations.

Illustration of the (a) generator and (b) discriminator of DIGAN. The generator creates a video INR weight from random content and motion vectors, which produces an image that corresponds to the input 2D grids {(x, y)} and time t. Two discriminators determine the reality of each image and motion (from a pair of images and their time difference), respectively.

1. Environment setup

conda create -n digan python=3.8
conda activate digan

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

pip install hydra-core==1.0.6
pip install tqdm scipy scikit-learn av ninja
pip install click gitpython requests psutil einops tensorboardX

2. Dataset

One should organize the video dataset as follows:

UCF-101

UCF-101
|-- train
    |-- class1
        |-- video1.avi
        |-- video2.avi
        |-- ...
    |-- class2
        |-- video1.avi
        |-- video2.avi
        |-- ...
    |-- ...

Other video datasets (Sky Time lapse, TaiChi-HD, Kinetics-food)

Video dataset
|-- train
    |-- video1
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- video2
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- ...
|-- val
    |-- video1
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- ...

Dataset download

3. Training

To train the model, navigate to the project directory and run:

python src/infra/launch.py hydra.run.dir=. +experiment_name=<EXP_NAME> +dataset.name=<DATASET>

You may change training options via modifying configs/main.yml and configs/digan.yml.
Also the dataset list is as follows, <DATASET>: {UCF-101,sky,taichi,kinetics}

4. Evaluation (FVD and KVD)

python src/scripts/compute_fvd_kvd.py --network_pkl <MODEL_PATH> --data_path <DATA_PATH>

5. Video generation

Genrate and visualize videos (as gif and mp4):

python src/scripts/generate_videos.py --network_pkl <MODEL_PATH> --outdir <OUTPUT_PATH>

6. Results

Generated video results of DIGAN on TaiChi (top) and Sky (bottom) datasets.
More generated video results are available at the following site.

Citation

@inproceedings{
    yu2022generating,
    title={Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks},
    author={Yu, Sihyun and Tack, Jihoon and Mo, Sangwoo and Kim, Hyunsu and Kim, Junho and Ha, Jung-Woo and Shin, Jinwoo},
    booktitle={International Conference on Learning Representations},
    year={2022},
}

Reference

This code is mainly built upon StyleGAN2-ada and INR-GAN repositories.
We also used the code from following repositories: DiffAug, VideoGPT, MDGAN

Lisence

Copyright 2022-present NAVER Corp.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Owner
Sihyun Yu
Ph.D. student at ALINLAB @ KAIST
Sihyun Yu
Bridging Composite and Real: Towards End-to-end Deep Image Matting

Bridging Composite and Real: Towards End-to-end Deep Image Matting Please note that the official repository of the paper Bridging Composite and Real:

Jizhizi_Li 30 Oct 31, 2022
《Towards High Fidelity Face Relighting with Realistic Shadows》(CVPR 2021)

Towards High Fidelity Face-Relighting with Realistic Shadows Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu. In CVPR, 2021. T

114 Dec 10, 2022
Trading Gym is an open source project for the development of reinforcement learning algorithms in the context of trading.

Trading Gym Trading Gym is an open-source project for the development of reinforcement learning algorithms in the context of trading. It is currently

Dimitry Foures 535 Nov 15, 2022
PyVideoAI: Action Recognition Framework

This reposity contains official implementation of: Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognitio

Kiyoon Kim 22 Dec 29, 2022
Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

Spchcat Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi. Description spchcat is a command-line tool that read

Pete Warden 279 Jan 03, 2023
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode

NVIDIA Isaac ROS 62 Dec 14, 2022
An official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Sequence Feature Alignment (SFA) By Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-jun Zha, Yonggang Wen, and Dacheng Tao This repository is an o

WangWen 79 Dec 24, 2022
[CVPR-2021] UnrealPerson: An adaptive pipeline for costless person re-identification

UnrealPerson: An Adaptive Pipeline for Costless Person Re-identification In our paper (arxiv), we propose a novel pipeline, UnrealPerson, that decreas

ZhangTianyu 70 Oct 10, 2022
Conformer: Local Features Coupling Global Representations for Visual Recognition

Conformer: Local Features Coupling Global Representations for Visual Recognition (arxiv) This repository is built upon DeiT and timm Usage First, inst

Zhiliang Peng 378 Jan 08, 2023
A PyTorch Implementation of ViT (Vision Transformer)

ViT - Vision Transformer This is an implementation of ViT - Vision Transformer by Google Research Team through the paper "An Image is Worth 16x16 Word

Quan Nguyen 7 May 11, 2022
An end-to-end project on customer segmentation

End-to-end Customer Segmentation Project Note: This project is in progress. Tools Used in This Project Prefect: Orchestrate workflows hydra: Manage co

Ocelot Consulting 8 Oct 06, 2022
Joint detection and tracking model named DEFT, or ``Detection Embeddings for Tracking.

DEFT: Detection Embeddings for Tracking DEFT: Detection Embeddings for Tracking, Mohamed Chaabane, Peter Zhang, J. Ross Beveridge, Stephen O'Hara

Mohamed Chaabane 253 Dec 18, 2022
A self-supervised learning framework for audio-visual speech

AV-HuBERT (Audio-Visual Hidden Unit BERT) Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Robust Self-Supervised A

Meta Research 431 Jan 07, 2023
Segmentation-Aware Convolutional Networks Using Local Attention Masks

Segmentation-Aware Convolutional Networks Using Local Attention Masks [Project Page] [Paper] Segmentation-aware convolution filters are invariant to b

144 Jun 29, 2022
A curated list and survey of awesome Vision Transformers.

English | 简体中文 A curated list and survey of awesome Vision Transformers. You can use mind mapping software to open the mind mapping source file. You c

OpenMMLab 281 Dec 21, 2022
DA2Lite is an automated model compression toolkit for PyTorch.

DA2Lite (Deep Architecture to Lite) is a toolkit to compress and accelerate deep network models. ⭐ Star us on GitHub — it helps!! Frameworks & Librari

Sinhan Kang 7 Mar 22, 2022
Synthetic Scene Text from 3D Engines

Introduction UnrealText is a project that synthesizes scene text images using 3D graphics engine. This repository accompanies our paper: UnrealText: S

Shangbang Long 215 Dec 29, 2022
StocksMA is a package to facilitate access to financial and economic data of Moroccan stocks.

Creating easier access to the Moroccan stock market data What is StocksMA ? StocksMA is a package to facilitate access to financial and economic data

Salah Eddine LABIAD 28 Jan 04, 2023
PyTorch implementation of the WarpedGANSpace: Finding non-linear RBF paths in GAN latent space (ICCV 2021)

Authors official PyTorch implementation of the "WarpedGANSpace: Finding non-linear RBF paths in GAN latent space" [ICCV 2021].

Christos Tzelepis 100 Dec 06, 2022
Affine / perspective transformation in Pose Estimation with Tensorflow 2

Pose Transformation Affine / Perspective transformation in Pose Estimation with Tensorflow 2 Introduction 이 repo는 pose estimation을 연구하고 개발하는 데 도움이 되기

Kim Junho 1 Dec 22, 2021