This repository contains the source code for the paper First Order Motion Model for Image Animation

Overview

!!! Check out our new paper and framework improved for articulated objects

First Order Motion Model for Image Animation

This repository contains the source code for the paper First Order Motion Model for Image Animation by Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci and Nicu Sebe.

Example animations

The videos on the left show the driving videos. The first row on the right for each dataset shows the source videos. The bottom row contains the animated sequences with motion transferred from the driving video and object taken from the source image. We trained a separate network for each task.

VoxCeleb Dataset

Screenshot

Fashion Dataset

Screenshot

MGIF Dataset

Screenshot

Installation

We support python3. To install the dependencies run:

pip install -r requirements.txt

YAML configs

There are several configuration (config/dataset_name.yaml) files one for each dataset. See config/taichi-256.yaml to get description of each parameter.

Pre-trained checkpoint

Checkpoints can be found under following link: google-drive or yandex-disk.

Animation Demo

To run a demo, download checkpoint and run the following command:

python demo.py  --config config/dataset_name.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale

The result will be stored in result.mp4.

The driving videos and source images should be cropped before it can be used in our method. To obtain some semi-automatic crop suggestions you can use python crop-video.py --inp some_youtube_video.mp4. It will generate commands for crops using ffmpeg. In order to use the script, face-alligment library is needed:

git clone https://github.com/1adrianb/face-alignment
cd face-alignment
pip install -r requirements.txt
python setup.py install

Animation demo with Docker

If you are having trouble getting the demo to work because of library compatibility issues, and you're running Linux, you might try running it inside a Docker container, which would give you better control over the execution environment.

Requirements: Docker 19.03+ and nvidia-docker installed and able to successfully run the nvidia-docker usage tests.

We'll first build the container.

docker build -t first-order-model .

And now that we have the container available locally, we can use it to run the demo.

docker run -it --rm --gpus all \
       -v $HOME/first-order-model:/app first-order-model \
       python3 demo.py --config config/vox-256.yaml \
           --driving_video driving.mp4 \
           --source_image source.png  \ 
           --checkpoint vox-cpk.pth.tar \ 
           --result_video result.mp4 \
           --relative --adapt_scale

Colab Demo

@graphemecluster prepared a gui-demo for the google-colab see: demo.ipynb. To run press Open In Colab button.

For old demo, see old-demo.ipynb.

Face-swap

It is possible to modify the method to perform face-swap using supervised segmentation masks. Screenshot For both unsupervised and supervised video editing, such as face-swap, please refer to Motion Co-Segmentation.

Training

To train a model on specific dataset run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py --config config/dataset_name.yaml --device_ids 0,1,2,3

The code will create a folder in the log directory (each run will create a time-stamped new directory). Checkpoints will be saved to this folder. To check the loss values during training see log.txt. You can also check training data reconstructions in the train-vis subfolder. By default the batch size is tunned to run on 2 or 4 Titan-X gpu (appart from speed it does not make much difference). You can change the batch size in the train_params in corresponding .yaml file.

Evaluation on video reconstruction

To evaluate the reconstruction performance run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode reconstruction --checkpoint path/to/checkpoint

You will need to specify the path to the checkpoint, the reconstruction subfolder will be created in the checkpoint folder. The generated video will be stored to this folder, also generated videos will be stored in png subfolder in loss-less '.png' format for evaluation. Instructions for computing metrics from the paper can be found: https://github.com/AliaksandrSiarohin/pose-evaluation.

Image animation

In order to animate videos run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode animate --checkpoint path/to/checkpoint

You will need to specify the path to the checkpoint, the animation subfolder will be created in the same folder as the checkpoint. You can find the generated video there and its loss-less version in the png subfolder. By default video from test set will be randomly paired, but you can specify the "source,driving" pairs in the corresponding .csv files. The path to this file should be specified in corresponding .yaml file in pairs_list setting.

There are 2 different ways of performing animation: by using absolute keypoint locations or by using relative keypoint locations.

  1. Animation using absolute coordinates: the animation is performed using the absolute postions of the driving video and appearance of the source image. In this way there are no specific requirements for the driving video and source appearance that is used. However this usually leads to poor performance since unrelevant details such as shape is transfered. Check animate parameters in taichi-256.yaml to enable this mode.

  1. Animation using relative coordinates: from the driving video we first estimate the relative movement of each keypoint, then we add this movement to the absolute position of keypoints in the source image. This keypoint along with source image is used for animation. This usually leads to better performance, however this requires that the object in the first frame of the video and in the source image have the same pose

Datasets

  1. Bair. This dataset can be directly downloaded.

  2. Mgif. This dataset can be directly downloaded.

  3. Fashion. Follow the instruction on dataset downloading from.

  4. Taichi. Follow the instructions in data/taichi-loading or instructions from https://github.com/AliaksandrSiarohin/video-preprocessing.

  5. Nemo. Please follow the instructions on how to download the dataset. Then the dataset should be preprocessed using scripts from https://github.com/AliaksandrSiarohin/video-preprocessing.

  6. VoxCeleb. Please follow the instruction from https://github.com/AliaksandrSiarohin/video-preprocessing.

Training on your own dataset

  1. Resize all the videos to the same size e.g 256x256, the videos can be in '.gif', '.mp4' or folder with images. We recommend the later, for each video make a separate folder with all the frames in '.png' format. This format is loss-less, and it has better i/o performance.

  2. Create a folder data/dataset_name with 2 subfolders train and test, put training videos in the train and testing in the test.

  3. Create a config config/dataset_name.yaml, in dataset_params specify the root dir the root_dir: data/dataset_name. Also adjust the number of epoch in train_params.

Additional notes

Citation:

@InProceedings{Siarohin_2019_NeurIPS,
  author={Siarohin, Aliaksandr and Lathuilière, Stéphane and Tulyakov, Sergey and Ricci, Elisa and Sebe, Nicu},
  title={First Order Motion Model for Image Animation},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  month = {December},
  year = {2019}
}
Wordle-solver - Wordle answer generation program in python

🟨 Wordle Solver 🟩 Wordle answer generation program in python ✔️ Requirements U

Dahyun Kang 4 May 28, 2022
Perception-aware multi-sensor fusion for 3D LiDAR semantic segmentation (ICCV 2021)

Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation (ICCV 2021) [中文|EN] 概述 本工作主要探索一种高效的多传感器(激光雷达和摄像头)融合点云语义分割方法。现有的多传感器融合方法主要将点云投影

ICE 126 Dec 30, 2022
Turning SymPy expressions into JAX functions

sympy2jax Turn SymPy expressions into parametrized, differentiable, vectorizable, JAX functions. All SymPy floats become trainable input parameters. S

Miles Cranmer 38 Dec 11, 2022
Lightweight Cuda Renderer with Python Wrapper.

pyRender Lightweight Cuda Renderer with Python Wrapper. Compile Change compile.sh line 5 to the glm library include path. This library can be download

Jingwei Huang 53 Dec 02, 2022
Western-3DSlicer-Modules - Point-Set Registrations for Ultrasound Probe Calibrations

Point-Set Registrations for Ultrasound Probe Calibrations -Undergraduate Thesis-

Matteo Tanzi 0 May 04, 2022
Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains.

This repository is no longer maintained. Please use our new Softlearning package instead. Soft Actor-Critic Soft actor-critic is a deep reinforcement

Tuomas Haarnoja 752 Jan 07, 2023
TyXe: Pyro-based BNNs for Pytorch users

TyXe: Pyro-based BNNs for Pytorch users TyXe aims to simplify the process of turning Pytorch neural networks into Bayesian neural networks by leveragi

87 Jan 03, 2023
System-oriented IR evaluations are limited to rather abstract understandings of real user behavior

Validating Simulations of User Query Variants This repository contains the scripts of the experiments and evaluations, simulated queries, as well as t

IR Group at Technische Hochschule Köln 2 Nov 23, 2022
A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

Eugenio Herrera 175 Dec 29, 2022
Implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"

SinGAN This is an unofficial implementation of SinGAN from someone who's been sitting right next to SinGAN's creator for almost five years. Please ref

35 Nov 10, 2022
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

Chao Ma 3k Jan 03, 2023
The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter

FAPIS The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter Introduction This repo is primari

Khoi Nguyen 8 Dec 11, 2022
code for generating data set ES-ImageNet with corresponding training code

es-imagenet-master code for generating data set ES-ImageNet with corresponding training code dataset generator some codes of ODG algorithm The variabl

Ordinarabbit 18 Dec 25, 2022
Python scripts for performing stereo depth estimation using the MobileStereoNet model in ONNX

ONNX-MobileStereoNet Python scripts for performing stereo depth estimation using the MobileStereoNet model in ONNX Stereo depth estimation on the cone

Ibai Gorordo 23 Nov 29, 2022
This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer"

FlatTN This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transfor

THUHCSI 74 Nov 28, 2022
[ICCV '21] In this repository you find the code to our paper Keypoint Communities

Keypoint Communities In this repository you will find the code to our ICCV '21 paper: Keypoint Communities Duncan Zauss, Sven Kreiss, Alexandre Alahi,

Duncan Zauss 262 Dec 13, 2022
A Python library created to assist programmers with complex mathematical functions

libmaths libmaths was created not only as a learning experience for me, but as a way to make mathematical models in seconds for Python users using mat

Simple 73 Oct 02, 2022
We will see a basic program that is basically a hint to brute force attack to crack passwords. In other words, we will make a program to Crack Any Password Using Python. Show some ❤️ by starring this repository!

Crack Any Password Using Python We will see a basic program that is basically a hint to brute force attack to crack passwords. In other words, we will

Ananya Chatterjee 11 Dec 03, 2022
Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial Transformers

Official TensorFlow implementation of the unsupervised reconstruction model using zero-Shot Learned Adversarial TransformERs (SLATER). (https://arxiv.

ICON Lab 22 Dec 22, 2022
A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor

Phase-SLAM A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor This open source is written by MATLAB Run Mode Open

Xi Zheng 14 Dec 19, 2022