Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Last update: Jan 03, 2023

Overview

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Paper | Demo

Requirements

Python >= 3.6 , Pytorch >= 1.8 and ffmpeg
Set up OpenFace
- We use the OpenFace tools to extract the initial pose of the reference image
- Make sure you have installed this tool, and set the OPENFACE_POSE_EXTRACTOR_PATH in config.py. For example, it should be the absolute path of the "FeatureExtraction.exe" for Windows.
Other requirements are listed in the 'requirements.txt'

Pretrained Checkpoint

Please download the pretrained checkpoint from google-drive and unzip it to the directory (/checkpoints). Or manually modify the settings of GENERATOR_CKPT and AUDIO2POSE_CKPT in the config.py.

Extract phoneme

We employ the CMU phoneset to represent phonemes, the extra 'SIL' means silence. All the phonesets can be seen in 'phindex.json'.

We have extracted the phonemes for the audios in the 'sample/audio' directory. For other audios, you can extract the phonemes by other ASR tools and then map them to the CMU phoneset. Or email to [email protected] for help.

Generate Demo Results

python test_script.py --img_path xxx.jpg --audio_path xxx.wav --phoneme_path xxx.json --save_dir "YOUR_DIR"

Note that the input images must keep the same height and width and the face should be appropriately cropped as in samples/imgs. You can also preprocess your images with image_preprocess.py.

License and Citation

@InProceedings{wang2021one,
author = Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
title = {One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning},
booktitle = {AAAI 2022},
year = {2022},
}

Acknowledgement

This codebase is based on First Order Motion Model and imaginaire, thanks for their contributions.

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Related tags

Overview

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Paper | Demo

Requirements

Pretrained Checkpoint

Extract phoneme

Generate Demo Results

License and Citation

Acknowledgement

Owner

FuxiVirtualHuman

Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

An LSTM for time-series classification

Full Stack Deep Learning Labs

[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Federated Learning Based on Dynamic Regularization

[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

AFLFast (extends AFL with Power Schedules)

Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe

This repository contains the source codes for the paper AtlasNet V2 - Learning Elementary Structures.

Potato Disease Classification - Training, Rest APIs, and Frontend to test.

Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer"

Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

BasicRL: easy and fundamental codes for deep reinforcement learning。It is an improvement on rainbow-is-all-you-need and OpenAI Spinning Up.

Toontown House CT Edition

Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Code accompanying our NeurIPS 2021 traffic4cast challenge

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

Dahua Camera and Doorbell Home Assistant Integration

Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks

[MedIA2021]MIDeepSeg: Minimally Interactive Segmentation of Unseen Objects from Medical Images Using Deep Learning