Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"

Overview

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

We provide PyTorch implementations for our arxiv paper "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"(http://arxiv.org/abs/2002.10137).

Note that this code is protected under patent. It is for research purposes only at your university (research institution) only. If you are interested in business purposes/for-profit use, please contact Prof.Liu (the corresponding author, email: [email protected]).

We provide a demo video here (please search for "Talking Face" in this page and click the "demo video" button).

Colab

Our Proposed Framework

Prerequisites

  • Linux or macOS
  • NVIDIA GPU
  • Python 3
  • MATLAB

Getting Started

Installation

  • You can create a virtual env, and install all the dependencies by
pip install -r requirements.txt

Download pre-trained models

  • Including pre-trained general models and models needed for face reconstruction, identity feature extraction etc
  • Download from BaiduYun(extract code:usdm) or GoogleDrive and copy to corresponding subfolders (Audio, Deep3DFaceReconstruction, render-to-video).

Download face model for 3d face reconstruction

Fine-tune on a target peron's short video

    1. Prepare a talking face video that satisfies: 1) contains a single person, 2) 25 fps, 3) longer than 12 seconds, 4) without large body translation (e.g. move from the left to the right of the screen). An example is here. Rename the video to [person_id].mp4 (e.g. 1.mp4) and copy to Data subfolder.

Note: You can make a video to 25 fps by

ffmpeg -i xxx.mp4 -r 25 xxx1.mp4
    1. Extract frames and lanmarks by
cd Data/
python extract_frame1.py [person_id].mp4
    1. Conduct 3D face reconstruction. First should compile code in Deep3DFaceReconstruction/tf_mesh_renderer/mesh_renderer/kernels to .so, following its readme, and modify line 28 in rasterize_triangles.py to your directory. Then run
cd Deep3DFaceReconstruction/
CUDA_VISIBLE_DEVICES=0 python demo_19news.py ../Data/[person_id]

This process takes about 2 minutes on a Titan Xp.

cd Audio/code/
python train_19news_1.py [person_id] [gpu_id]

The saved models are in Audio/model/atcnet_pose0_con3/[person_id]. This process takes about 5 minutes on a Titan Xp.

    1. Fine-tune the gan network. Run
cd render-to-video/
python train_19news_1.py [person_id] [gpu_id]

The saved models are in render-to-video/checkpoints/memory_seq_p2p/[person_id]. This process takes about 40 minutes on a Titan Xp.

Test on a target peron

Place the audio file (.wav or .mp3) for test under Audio/audio/. Run [with generated poses]

cd Audio/code/
python test_personalized.py [audio] [person_id] [gpu_id]

or [with poses from short video]

cd Audio/code/
python test_personalized2.py [audio] [person_id] [gpu_id]

This program will print 'saved to xxx.mov' if the videos are successfully generated. It will output 2 movs, one is a video with face only (_full9.mov), the other is a video with background (_transbigbg.mov).

Colab

A colab demo is here.

Acknowledgments

The face reconstruction code is from Deep3DFaceReconstruction, the arcface code is from insightface, the gan code is developed based on pytorch-CycleGAN-and-pix2pix.

Owner
Ran Yi
Assistant Professor at CSE Dept, SJTU
Ran Yi
MUSIC-AVQA, CVPR2022 (ORAL)

Audio-Visual Question Answering (AVQA) PyTorch code accompanies our CVPR 2022 paper: Learning to Answer Questions in Dynamic Audio-Visual Scenarios (O

44 Dec 23, 2022
This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz

VcPlayer This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz Telegram Voice-Chat Bot [PyTGCalls] ⇝ Requirements ⇜

1 Dec 20, 2021
Graphical interface to control granular sound synthesis.

Granular sound synthesis interface SoundGrain is a graphical interface where users can draw and edit trajectories to control granular sound synthesis

Olivier Bélanger 122 Dec 10, 2022
Tradutor de um arquivo MIDI para ser usado em um simulador RISC-V(RARS)

Tradutor_MIDI-RISC-V Tradutor de um arquivo MIDI para ser usado em um simulador RISC-V(RARS) *O resultado sai com essa formatação: nota,duração,nota,d

Gabriel B. G. 4 Sep 02, 2022
Python audio and music signal processing library

madmom Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks. The library is i

Institute of Computational Perception 1k Dec 26, 2022
Minimal command-line music player written in Python

pyms Minimal command-line music player written in Python. Designed with elegance and minimalism. Resizes dynamically with your terminal. Dependencies

12 Sep 23, 2022
A python script that can play .mp3 URLs upon the ringing or motion detection of a Ring doorbell. The sound plays through Sonos speakers.

Ring x Sonos A python script that plays .mp3 files whenever a doorbell is rung or a doorbell detects motion. Features Music! Authors @braden Running T

braden 0 Nov 12, 2021
An audio-solving python funcaptcha solving module

funcapsolver funcapsolver is a funcaptcha audio-solving module, which allows captchas to be interacted with and solved with the use of google's speech

Acier 8 Nov 21, 2022
A rofi-blocks script that searches youtube and plays the selected audio on mpv.

rofi-ytm A rofi-blocks script that searches youtube and plays the selected audio on mpv. To use the script, run the following command rofi -modi block

Cliford 26 Dec 21, 2022
A fast MDCT implementation using SciPy and FFTs

MDCT A fast MDCT implementation using SciPy and FFTs Installation As usual pip install mdct Dependencies NumPy SciPy STFT Usage import mdct spectrum

Nils Werner 43 Sep 02, 2022
Terminal-based music player written in Python for the best music in the world 🎵 🎧 💻

audius-terminal-player Terminal-based music player written in Python for the best music in the world 🎵 🎧 💻 Browse and listen to Audius from the com

Audius 21 Jul 23, 2022
MusicBrainz Picard

MusicBrainz Picard MusicBrainz Picard is a cross-platform (Linux/Mac OS X/Windows) application written in Python and is the official MusicBrainz tagge

MetaBrainz Foundation 3k Dec 31, 2022
Reading list for research topics in sound event detection

Sound event detection aims at processing the continuous acoustic signal and converting it into symbolic descriptions of the corresponding sound events present at the auditory scene.

Soham 64 Jan 05, 2023
𝙰 𝙼𝚞𝚜𝚒𝚌 𝙱𝚘𝚝 𝙲𝚛𝚎𝚊𝚝𝚎𝚍 𝙱𝚢 𝚃𝚎𝚊𝚖𝙳𝚕𝚝 💖

TeamDltmusic 𝙰 𝙼𝚞𝚜𝚒𝚌 𝙱𝚘𝚝 𝙲𝚛𝚎𝚊𝚝𝚎𝚍 𝙱𝚢 𝚃𝚎𝚊𝚖𝙳𝚕𝚝 💖 Deploy String Session String Click hear you can find string session OR join He

TeamDlt 5 Jan 18, 2022
A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

نافع الهلالي 1 Oct 29, 2021
C++ library for audio and music analysis, description and synthesis, including Python bindings

Essentia Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license.

Music Technology Group - Universitat Pompeu Fabra 2.3k Jan 03, 2023
PianoPlayer - Automatic fingering generator for piano scores

PianoPlayer - Automatic fingering generator for piano scores

Marco Musy 571 Jan 02, 2023
Voice package for Pycord adding extra features.

VoiceIO Voice package for Pycord adding extra features. Example Down bellow is an example of what you can currently do. import voiceio process = voic

pycord 1 Dec 24, 2021
Use python MIDI to write some simple music

Use Python MIDI to write songs

小宝 1 Nov 19, 2021
SomaFM Plugin for Kodi

SomaFM XBMC Plugin This description is a bit outdated. You can simply install this addon by browsing the official repositories from within Kodi. Insta

7 Jan 21, 2022