Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Last update: Dec 07, 2022


Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available here.

Dependencies

python 3.6.0
pytorch 1.4.0
pyyaml 5.4.1
numpy 1.19.5
librosa 0.8.0
soundfile 0.10.2
tensorboardX 2.1
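If you want to compare your environment against the versions listed above, a small diagnostic script like the following may help. It is a sketch and not part of the repository. It assumes the import names torch, yaml (for pyyaml), numpy, librosa, soundfile and tensorboardX, and reads each package's __version__ attribute.

```python
import importlib
import sys

# Pinned versions from the Dependencies list, keyed by import name
# (pyyaml is imported as "yaml", pytorch as "torch").
PINNED = {
    "torch": "1.4.0",
    "yaml": "5.4.1",
    "numpy": "1.19.5",
    "librosa": "0.8.0",
    "soundfile": "0.10.2",
    "tensorboardX": "2.1",
}


def check_versions(pinned=PINNED):
    """Return {module: (installed_version_or_None, pinned_version)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            module = importlib.import_module(name)
            installed = getattr(module, "__version__", "unknown")
        except Exception:
            # Missing package, or a missing native library (e.g. libsndfile).
            installed = None
        report[name] = (installed, wanted)
    return report


def matches(installed, wanted):
    # Ignore local build suffixes such as "+cu101" in torch version strings.
    return installed is not None and installed.split("+")[0] == wanted


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(pinned 3.6.0)")
    for name, (installed, wanted) in check_versions().items():
        flag = "OK" if matches(installed, wanted) else "MISMATCH"
        print(f"{name:12s} installed={installed} pinned={wanted} {flag}")
```

A mismatch is not necessarily fatal, but the pinned versions are the ones listed for this project.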

Preprocess

What you need to prepare before running this project, and how to prepare it:

We use ParallelWaveGAN as our vocoder and VCTK as our data set.
To run this project, first install ParallelWaveGAN as described in the ParallelWaveGAN project.
Then prepare all the mel-spectrogram data the same way ParallelWaveGAN does.
Prepare the speaker_used.json files yourself, following the examples in ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json.
Prepare the feats.scp file by running ./convert_decode/convert_mel/get_scp.py.
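The schema of the speaker_used.json files is defined by the sample files in ./data, which are not reproduced here. As a rough sketch, assuming each file is a plain JSON list of VCTK speaker IDs, the two speaker lists could be produced as below. The helper names and the sorting rule used to pick the 80 pre-training and 10 fine-tuning speakers are hypothetical and will not necessarily match the speakers the authors used.

```python
import json
import os


def split_speakers(speaker_ids, n_train=80, n_finetune=10):
    """Split speaker IDs into disjoint pre-training and one-shot lists.

    Hypothetical selection rule: sort the IDs, take the first n_train for
    pre-training and the next n_finetune for fine-tuning. The authors'
    actual choice is given by the sample JSON files, not by this rule.
    """
    ordered = sorted(set(speaker_ids))
    if len(ordered) < n_train + n_finetune:
        raise ValueError("not enough speakers for the requested split")
    train = ordered[:n_train]
    finetune = ordered[n_train:n_train + n_finetune]
    return train, finetune


def write_speaker_list(path, speakers):
    # Assumed schema: a plain JSON list such as ["p225", "p226", ...].
    # Compare with ./data/80_train_speaker_used.json before relying on it.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump(speakers, f, indent=2)
```

Writing to new file names of your own, rather than over the sample files in ./data, keeps the originals available for comparison.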

We assume your prepared mel-spectrograms are organized in a file tree like this:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...
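To check that your features follow this layout, a sketch like the one below walks the tree and lists every <utt_id>-feats.npy file per speaker. It is not ./convert_decode/convert_mel/get_scp.py. The helper names are hypothetical, and the one-line-per-utterance "utt_id path" output format is an assumption modeled on Kaldi-style scp files, so use get_scp.py to produce the file the training scripts actually read.

```python
import os
from collections import defaultdict

SUFFIX = "-feats.npy"


def collect_feats(mel_root):
    """Map each speaker directory to a sorted list of (utt_id, path) pairs.

    Assumes the layout <mel_root>/<speaker>/<utt_id>-feats.npy.
    Non-directories at the top level and non-matching files are skipped.
    """
    by_speaker = defaultdict(list)
    for speaker in sorted(os.listdir(mel_root)):
        spk_dir = os.path.join(mel_root, speaker)
        if not os.path.isdir(spk_dir):
            continue
        for name in sorted(os.listdir(spk_dir)):
            if name.endswith(SUFFIX):
                # "p225_001-feats.npy" -> "p225_001"
                utt_id = name[: -len(SUFFIX)]
                path = os.path.abspath(os.path.join(spk_dir, name))
                by_speaker[speaker].append((utt_id, path))
    return dict(by_speaker)


def write_scp(by_speaker, scp_path):
    # Assumed Kaldi-style format: one "utt_id path" line per utterance.
    with open(scp_path, "w") as f:
        for speaker in sorted(by_speaker):
            for utt_id, path in by_speaker[speaker]:
                f.write(f"{utt_id} {path}\n")
```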

Training

Run the pre-training stage with bash run_main.sh. We use 80 speakers from the VCTK data set, with all utterances of each speaker.

Fine Tuning

Run the fine-tuning stage with bash run_fine_tune.sh. We use the other 10 speakers of the VCTK data set, with only one utterance per speaker.
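The one-shot data split can be illustrated with a short sketch: for each fine-tuning speaker, one utterance is set aside for adaptation and the remaining ones stay unseen for inference. The function name and the rule of taking the first utterance in sorted order are hypothetical; the repository's own scripts and speaker files define the actual selection.

```python
def one_shot_split(utts_by_speaker, speakers):
    """Pick one adaptation utterance per speaker and keep the rest unseen.

    utts_by_speaker: {speaker: [utt_id, ...]}.
    Hypothetical rule: the first utterance in sorted order is used for
    fine-tuning; all other utterances are held out for inference.
    """
    adapt, unseen = {}, {}
    for spk in speakers:
        utts = sorted(utts_by_speaker[spk])
        if not utts:
            raise ValueError(f"speaker {spk} has no utterances")
        adapt[spk] = utts[0]
        unseen[spk] = utts[1:]
    return adapt, unseen
```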

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate one-shot voice conversion utterances between the 10 one-shot speakers, using their other, unseen utterances to perform one-shot voice conversion.
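Assuming every ordered pair of distinct one-shot speakers is converted, 10 speakers give 10 × 9 = 90 conversion directions, each applied to the source speaker's unseen utterances. A hypothetical sketch of that enumeration follows; how run_convert.sh actually schedules the pairs and conditions on the target speaker is not shown here.

```python
from itertools import permutations


def conversion_jobs(unseen):
    """List (source_speaker, source_utt, target_speaker) conversion jobs.

    unseen: {speaker: [unseen utt_id, ...]} for the one-shot speakers.
    Assumes every ordered pair of distinct speakers is used, so N speakers
    yield N * (N - 1) conversion directions.
    """
    jobs = []
    for src, tgt in permutations(sorted(unseen), 2):
        for utt in unseen[src]:
            jobs.append((src, utt, tgt))
    return jobs
```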
