Voice Conversion by CycleGAN (Voice Cloning/Voice Conversion): CycleGAN-VC3

Last update: Dec 24, 2022

Overview

CycleGAN-VC3-PyTorch

This code is a PyTorch implementation of the paper "CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion", a nice piece of work on voice conversion and voice cloning.

CycleGAN-VC3

Project Page

Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speech without using a parallel corpus. Recently, CycleGAN-VC [3] and CycleGAN-VC2 [2] have shown promising results on this problem and have been widely used as benchmark methods. However, because the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion is unclear, they are typically used for mel-cepstrum conversion even when comparative methods use the mel-spectrogram as the conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that applying them directly compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that for every VC pair, CycleGAN-VC3 outperforms or is competitive with the two types of CycleGAN-VC2, one applied to mel-cepstrum and the other to mel-spectrogram.

Figure 1. We developed time-frequency adaptive normalization (TFAN), which extends instance normalization [5] so that the affine parameters become element-dependent and are determined according to an entire input mel-spectrogram.
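To make the TFAN idea concrete, here is a minimal NumPy sketch, not the repository's PyTorch module. Plain instance normalization rescales each channel with a single scale and bias; TFAN instead computes a scale gamma and bias beta for every element from the source mel-spectrogram. Assumptions: 1-D features of shape (channels, frames), nearest-neighbour resizing of the source along time, and pointwise (1x1) projections with the hypothetical weights `W_s`, `W_g`, `b_g`, `W_b`, `b_b` standing in for the learned convolutional layers used in the paper. All function and parameter names are illustrative only.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (channels x frames) to zero mean, unit variance."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def resize_time(mel, length):
    """Nearest-neighbour resize of mel (mel_bins x frames) to `length` frames."""
    idx = np.floor(np.arange(length) * mel.shape[1] / length).astype(int)
    return mel[:, idx]


def tfan_1d(x, src_mel, params, eps=1e-5):
    """Sketch of time-frequency adaptive normalization for 1-D features.

    x:       converted features, shape (channels, frames)
    src_mel: source mel-spectrogram, shape (mel_bins, src_frames)
    params:  hypothetical weights; pointwise projections replace the
             convolutional layers described in the paper (assumption).
    """
    h = resize_time(src_mel, x.shape[1])       # align source length to the features
    h = np.maximum(params["W_s"] @ h, 0.0)     # shared hidden layer + ReLU
    gamma = params["W_g"] @ h + params["b_g"]  # element-wise scale, (channels, frames)
    beta = params["W_b"] @ h + params["b_b"]   # element-wise bias, (channels, frames)
    return gamma * instance_norm(x, eps) + beta


# Usage with random (hypothetical) weights and inputs.
rng = np.random.default_rng(0)
channels, frames, mel_bins, hidden = 8, 16, 80, 32
params = {
    "W_s": rng.normal(size=(hidden, mel_bins)) * 0.1,
    "W_g": rng.normal(size=(channels, hidden)) * 0.1,
    "b_g": np.ones((channels, 1)),
    "W_b": rng.normal(size=(channels, hidden)) * 0.1,
    "b_b": np.zeros((channels, 1)),
}
x = rng.normal(size=(channels, frames))
src_mel = rng.normal(size=(mel_bins, 64))
y = tfan_1d(x, src_mel, params)
print(y.shape)  # (8, 16)
```

Because gamma and beta have the same shape as the normalized features, the source's time-frequency pattern can modulate each element individually; with `W_g` and `W_b` set to zero the layer reduces to plain instance normalization with scale 1 and bias 0.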

This repository contains:

TFAN module code, which implements the TFAN module.
Model code, which implements the model network.
An audio preprocessing script you can use to create a cache of training data.
Training scripts to train the model.


Requirement

pip install -r requirements.txt

Usage

Reference

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion. Paper, Project
CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion. Paper, Project
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. Paper, Project
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Paper, Project, Code
Image-to-Image Translation with Conditional Adversarial Nets. Paper, Project, Code

Donation

If this project helps you reduce development time, you can buy me a cup of coffee :)

AliPay(支付宝)

WechatPay(微信)


Owner

Kun Ma
