Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

Last update: Sep 28, 2022

Overview

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo

Block diagram of FCL-taco2, where the decoder generates mel-spectrograms in AR mode within each phoneme and is shared for all phonemes.

💬 Huawei Noah's Ark Lab is recruiting interns on speech processing fields, if you're interested, you're welcome to contact Dr. Deng: [email protected]

Training and inference scripts for FCL-taco2

Environment

python 3.6.10
torch 1.3.1
chainer 6.0.0
espnet 8.0.0
apex 0.1
numpy 1.19.1
kaldiio 2.15.1
librosa 0.8.0

Training and inference:

Step1. Data preparation & preprocessing

Download LJSpeech
Unpack downloaded LJSpeech-1.1.tar.bz2 to /xx/LJSpeech-1.1
Obtain the forced alignment information by using Montreal forced aligner tool. Or you can download our alignment results, then unpack it to /xx/TextGrid
Preprocess the dataset to extract mel-spectrograms, phoneme duration, pitch, energy and phoneme sequence by:
```
 python preprocessing.py --data-root /xx/LJSpeech-1.1 --textgrid-root /xx/TextGrid
```

Step2. Model training

Training teacher model FCL-taco2-T:
```
 ./teacher_model_training.sh
```
Training student model FCL-taco2-S:
```
 ./student_model_training.sh
```
Parallel-WaveGAN vocoder training: follow instructions at here. You can also download the pre-trained PWG vocoder, and put the PWG model under the directory "vocoder".

Step3. Model evaluation

FCL-taco2-T evaluation:
```
 ./inference_teacher.sh
```
FCL-taco2-S evaluation:
```
 ./inference_student.sh
```

Citation

If the code is used in your research, please star our repo and cite our paper:

@inproceedings{wang2021fcl,
  title={Fcl-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis},
  author={Wang, Disong and Deng, Liqun and Zhang, Yang and Zheng, Nianzu and Yeung, Yu Ting and Chen, Xiao and Liu, Xunying and Meng, Helen},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5714--5718},
  year={2021},
  organization={IEEE}
}

Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

Related tags

Overview

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo

💬 Huawei Noah's Ark Lab is recruiting interns on speech processing fields, if you're interested, you're welcome to contact Dr. Deng: [email protected]

Training and inference scripts for FCL-taco2

Environment

Training and inference:

Citation

Owner

Disong Wang

Source code for the NeurIPS 2021 paper "On the Second-order Convergence Properties of Random Search Methods"

Differentiable scientific computing library

A simple approach to emable dense segmentation with ViT.

Hands-On Machine Learning for Algorithmic Trading, published by Packt

Dynamica causal Bayesian optimisation

Per-Pixel Classification is Not All You Need for Semantic Segmentation

A unified framework to jointly model images, text, and human attention traces.

A Python wrapper for Google Tesseract

You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors

This repo implements a 3D segmentation task for an airport baggage dataset.

Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data

a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch

Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning"

PyTorch implementations of neural network models for keyword spotting

Warning: This project does not have any current developer. See bellow.

Multi-Joint dynamics with Contact. A general purpose physics simulator.

Point Cloud Registration using Representative Overlapping Points.

An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

Implementation of "Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis"

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization