Refactored version of FastSpeech2

Last update: May 26, 2022

Overview

FastSpeech2

This repository is a refactored version of ming024's FastSpeech2 implementation. I focused on restructuring the code to fit my use cases and on parallelizing the preprocessing code. I also wrote an installation guide for the latest version of MFA (Montreal Forced Aligner).

Installation

Tested on Python 3.8, Ubuntu 20.04.
- Note: installing MFA requires Miniconda.
- If you run MFA on Ubuntu 16.04 or earlier, you will hit a compile error.
On your system
- To install pyworld, run "sudo apt-get install python3.x-dev" (x is your Python minor version).
- To install sndfile, run "sudo apt-get install libsndfile-dev"
- To use MFA, run "sudo apt-get install libopenblas-base"
Install requirements

# install pytorch_sound
pip install git+https://github.com/appleholic/pytorch_sound
pip install -e .

Download datasets

VCTK
- Download the dataset from https://datashare.is.ed.ac.uk/handle/10283/2651
- Move it to "./data" and extract the compressed file.
  - If you want to store the dataset in another directory, you must update the paths in the configuration files.
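The extraction step can also be scripted. The following is a minimal sketch, not part of this repository: `extract_dataset` is a hypothetical helper name, and the archive path is whatever file you downloaded.

```python
import shutil
from pathlib import Path


def extract_dataset(archive_path: str, data_dir: str = "./data") -> Path:
    """Unpack a downloaded dataset archive into data_dir and return that directory.

    Hypothetical helper for illustration; it is not provided by this repository.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    # unpack_archive infers the format (zip, tar, ...) from the file extension
    shutil.unpack_archive(archive_path, str(target))
    return target
```

For example, `extract_dataset("path/to/downloaded_archive.zip")` unpacks into "./data". If you pass a different `data_dir`, remember to update the paths in the configuration files accordingly.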
LibriTTS
- To be updated

Install MFA
- Follow the guide on the MFA installation website.
- Additional installation steps
  - mfa thirdparty download
  - mfa download acoustic english
Pre-trained checkpoint
- VCTK, 400k steps : Google Drive Link

Preprocess (VCTK case)

Prepare MFA

python fastspeech2/scripts/prepare_align.py configs/vctk_prepare_align.json

Run MFA to generate alignments

# Set the number of threads for MFA at the end of the command: "-j [number of threads]"
mfa align data/fastspeech2/vctk lexicons/librispeech-lexicon.txt english data/fastspeech2/vctk-pre -j 24
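If you launch the alignment from a script, the thread count can be derived from the machine instead of hard-coded. This is a rough sketch under that assumption: `build_mfa_align_command` is a hypothetical helper, and its defaults simply mirror the command above.

```python
import os
from typing import List, Optional


def build_mfa_align_command(
    corpus_dir: str = "data/fastspeech2/vctk",
    lexicon_path: str = "lexicons/librispeech-lexicon.txt",
    acoustic_model: str = "english",
    output_dir: str = "data/fastspeech2/vctk-pre",
    num_jobs: Optional[int] = None,
) -> List[str]:
    """Build the `mfa align` argument list shown above (hypothetical helper)."""
    if num_jobs is None:
        # os.cpu_count() may return None, so fall back to a single job
        num_jobs = os.cpu_count() or 1
    return [
        "mfa", "align",
        corpus_dir, lexicon_path, acoustic_model, output_dir,
        "-j", str(num_jobs),
    ]
```

The result can be passed to `subprocess.run(build_mfa_align_command(), check=True)` from within the conda environment where MFA is installed.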

Feature preprocessing

python fastspeech2/scripts/preprocess.py configs/vctk_preprocess.json

Train

Multi-speaker FastSpeech2

python fastspeech2/scripts/train.py configs/fastspeech2_vctk_tts.json

To change the training parameters of FastSpeech2, check which options the training code accepts and set them in the configuration file.
- train code : fastspeech2/scripts/train.py
- config : configs/fastspeech2_vctk_tts.json
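One way to experiment without editing the original file is to write a modified copy of the configuration. The sketch below assumes only that the config is a JSON object; the helper names and any key you pass are hypothetical, so check the training code for the options it actually reads.

```python
import copy
import json
from typing import Any, Dict


def override_config(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with dotted-key overrides applied.

    A key such as "a.b" sets config["a"]["b"]; missing parents are created.
    """
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


def write_config_variant(src_path: str, dst_path: str, overrides: Dict[str, Any]) -> None:
    """Load a JSON config, apply overrides, and save the result to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(override_config(config, overrides), f, indent=2)
```

The new file can then be passed to the training script in place of configs/fastspeech2_vctk_tts.json.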

FastSpeech2 with reference encoder (To be updated)

Synthesize

Multi-speaker model

In code

from fastspeech2.inference import Inferencer
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# arguments
# chk_path: str, lexicon_path: str, device: str = 'cuda'
inferencer = Inferencer(chk_path=chk_path, lexicon_path=lexicon_path, device=device)

# initialize hifigan
interface = InterfaceHifiGAN(model_name='hifi_gan_v1_universal', device='cuda')

# arguments
# text: str, speaker: int = 0, pitch_control: float = 1., energy_control: float = 1., duration_control: float = 1.
txt = 'Hello, I am a programmer.'
mel_spectrogram = inferencer.tts(txt, speaker=0)

# Reconstruct the waveform from the mel spectrogram with HiFi-GAN
pred_wav = interface.decode(mel_spectrogram.transpose(1, 2)).squeeze()

# To listen to the result in a Jupyter notebook
from IPython.display import Audio
Audio(pred_wav.cpu().numpy(), rate=22050)
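Outside a notebook you may want to save the result to disk. The following is a minimal sketch using only the standard library. `save_wav` is a hypothetical helper, and it assumes the waveform is a one-dimensional sequence of floats in [-1, 1] sampled at 22050 Hz, as in the example above.

```python
import sys
import wave
from array import array
from typing import Iterable


def save_wav(samples: Iterable[float], path: str, sample_rate: int = 22050) -> None:
    """Write mono float samples as a 16-bit PCM WAV file (hypothetical helper)."""
    # Clip to [-1, 1] and scale to the signed 16-bit range
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        # WAV sample data is little-endian
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

With the variables from the example above, this could be called as `save_wav(pred_wav.cpu().numpy(), "output.wav")`, where the output file name is arbitrary.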

From the command line

python fastspeech2/scripts/synthesize.py [TEXT] [OUTPUT PATH] [CHECKPOINT PATH] [LEXICON PATH] [[DEVICE]] [[SPEAKER]]

Reference encoder (not updated)

Reference

ming024/FastSpeech2

Owner

ILJI CHOI
