Refactored version of FastSpeech2

Overview

FastSpeech2

This repository is a refactored version from ming024's own. I focused on refactoring structure for fitting my cases and making parallel pre-processing codes. And I wrote installation guide with the latest version of MFA(Montreal Force Aligner).

Installation

  • Tested on python 3.8, Ubuntu 20.04

    • Notice ! For installing MFA, you should install the miniconda.
    • If you run MFA under 16.04 or ealier version of Ubuntu, you will face a compile error.
  • In your system

    • To install pyworld, run "sudo apt-get install python3.x-dev". (x is your python version).
    • To install sndfile, run "sudo apt-get install libsndfile-dev"
    • To use MFA, run "sudo apt-get install libopenblas-base"
  • Install requirements

# install pytorch_sound
pip install git+https://github.com/appleholic/pytorch_sound
pip install -e .
  • Download datasets
  1. VCTK
  2. LibriTTS
    • To be updated
  • Install MFA

    • Visit and follow a guide that described in MFA installation website.
    • Additional installation
      • mfa thirdparty download
      • mfa download acoustic english
  • Pre-trained checkpoint

Preprocess (VCTK case)

  1. Prepare MFA
python fastspeech2/scripts/prepare_align.py configs/vctk_prepare_align.json
  1. Run MFA for making alignments
# Define your the number of threads to run MFA at the last of a command. "-j [The number of threads]"
mfa align data/fastspeech2/vctk lexicons/librispeech-lexicon.txt english data/fastspeech2/vctk-pre -j 24
  1. Feature preprocessing
python fastspeech2/scripts/preprocess.py configs/vctk_preprocess.json

Train

  1. Multi-speaker fastspeech2
python fastspeech2/scripts/train.py configs/fastspeech2_vctk_tts.json
  • If you want to change the parameters of training FastSpeech2, check out the code and put the option to configuration file.
    • train code : fastspeech2/scripts/train.py
    • config : configs/fastspeech2_vctk_tts.json
  1. Fastspeech2 with reference encoder (To be updated)

Synthesize

Multi-spaker model

  • In a code
from fastspeech2.inference import Inferencer
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# arguments
# chk_path: str, lexicon_path: str, device: str = 'cuda'
inferencer = Inferencer(chk_path=chk_path, lexicon_path=lexicon_path, device=device)

# initialize hifigan
interface = InterfaceHifiGAN(model_name='hifi_gan_v1_universal', device='cuda')

# arguments
# text: str, speaker: int = 0, pitch_control: float = 1., energy_control: float = 1., duration_control: float = 1.
txt = 'Hello, I am a programmer.'
mel_spectrogram = inferencer.tts(txt, speaker=0)

# Reconstructs speech by using Hifi-GAN
pred_wav = interface.decode(mel_spectrogram.transpose(1, 2)).squeeze()

# If you test on a jupyter notebook
from IPython.display import Audio
Audio(pred_wav.cpu().numpy(), rate=22050)
  • In command line
python fastspeech2/scripts/synthesize.py [TEXT] [OUTPUT PATH] [CHECKPOINT PATH] [LEXICON PATH] [[DEVICE]] [[SPEAKER]]

Reference encoder (not updated)

Reference

Owner
ILJI CHOI
AI Research Engineer
ILJI CHOI
Fine-tune GPT-3 with a Google Chat conversation history

Google Chat GPT-3 This repo will help you fine-tune GPT-3 with a Google Chat conversation history. The trained model will be able to converse as one o

Nate Baer 7 Dec 10, 2022
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

AliceMind AliceMind: ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab This repository provides pre-trained encode

Alibaba 1.4k Jan 04, 2023
Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

TextCortex - HemingwAI Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingw

TextCortex AI 27 Nov 28, 2022
Задания КЕГЭ по информатике 2021 на Python

КЕГЭ 2021 на Python В этом репозитории мои решения типовых заданий КЕГЭ по информатике в 2021 году, БЕСПЛАТНО! Задания Взяты с https://inf-ege.sdamgia

8 Oct 13, 2022
Trained T5 and T5-large model for creating keywords from text

text to keywords Trained T5-base and T5-large model for creating keywords from text. Supported languages: ru Pretraining Large version | Pretraining B

Danil 61 Nov 24, 2022
Text classification on IMDB dataset using Keras and Bi-LSTM network

Text classification on IMDB dataset using Keras and Bi-LSTM Text classification on IMDB dataset using Keras and Bi-LSTM network. Usage python3 main.py

Hamza Rashid 2 Sep 27, 2022
Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP

Pretrain and Fine-tune a T5 model with Flax on GCP This tutorial details how pretrain and fine-tune a FlaxT5 model from HuggingFace using a TPU VM ava

Gabriele Sarti 41 Nov 18, 2022
Local cross-platform machine translation GUI, based on CTranslate2

DesktopTranslator Local cross-platform machine translation GUI, based on CTranslate2 Download Windows Installer You can either download a ready-made W

Yasmin Moslem 29 Jan 05, 2023
Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

ESACL: Enhanced Seq2Seq Autoencoder via Contrastive Learning for AbstractiveText Summarization This repo is for our paper "Enhanced Seq2Seq Autoencode

Rachel Zheng 14 Nov 01, 2022
A Chinese to English Neural Model Translation Project

ZH-EN NMT Chinese to English Neural Machine Translation This project is inspired by Stanford's CS224N NMT Project Dataset used in this project: News C

Zhenbang Feng 29 Nov 26, 2022
Findings of ACL 2021

Assessing Dialogue Systems with Distribution Distances [arXiv][code] We propose to measure the performance of a dialogue system by computing the distr

Yahui Liu 16 Feb 24, 2022
xFormers is a modular and field agnostic library to flexibly generate transformer architectures by interoperable and optimized building blocks.

Description xFormers is a modular and field agnostic library to flexibly generate transformer architectures by interoperable and optimized building bl

Facebook Research 2.3k Jan 08, 2023
Just a Basic like Language for Zeno INC

zeno-basic-language Just a Basic like Language for Zeno INC This is written in 100% python. this is basic language like language. so its not for big p

Voidy Devleoper 1 Dec 18, 2021
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Official code for our Interspeech 2021 - Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset [1]*. Visually-grounded spoken language datasets c

Ian Palmer 3 Jan 26, 2022
Maix Speech AI lib, including ASR, chat, TTS etc.

Maix-Speech 中文 | English Brief Now only support Chinese, See 中文 Build Clone code by: git clone https://github.com/sipeed/Maix-Speech Compile x86x64 c

Sipeed 267 Dec 25, 2022
Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your t

Ravn Tech, Inc. 166 Jan 07, 2023
Bpe algorithm can finetune tokenizer - Bpe algorithm can finetune tokenizer

"# bpe_algorithm_can_finetune_tokenizer" this is an implyment for https://github

张博 1 Feb 02, 2022
Rank-One Model Editing for Locating and Editing Factual Knowledge in GPT

Rank-One Model Editing (ROME) This repository provides an implementation of Rank-One Model Editing (ROME) on auto-regressive transformers (GPU-only).

Kevin Meng 130 Dec 21, 2022
Voilà turns Jupyter notebooks into standalone web applications

Rendering of live Jupyter notebooks with interactive widgets. Introduction Voilà turns Jupyter notebooks into standalone web applications. Unlike the

Voilà Dashboards 4.5k Jan 03, 2023
Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

wangle 823 Dec 28, 2022