PyTorch implementation of the Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch

Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model seems to be basically working. You can find some generated speech examples trained on the LJ Speech Dataset here.

If you are comfortable working with TensorFlow, I'd recommend trying https://github.com/keithito/tacotron instead. I rewrote it in PyTorch because, at least for me, PyTorch makes it easier to debug and extend (e.g., to a multi-speaker architecture).

Requirements

PyTorch
TensorFlow (needed only to run the training script; this could be made optional, but for now it is required)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, you also need to install additional dependencies:

pip install -e ".[train]"

Training

The package relies on keithito/tacotron (added as a submodule) for text processing, audio preprocessing, and audio reconstruction. Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.
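As a rough illustration of the text-processing step, a Tacotron-style front end turns raw text into a sequence of integer symbol IDs, ending with an end-of-sequence marker, before it reaches the model. The sketch below shows only the idea: the symbol inventory, the cleaning rule, and the names `SYMBOLS`, `basic_clean`, and `text_to_ids` are hypothetical simplifications, not the submodule's actual code.

```python
import re
from typing import List

# Hypothetical symbol inventory: padding, end-of-sequence, then characters.
_PAD = "_"
_EOS = "~"
_CHARACTERS = "abcdefghijklmnopqrstuvwxyz!'(),-.:;? "
SYMBOLS = [_PAD, _EOS] + list(_CHARACTERS)
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def basic_clean(text: str) -> str:
    """Lowercase and collapse runs of whitespace into single spaces (hypothetical cleaner)."""
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def text_to_ids(text: str) -> List[int]:
    """Map cleaned text to symbol IDs, dropping unknown characters and appending EOS."""
    ids = [SYMBOL_TO_ID[ch] for ch in basic_clean(text) if ch in SYMBOL_TO_ID]
    ids.append(SYMBOL_TO_ID[_EOS])
    return ids


if __name__ == "__main__":
    print(text_to_ids("Hello,  World!"))
```

A real front end typically also expands numbers and abbreviations during cleaning; here unknown characters are simply dropped to keep the sketch short.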

Once your data is prepared, assuming it is in "~/tacotron/training" (the default), you can train your model with:

python train.py

Alignments, predicted spectrograms, target spectrograms, predicted waveforms, and checkpoints (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log
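Periodic checkpointing of model and optimizer states can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's train.py: the `nn.Linear` stand-in model, the L1 loss, the checkpoint key names, and the `checkpoint_step{N}.pth` file naming are all hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

CHECKPOINT_INTERVAL = 1000  # the README states checkpoints are written every 1000 global steps


def save_checkpoint(model, optimizer, global_step, checkpoint_dir):
    """Save model and optimizer states together (hypothetical keys and file naming)."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"checkpoint_step{global_step}.pth")
    torch.save(
        {
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "global_step": global_step,
        },
        path,
    )
    return path


def train(model, optimizer, batches, checkpoint_dir, interval=CHECKPOINT_INTERVAL):
    """Toy loop: one optimizer step per batch, checkpointing every `interval` steps."""
    saved = []
    global_step = 0
    for x, y in batches:
        optimizer.zero_grad()
        loss = nn.functional.l1_loss(model(x), y)
        loss.backward()
        optimizer.step()
        global_step += 1
        if global_step % interval == 0:
            saved.append(save_checkpoint(model, optimizer, global_step, checkpoint_dir))
    return saved


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(4, 2)  # hypothetical stand-in for the Tacotron model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    batches = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(25)]
    with tempfile.TemporaryDirectory() as d:
        paths = train(model, optimizer, batches, d, interval=10)
        print([os.path.basename(p) for p in paths])
```

Storing the optimizer state alongside the weights is what allows training to resume with, for example, Adam's moment estimates intact rather than restarting them from zero.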

Testing model

Open the notebook in the notebooks directory and set checkpoint_path to the path of your model checkpoint.
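Restoring a trained model from a checkpoint for synthesis might look roughly like this. It is a sketch that assumes the checkpoint is a dict storing the weights under a `"state_dict"` key; `load_for_inference` is a hypothetical helper, and the `nn.Linear` model in the demo stands in for the Tacotron model the notebook would construct.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_for_inference(model: nn.Module, checkpoint_path: str) -> nn.Module:
    """Restore weights from a checkpoint dict and switch the model to eval mode.

    Assumes the weights are stored under a "state_dict" key (hypothetical layout).
    """
    # map_location="cpu" lets a GPU-trained checkpoint load on a CPU-only machine.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    # Evaluation mode changes the behavior of layers such as dropout and batch norm.
    model.eval()
    return model


if __name__ == "__main__":
    trained = nn.Linear(3, 2)  # hypothetical stand-in for a trained model
    with tempfile.TemporaryDirectory() as d:
        checkpoint_path = os.path.join(d, "checkpoint.pth")
        torch.save({"state_dict": trained.state_dict()}, checkpoint_path)
        restored = load_for_inference(nn.Linear(3, 2), checkpoint_path)
        print(torch.equal(restored.weight, trained.weight))
```

The model must be constructed with the same architecture and hyperparameters used in training; otherwise `load_state_dict` raises an error on mismatched keys or shapes.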


Owner

Ryuichi Yamamoto
