Pytorch implementation of Tacotron

Last update: Dec 02, 2022

Overview

Tacotron-pytorch

A PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.

Requirements

Install Python 3
Install PyTorch == 0.2.0
Install requirements:
```
pip install -r requirements.txt
```

Data

I used the LJSpeech dataset, which consists of pairs of text scripts and wav files. The complete dataset (13,100 pairs) can be downloaded here. I referred to https://github.com/keithito/tacotron for the preprocessing code.
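To make the text/wav pairing concrete, here is a minimal sketch of reading the pairs from an extracted LJSpeech directory. It assumes the standard LJSpeech layout (a pipe-delimited `metadata.csv` next to a `wavs/` folder). The function name `load_ljspeech_metadata` is hypothetical and is not taken from data.py.

```python
import csv
import os


def load_ljspeech_metadata(data_path):
    """Return a list of (wav_path, text) pairs read from metadata.csv.

    Assumes the standard LJSpeech layout: <data_path>/metadata.csv and
    <data_path>/wavs/<clip_id>.wav. This is an illustration, not data.py.
    """
    pairs = []
    metadata_file = os.path.join(data_path, "metadata.csv")
    with open(metadata_file, encoding="utf-8", newline="") as f:
        # Fields are separated by '|'. Quoting is disabled because the
        # transcripts can contain literal quote characters.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip empty or malformed lines
            clip_id = row[0]
            text = row[-1]  # last column holds the normalized transcript
            wav_path = os.path.join(data_path, "wavs", clip_id + ".wav")
            pairs.append((wav_path, text))
    return pairs
```

Calling `load_ljspeech_metadata(data_path)` on the full dataset should yield one pair per clip; the wav files themselves are not opened here.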

File description

hyperparams.py includes all the hyperparameters that are needed.
data.py loads the training data and preprocesses text into indices and wav files into spectrograms. The preprocessing code for text is in the text/ directory.
module.py contains all methods, including CBHG, highway, prenet, and so on.
network.py contains the networks, including the encoder, decoder and post-processing network.
train.py is for training.
synthesis.py is for generating TTS samples.
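As an illustration of the "text to index" step mentioned for data.py, the sketch below maps characters to integer ids. The symbol table, the padding and end-of-sequence markers, and the name `text_to_ids` are all assumptions made for this example; the actual code in the text/ directory may use a different symbol set and cleaning rules.

```python
# Hypothetical symbol table: padding, end-of-sequence, then characters.
PAD = "_"
EOS = "~"
SYMBOLS = [PAD, EOS] + list("abcdefghijklmnopqrstuvwxyz !',.?-")
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def text_to_ids(text):
    """Convert a string to a list of integer ids.

    The text is lowercased, characters missing from the table are dropped,
    and the end-of-sequence id is appended.
    """
    ids = []
    for ch in text.lower():
        if ch in (PAD, EOS):
            continue  # reserved markers are never taken from the input
        if ch in SYMBOL_TO_ID:
            ids.append(SYMBOL_TO_ID[ch])
    ids.append(SYMBOL_TO_ID[EOS])
    return ids
```

The resulting id list is what an embedding layer in the encoder would consume; sequences in a batch would then be padded with the id of `PAD`.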

Training the network

STEP 1. Download and extract the LJSpeech data into any directory you want.
STEP 2. Adjust the hyperparameters in hyperparams.py, especially 'data_path', which is the directory where you extracted the files, and the others if necessary.
STEP 3. Run train.py.
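Before starting a long training run, it can help to confirm that 'data_path' points at a usable directory. The sketch below assumes 'data_path' should be the top-level extracted LJSpeech folder (containing `metadata.csv` and `wavs/`). The example path and the helper `check_data_path` are hypothetical and not part of hyperparams.py.

```python
import os

# Hypothetical value; only the name 'data_path' comes from this README.
data_path = "/path/to/LJSpeech-1.1"


def check_data_path(path):
    """Return a list of human-readable problems with the dataset directory.

    An empty list means the expected LJSpeech files were found.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append("data_path is not a directory: " + path)
        return problems
    if not os.path.isfile(os.path.join(path, "metadata.csv")):
        problems.append("metadata.csv not found in " + path)
    if not os.path.isdir(os.path.join(path, "wavs")):
        problems.append("wavs/ directory not found in " + path)
    return problems
```

For example, printing each entry of `check_data_path(data_path)` before running train.py gives a quick indication of a mistyped or incomplete path.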

Generate TTS wav file

STEP 1. Run synthesis.py. Make sure the restore step is set to the training step of the checkpoint you want to load.

Samples

You can check the generated samples in the 'samples/' directory. The model was trained for only 60K steps, so the quality is not good yet.

Reference

Keith Ito: https://github.com/keithito/tacotron

Comments

Any comments on the code are always welcome.

Owner

soobin seo
