PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

Last update: Dec 23, 2022

Overview

GradTTS

Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arXiv)

About this repo

This is an unofficial implementation of GradTTS, built on GlowTTS (https://github.com/jaywalnut310/glow-tts). We replace the GlowDecoder with a DiffusionDecoder that follows the settings of the original paper. For convenience, we also replace torch.distributed with Horovod, and we do not currently use fp16.
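As a rough illustration of what a diffusion decoder is trained against, the sketch below computes the mean and variance of the noised sample x_t given clean data x0 and the encoder output mu, under the mean-reverting forward process described in the Grad-TTS paper with an identity noise covariance. It is not taken from this repository's code: the function names are hypothetical, and the linear schedule endpoints (0.05 and 20) are assumed from the paper's reported settings. It uses plain Python lists instead of tensors to stay dependency-free.

```python
import math
import random

# Assumed linear noise schedule endpoints (as reported in the Grad-TTS paper).
BETA_MIN = 0.05
BETA_MAX = 20.0


def cumulative_beta(t, beta_min=BETA_MIN, beta_max=BETA_MAX):
    """Integral over [0, t] of beta_s = beta_min + (beta_max - beta_min) * s."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t * t


def forward_marginal(x0, mu, t):
    """Mean (per element) and shared variance of x_t given x0.

    The mean drifts from x0 toward mu, and the variance grows toward 1,
    so x_t approaches N(mu, I) as t approaches 1.
    """
    decay = math.exp(-0.5 * cumulative_beta(t))
    mean = [m + (x - m) * decay for x, m in zip(x0, mu)]
    variance = 1.0 - math.exp(-cumulative_beta(t))
    return mean, variance


def sample_xt(x0, mu, t, rng=random):
    """Draw one noised sample x_t from the forward marginal."""
    mean, variance = forward_marginal(x0, mu, t)
    std = math.sqrt(variance)
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]
```

At t = 0 the sample equals the clean data; near t = 1 it is close to Gaussian noise centred on mu, which is the starting point for reverse-time sampling in the paper's formulation.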

Training and inference

See run.sh and inference_waveglow_vocoder.py in the egs/ folder for example usage.

Before training, download and extract the LJ Speech dataset, then rename the dataset folder or create a link to it:

    ln -s /path/to/LJSpeech-1.1/wavs DUMMY

Then build the Monotonic Alignment Search code (Cython):

    cd monotonic_align; python setup.py build_ext --inplace

Before inference, download the WaveGlow checkpoint from download_link and put it in the waveglow folder.
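The two pre-training steps can also be scripted. The following is a minimal sketch, assuming a POSIX system and that it is run from the directory where the DUMMY link is expected; the helper names link_dataset and build_monotonic_align are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def link_dataset(wavs_dir, link_path="DUMMY"):
    """Create a symlink `link_path` pointing at the LJ Speech wavs folder.

    Equivalent to: ln -s /path/to/LJSpeech-1.1/wavs DUMMY
    """
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link instead of failing
    os.symlink(os.path.abspath(wavs_dir), link_path, target_is_directory=True)
    return link_path


def build_monotonic_align(repo_root="."):
    """Build the Cython extension in place.

    Equivalent to: cd monotonic_align; python setup.py build_ext --inplace
    """
    subprocess.run(
        [sys.executable, "setup.py", "build_ext", "--inplace"],
        cwd=os.path.join(repo_root, "monotonic_align"),
        check=True,  # raise if the build fails
    )


# Example usage (paths are placeholders):
# link_dataset("/path/to/LJSpeech-1.1/wavs")
# build_monotonic_align(".")
```

Using sys.executable keeps the build on the same interpreter that will later import the extension, which avoids a common mismatch when several Python environments are installed.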

Reference Materials

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

GlowTTS

Score-Based Generative Modeling through Stochastic Differential Equations

score_sde_pytorch

denoising-diffusion-pytorch

Authors

Heyang Xue (https://github.com/WelkinYang) and Qicong Xie (https://github.com/QicongXie)

Owner

HeyangXue1997
