FG-transformer-TTS: Fine-grained style control in transformer-based text-to-speech synthesis

Overview

LST-TTS

Official implementation of the paper "Fine-grained style control in transformer-based text-to-speech synthesis", submitted to ICASSP 2022. Audio samples and a demo of our system can be accessed here.

Setting up submodules

git submodule update --init --recursive

Get the WaveGlow vocoder checkpoint from here (this is from the official NVIDIA WaveGlow repo).

Setup environment

See docker/Dockerfile for the packages that need to be installed.

Dataset preprocessing

LJSpeech

python preprocess_LJSpeech.py --datadir LJSpeechDir --outputdir OutputDir

VCTK

Get the leading and trailing silence marks from this repo, and put vctk-silences.0.92.txt in your VCTK dataset directory.

python preprocess_VCTK.py --datadir VCTKDir --outputdir Output_Train_Dir
python preprocess_VCTK.py --datadir VCTKDir --outputdir Output_Test_Dir --make_test_set
  • --make_test_set: specify this flag to process the test-set speakers; without it, only the training speakers are processed.
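
Before running the VCTK preprocessing, it can help to confirm that the silence-marks file is in place. The following is a minimal sketch, assuming the file must sit directly in the VCTK dataset directory as stated above; the helper name `has_vctk_silence_marks` is hypothetical and not part of this repo.

```python
from pathlib import Path

# File name taken from the instructions above.
SILENCE_FILE = "vctk-silences.0.92.txt"


def has_vctk_silence_marks(vctk_dir: str) -> bool:
    """Return True if the silence-marks file sits directly in vctk_dir.

    Hypothetical helper for illustration; preprocess_VCTK.py may locate
    the file differently.
    """
    return (Path(vctk_dir) / SILENCE_FILE).is_file()
```

Calling `has_vctk_silence_marks("VCTKDir")` before `preprocess_VCTK.py` gives an early, explicit signal if the file is missing.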

Training

LJSpeech

python train_TTS.py --precision 16 \
                    --datadir FeatureDir \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH \
                    --sampledir SampleDir \
                    --batch_size 128 \
                    --check_val_every_n_epoch 50 \
                    --use_guided_attn \
                    --training_step 250000 \
                    --n_guided_steps 250000 \
                    --saving_path Output_CKPT_DIR \
                    --datatype LJSpeech \
                    [--distributed]
  • --distributed: enable DDP multi-GPU training
  • --batch_size: batch size per GPU; scale it down if you train on multiple GPUs and want to keep the same total batch size
  • --check_val_every_n_epoch: sample and validate every n epochs
  • --datadir: output directory of the preprocess scripts
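
The per-GPU batch size arithmetic can be sketched as follows. This assumes the total batch size under DDP equals the per-GPU `--batch_size` multiplied by the number of GPUs, which is the usual DDP behavior but is not confirmed by this repo; `per_gpu_batch_size` is a hypothetical helper.

```python
def per_gpu_batch_size(total_batch_size: int, n_gpus: int) -> int:
    """Return the --batch_size value that keeps the total batch size fixed.

    Assumption: total batch size = per-GPU batch size * number of GPUs.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    if total_batch_size % n_gpus != 0:
        raise ValueError("total_batch_size must be divisible by n_gpus")
    return total_batch_size // n_gpus


# Keeping the single-GPU LJSpeech setting (128) on 4 GPUs:
print(per_gpu_batch_size(128, 4))  # prints 32
```

Under that assumption, training on 4 GPUs with `--batch_size 32 --distributed` would match the single-GPU setting of 128.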

VCTK

python train_TTS.py --precision 16 \
                    --datadir FeatureDir \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH \
                    --sampledir SampleDir \
                    --batch_size 64 \
                    --check_val_every_n_epoch 50 \
                    --use_guided_attn \
                    --training_step 150000 \
                    --n_guided_steps 150000 \
                    --etts_checkpoint LJSpeech_Model_CKPT \
                    --saving_path Output_CKPT_DIR \
                    --datatype VCTK \
                    [--distributed]
  • --etts_checkpoint: path to the checkpoint of the model pretrained on LJSpeech

Synthesis

We provide synthesis examples in synthesis.py; you can adapt this script to your own use. To run synthesis.py:

python synthesis.py --etts_checkpoint VCTK_Model_CKPT \
                    --sampledir SampleDir \
                    --datatype VCTK \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH
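
If you want to run synthesis for several checkpoints, one option is to build the command in Python. This is a minimal sketch that uses only the flags shown above; the function name `build_synthesis_cmd` and the example paths are hypothetical.

```python
import subprocess


def build_synthesis_cmd(etts_checkpoint: str, sampledir: str,
                        vocoder_ckpt_path: str, datatype: str = "VCTK") -> list:
    """Build the synthesis.py argument list from the flags documented above."""
    return [
        "python", "synthesis.py",
        "--etts_checkpoint", etts_checkpoint,
        "--sampledir", sampledir,
        "--datatype", datatype,
        "--vocoder_ckpt_path", vocoder_ckpt_path,
    ]


def run_synthesis(etts_checkpoint: str, sampledir: str,
                  vocoder_ckpt_path: str, datatype: str = "VCTK") -> None:
    """Run synthesis.py from the repo root; raises if the script exits non-zero."""
    cmd = build_synthesis_cmd(etts_checkpoint, sampledir,
                              vocoder_ckpt_path, datatype)
    subprocess.run(cmd, check=True)
```

For example, `run_synthesis("VCTK_Model_CKPT", "SampleDir", "WaveGlowCKPT_PATH")` issues the same command as the one above.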
Owner
Li-Wei Chen