Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Last update: Sep 07, 2022

Overview

Multi-speaker DGP

This repository provides the official PyTorch implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis.

Our paper: Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation

Test environment

This repository has been tested in the following environment.

Ubuntu 18.04
NVIDIA GeForce RTX 2080 Ti
Python 3.7.3
CUDA 11.1
cuDNN 8.1.1

Setup

You can complete the setup by sourcing setup.sh in your current shell.

$ . ./setup.sh

*Please make sure that the installed PyTorch build is compatible with your CUDA version (see https://pytorch.org/ for more info). Otherwise, CUDA errors will occur during training.
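
One way to check the compatibility mentioned above is to ask PyTorch which CUDA version it was built against and whether it can see a GPU. The sketch below is not part of this repository: the helper name `describe_torch_cuda` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_torch_cuda():
    """Return a small report on the installed PyTorch build and CUDA visibility."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None` or `cuda_available` is `False`, the installed build most likely does not match the local CUDA setup and should be reinstalled before training.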

How to use

This repository follows the Kaldi-style recipe layout. To run the scripts, follow the instructions below. The recipe uses the JVS corpus [Takamichi et al., 2020], which you need to download first.

# Move to the recipe directory
$ cd egs/jvs

# Download the corpus to be used. The directory structure will be as follows:

├── conf/     # directory containing YAML format configuration files
├── jvs_ver1/ # downloaded data
├── local/    # directory containing corpus-dependent scripts
└── run.sh    # main script

# Run the recipe from scratch
$ ./run.sh

# Or you can run the recipe step by step
$ ./run.sh --stage 0 --stop-stage 0  # train/dev/eval split
$ ./run.sh --stage 1 --stop-stage 1  # preprocessing
$ ./run.sh --stage 2 --stop-stage 2  # train phoneme duration model
$ ./run.sh --stage 3 --stop-stage 3  # train acoustic model
$ ./run.sh --stage 4 --stop-stage 4  # decoding

# During stages 2 and 3, you can monitor training logs with TensorBoard
# for example:
$ tensorboard --logdir exp/dgp
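
The step-by-step invocation above can also be scripted. The following is a minimal sketch, not part of this repository: the helper names `build_stage_commands` and `run_stages` are hypothetical, and only the `--stage` and `--stop-stage` options and the stage descriptions shown above are assumed. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess

# Stage numbers and descriptions as listed in the recipe above.
STAGES = {
    0: "train/dev/eval split",
    1: "preprocessing",
    2: "train phoneme duration model",
    3: "train acoustic model",
    4: "decoding",
}


def build_stage_commands(first=0, last=4):
    """Build one run.sh command per stage in the inclusive range [first, last]."""
    if first not in STAGES or last not in STAGES or first > last:
        raise ValueError(f"invalid stage range: {first}..{last}")
    return [
        ["./run.sh", "--stage", str(stage), "--stop-stage", str(stage)]
        for stage in range(first, last + 1)
    ]


def run_stages(first=0, last=4, recipe_dir="egs/jvs", dry_run=True):
    """Print each stage command and, unless dry_run is set, execute it in recipe_dir."""
    commands = build_stage_commands(first, last)
    for stage, command in zip(range(first, last + 1), commands):
        print(f"[stage {stage}] {STAGES[stage]}: {' '.join(command)}")
        if not dry_run:
            # check=True stops the loop as soon as one stage fails.
            subprocess.run(command, cwd=recipe_dir, check=True)


if __name__ == "__main__":
    run_stages(first=2, last=3)  # dry run: only prints the two training commands
```

Running one stage per call keeps a failure in, say, the acoustic model training from hiding which step broke; pass `dry_run=False` from the repository root to actually execute the commands.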

How to customize

The conf/*.yaml files contain all settings for data preparation, preprocessing, training, and decoding. We provide two configuration files, dgp.yaml and dgplvm.yaml. You can change the experimental conditions by editing these files.
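
Instead of editing a configuration file in place, you may prefer to derive a variant from it. The sketch below is an illustration only and not part of this repository: the helper names `apply_overrides` and `derive_config` are hypothetical, the keys in the example (`train.batch_size`, `train.seed`) are made up and may not exist in dgp.yaml, and file I/O assumes the third-party PyYAML package is installed.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


def derive_config(src_path, dst_path, overrides):
    """Load a YAML config, apply overrides, and write the result to a new file."""
    import yaml  # PyYAML (third-party); imported lazily so apply_overrides works without it

    with open(src_path, encoding="utf-8") as f:
        config = yaml.safe_load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        yaml.safe_dump(apply_overrides(config, overrides), f, sort_keys=False)


if __name__ == "__main__":
    base = {"train": {"batch_size": 8}}  # hypothetical keys, for illustration only
    print(apply_overrides(base, {"train.batch_size": 16, "train.seed": 0}))
```

Note that round-tripping through PyYAML drops comments from the original file, so keep dgp.yaml and dgplvm.yaml as the annotated references and treat derived files as disposable.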

Owner

sarulab-speech
