ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual, multispeaker, expressive speech synthesis framework. It can transfer expressivity to a speaker's voice even when no expressive speech corpus is available for that speaker. The name ERISHA means "speech" in Sanskrit. To build the prosody encoder, the framework includes several deep learning architectures: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.
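The README does not describe how these prosody encoders are wired together. The sketch below illustrates only the core GST idea: a style embedding is computed as a softmax-weighted sum of a small bank of style tokens. It uses a single attention head and plain Python lists. The reference vector, the token values, and the function names are hypothetical. In a trained model the tokens are learned, and multi-head attention is typically used, so ERISHA's own GST module will differ.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(reference, tokens):
    """Single-head, scaled dot-product attention over a bank of style tokens.

    reference: query vector from a hypothetical reference encoder (length d).
    tokens:    list of K style-token vectors, each of length d.
    Returns (attention weights, style embedding).
    """
    d = len(reference)
    # Squash the tokens with tanh before attending, as common GST code does.
    keys = [[math.tanh(x) for x in tok] for tok in tokens]
    # Scaled dot-product similarity between the query and each token.
    scores = [sum(q * k for q, k in zip(reference, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The style embedding is the attention-weighted sum of the (squashed) tokens.
    embedding = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]
    return weights, embedding
```

If the reference vector is aligned with the first token, most of the attention weight goes to that token, for example in `weights, emb = style_embedding([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])`.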

Currently, the library is in its initial stage of development and will be updated frequently in the coming days.

Stay tuned for more updates. We are open to collaboration!

Installation and Training

Refer to INSTALL for the initial setup.

Available recipes

Available Features

Resampling of speech waveforms to the target sampling rate in recipes
Support for training TTS systems for other languages
Support for training multilingual TTS systems for other languages
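The README does not say which resampling method the recipes use. The sketch below shows only the basic idea: each output sample is mapped onto the input time axis, and its value is obtained by linear interpolation. It is written in plain Python. The sketch applies no anti-aliasing low-pass filter, so it can alias when downsampling. A real data-preparation recipe would normally use a band-limited resampler from an audio library instead. The function name is hypothetical.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Naive linear-interpolation resampler (no anti-aliasing filter).

    samples:   list of float waveform samples at orig_sr Hz.
    orig_sr:   original sampling rate in Hz.
    target_sr: desired sampling rate in Hz.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    # Keep the duration: the output length scales with the rate ratio.
    n_out = round(len(samples) * target_sr / orig_sr)
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step           # position of output sample i on the input axis
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last input sample: hold the final value.
            out.append(samples[-1])
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out
```

For example, `resample_linear(samples, 22050, 16000)` turns one second of 22.05 kHz audio into 16,000 samples.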

Upcoming updates

User Documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving latent representation of speaker and expressivity (proposed work)
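Multiclass N-pair loss is listed only as an upcoming update, so there is no ERISHA implementation to show yet. The sketch below implements the standard multi-class N-pair formulation in plain Python. It takes N anchor/positive embedding pairs drawn from N different classes. For each anchor, the positives belonging to the other classes act as negatives. Applying this loss to speaker or expressivity embeddings is an assumption, and the function names are hypothetical. The L2 penalty on embedding norms used in the original formulation is omitted.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multiclass_n_pair_loss(anchors, positives):
    """Multi-class N-pair loss over N (anchor, positive) pairs from N classes.

    For anchor i, positives[j] with j != i serve as negatives:
        L_i = log(1 + sum_{j != i} exp(a_i . p_j - a_i . p_i))
    Returns the mean of L_i over all anchors.
    """
    n = len(anchors)
    if n != len(positives) or n < 2:
        raise ValueError("need N >= 2 matched anchor/positive pairs")
    total = 0.0
    for i in range(n):
        pos_sim = dot(anchors[i], positives[i])
        # Similarity of each negative relative to the true positive.
        margins = [dot(anchors[i], positives[j]) - pos_sim for j in range(n) if j != i]
        # Numerically stable log(exp(0) + sum(exp(margins))).
        m = max(0.0, max(margins))
        total += m + math.log(math.exp(-m) + sum(math.exp(x - m) for x in margins))
    return total / n
```

Two orthogonal, perfectly matched pairs give a loss of log(1 + e^-1). Three identical embeddings give log(3), which is the value for an embedding that cannot tell the classes apart.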

Acknowledgements

This implementation uses code from the following repos: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, Jhosimar George Arias Figueroa.


Owner

Ajinkya Kulkarni