German Text-To-Speech Engine using Tacotron and Griffin-Lim

Last update: Aug 28, 2022

Related tags

Overview

jotts

JoTTS is a German text-to-speech engine using tacotron and griffin-lim. The synthesizer model has been trained on my voice using Tacotron1. Due to real time usage I decided not to include a vocoder and use griffin-lim instead which results in a more robotic voice but is much faster.

API

First create an instance of JoTTS. The initializer takes force_model_download as an optional parameter in case that the last download of the synthesizer failed and the model cannot be applied.
Call speak with a text parameter that contains the text to speak out loud. The second parameter can be set to True, to wait until speaking is done.
Use text2wav to create a wav file instead of speaking the text.

Example usage

from jotts import JoTTS
jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.text2wav("Es war aber auch schon mal besser!")

Todo

Add an option to change the default audio device to speak the text
Add a parameter to select other models but the default model
Add threading or multi processing to allow speaking without blocking
Add a vocoder instead of griffin-lim to improve audio output.

Training a model for your own voice

Training a synthesizer model is easy - if you know how to do it. I created a course on udemy to show you how it is done. Don't buy the tutorial for the full price, there is a discout every month :-)

https://www.udemy.com/course/voice-cloning/

If you neither have the backgroud or the resources or if you are just lazy or too rich, contact me for contract work. Cloning a voice normally needs ~15 Minutes of clean audio from the voice you want to clone.

Disclaimer

I hope that my (and any other person's) voice will be used only for legal and ethical purposes. Please do not get into mischief with it.

Comments

SSL: CERTIFICATE_VERIFY_FAILED

my code is

from jotts import JoTTS
jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.textToWav("Es war aber auch schon mal besser!")

and I receive this :

2022-11-01 09:39:57.536 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:50 - There is no tts model yet, downloading...
2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:60 - Download file: https://github.com/padmalcom/jotts/releases/download/v0.1/v0.1.pt
v0.1.pt: 0.00B [00:00, ?B/s]

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1392, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 412, in wrap_socket
    session=session
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 853, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1117, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 2, in <module>
    jotts = JoTTS()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 68, in __init__
    MODEL_FILE = self.__prepare_model__(force_model_download);
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 62, in __prepare_model__
    urllib.request.urlretrieve(DOWNLOAD_URL, filename=MODEL_FILE, reporthook=t.update_to)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

what am I doing wrong. ? Thanks !

opened by deladriere 3

Samples of jotts in combination with a modern vocoder like (MB)Melgan, HifiGAN

I tried to drop a spectrogram sanmple as npy and feed HifiGAN but it gave me a lot of noise. I am wondering how good your results are, do you have samples with vocoders like above?

opened by eqikkwkp25-cyber 2

jotts.text2wav not existing / needs jotts.textToWav

running this example on MacOS 11.6

from jotts import JoTTS

jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.speak("Wir sind Die Roboter.", True)
jotts.text2wav("Es war aber auch schon mal besser!")

give an error trying to generate the wav file (The speak function works really well !)

2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
Synthesizer using device: cpu
Trainable Parameters: 30.874M
Loaded synthesizer "v0.1.pt" trained to step 79000

| Generating 1/1
[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


Done.

| Generating 1/1


Done.

Traceback (most recent call last):
  File "test_jotts.py", line 6, in <module>
    jotts.text2wav("Es war aber auch schon mal besser!")
AttributeError: 'JoTTS' object has no attribute 'text2wav'

using jotts.textToWav works well but there is still this [W NNPACK.cpp:79] message here is the output

2021-12-14 17:45:31.699 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
Synthesizer using device: cpu
Trainable Parameters: 30.874M
Loaded synthesizer "v0.1.pt" trained to step 79000

| Generating 1/1
[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


Done.


| Generating 1/1


Done.


| Generating 1/1


Done.

opened by deladriere 2

can this run on a Rapsberry Pi Zero ?

Sorry not an issue but I would like to have a Raspberry Pi Zero speak German without the need for an Internet connection (Amazon Polly and IBM Watson have great German voices but are paid service quite complex to install - not to mention the need for a connect and its delays) I just subscribed to your course (I understand only a bit of German) ;-) Maybe some of the heavy work can be done on a fast computer but I need the text to speech to be done on the Raspberry Pi ?

opened by deladriere 2
Missing additional information in README

Typo somewhere: The readme says "The synthesizer model has been trained on my voice using Tacotron1." while the releases say "v0.1 Latest Pre-trained German synthesizer model based on tacotron2."

Can you add more hints how you trained your model(s), i.e. which base repository, data structure and how many hours of your voice you need for the current results?

opened by eqikkwkp25-cyber 1

Releases(generic_v0.4)

generic_v0.4(Dec 30, 2022)

Trained for 98k steps on german common voice dataset.
Source code(tar.gz)
Source code(zip)
generic_v0.4.pt(353.51 MB)
vocoder_v0.1(Nov 8, 2022)

WaveRNN vocoder trained for 142.000 steps. Can be used instead of griffin-lim algorithm, might deliver better results but requires more ressources to apply.
Source code(tar.gz)
Source code(zip)
vocoder_v0.1.pt(51.40 MB)
jonas_v0.1(Nov 22, 2021)

Pre-trained German synthesizer model based on tacotron.
Source code(tar.gz)
Source code(zip)
jonas_v0.1.pt(353.49 MB)
generic_v0.3(Oct 27, 2022)

Trained for 75k steps on high quality voice.
Source code(tar.gz)
Source code(zip)
generic_v0.3.pt(353.49 MB)

Owner

padmalcom

PhD in Computer Science, interested in machine learning, game programming and robotics. Hope my projects help somewhere.

GitHub Repository

Converts python code into c++ by using OpenAI CODEX.

🦾 codex_py2cpp 🤖 OpenAI Codex Python to C++ Code Generator Your Python Code is too slow? 🐌 You want to speed it up but forgot how to code in C++? ⌨

423 Jan 01, 2023

Training code of Spatial Time Memory Network. Semi-supervised video object segmentation.

Training-code-of-STM This repository fully reproduces Space-Time Memory Networks Performance on Davis17 val set&Weights backbone training stage traini

128 Dec 11, 2022

A retro text-to-speech bot for Discord

hawking A retro text-to-speech bot for Discord, designed to work with all of the stuff you might've seen in Moonbase Alpha, using the existing command

23 Dec 25, 2022

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

847 Dec 19, 2022

Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)

Bunkai Bunkai is a sentence boundary (SB) disambiguation tool for Japanese texts. Quick Start $ pip install bunkai $ echo -e '宿を予約しました♪!まだ2ヶ月も先だけど。早すぎ

160 Dec 23, 2022

This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

42 Dec 13, 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

829 Jan 07, 2023

KoBART model on huggingface transformers

KoBART-Transformers SKT에서 공개한 KoBART를 편리하게 사용할 수 있게 transformers로 포팅하였습니다. Install (Optional) BartModel과 PreTrainedTokenizerFast를 이용하면 설치하실 필요 없습니다. p

58 Dec 07, 2022

Generate text line images for training deep learning OCR model (e.g. CRNN)

532 Jan 06, 2023

Share constant definitions between programming languages and make your constants constant again

Introduction Reconstant lets you share constant and enum definitions between programming languages. Constants are defined in a yaml file and converted

47 Sep 10, 2022

A python framework to transform natural language questions to queries in a database query language.

__ _ _ _ ___ _ __ _ _ / _` | | | |/ _ \ '_ \| | | | | (_| | |_| | __/ |_) | |_| | \__, |\__,_|\___| .__/ \__, | |_| |_| |___/

1.2k Dec 18, 2022

Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings

Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings Trong bài viết này mình sẽ sử dụng pretrain model SimCS

18 Nov 25, 2022

Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

8.8k Jan 01, 2023

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

AliceMind AliceMind: ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab This repository provides pre-trained encode

1.4k Jan 04, 2023

Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.

Seq2Seq Speech in JAX A JAX/Flax repository for combining a pre-trained speech encoder model (e.g. Wav2Vec2, HuBERT, WavLM) with a pre-trained text de

21 Dec 14, 2022

A script that automatically creates a branch name using google translation api and jira api

About google translation api와 jira api을 사용하여 자동으로 브랜치 이름을 만들어주는 스크립트 Setup 환경변수에 다음 3가지를 등록해야 한다. JIRA_USER : JIRA email (ex: hyunwook.kim 2 Dec 20, 2021

Creating a python chatbot that Starbucks users can text to place an order + help cut wait time of a normal coffee.

2 Jan 20, 2022

This code extends the neural style transfer image processing technique to video by generating smooth transitions between several reference style images

Neural Style Transfer Transition Video Processing By Brycen Westgarth and Tristan Jogminas Description This code extends the neural style transfer ima

110 Jan 07, 2023

基于Transformer的单模型、多尺度的VAE模型

UniVAE 基于Transformer的单模型、多尺度的VAE模型介绍 https://kexue.fm/archives/8475 依赖需要大于0.10.6版本的bert4keras（当前还没有推到pypi上，可以直接从GitHub上clone最新版）。引用 @misc{univae,

49 Aug 24, 2022

Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P to portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

11 Nov 16, 2022