Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
title: Ukrainian TTS
emoji: 🐸
colorFrom: green
colorTo: green
sdk: gradio
app_file: app.py
pinned: false

Ukrainian TTS 📢 🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on the M-AILABS Ukrainian dataset using the sumska voice.

Online demo: https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please consider supporting it: SUPPORT LINK

Example

test.mp4

How to use:

  1. `pip install -r requirements.txt`
  2. Download the model checkpoint, `config.json`, and (for multi-speaker models) `speakers.pth` from the "Releases" tab.
  3. Launch as a one-time command:

```shell
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav
```

or alternatively launch a web server using:

```shell
tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json
```

Note: multi-speaker checkpoints also require selecting a speaker, otherwise synthesis fails with "You need to define either a `speaker_name` or a `speaker_wav`"; depending on the Coqui TTS version, the flag is `--speaker_idx` or `--speaker_id`.

How to train:

  1. Refer to the "Nervous beginner guide" in the Coqui TTS docs.
  2. Instead of the config.json generated by the guide, use the one from this repo.
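For orientation, the decoder-related model arguments below are copied from the "Vits improvements" entry in this repo's issue tracker; treat this as a sketch of the kind of settings the repo's config overrides, not as the authoritative values (those are in config.json):

```python
# Decoder settings (HiFi-GAN V3 style) as listed in the repo's issue tracker.
vits_decoder_overrides = {
    "resblock_type_decoder": "2",
    "upsample_rates_decoder": [8, 8, 4],
    "upsample_kernel_sizes_decoder": [16, 16, 8],
    "upsample_initial_channel_decoder": 256,
    "resblock_kernel_sizes_decoder": [3, 5, 7],
    "resblock_dilation_sizes_decoder": [[1, 2], [2, 6], [3, 12]],
}

# For these values, each upsample kernel is twice its rate, and the rates
# multiply to the number of samples the vocoder emits per spectrogram frame
# (8 * 8 * 4 = 256).
assert all(k == 2 * r for r, k in zip(
    vits_decoder_overrides["upsample_rates_decoder"],
    vits_decoder_overrides["upsample_kernel_sizes_decoder"],
))
```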

Attribution

Code for app.py is taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    Problem

    The one-time snippet does not work as-is and complains that the speaker is not defined:

    ```
    > initialization of speaker-embedding layers.
    > Text: ΠŸΠ΅Ρ€Π΅Π²Ρ–Ρ€ΠΊΠ° ΠΌΡ–ΠΊΡ€ΠΎΡ„ΠΎΠ½Π°
    > Text splitted to sentences.
    ['ΠŸΠ΅Ρ€Π΅Π²Ρ–Ρ€ΠΊΠ° ΠΌΡ–ΠΊΡ€ΠΎΡ„ΠΎΠ½Π°']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    ```

    Also, speakers.pth should be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One-vowel words at the end of a sentence aren't stressed

    Input:

    
    ```
    Π‘ΠΎΠ±Π΅Ρ€ Π½Π° Π±Π΅Ρ€Π΅Π·Ρ– Π· бобрСнятами Π±ΡƒΠ±Π»ΠΈΠΊΠΈ ΠΏΡ–ΠΊ.
    Π‘ΠΎΡ€ΠΎΠ½ΠΈΠ»Π° Π±ΠΎΡ€ΠΎΠ½Π° ΠΏΠΎ Π±ΠΎΡ€ΠΎΠ½ΠΎΠ²Π°Π½ΠΎΠΌΡƒ полю.
    Π†ΡˆΠΎΠ² ΠŸΡ€ΠΎΠΊΡ–ΠΏ, ΠΊΠΈΠΏΡ–Π² ΠΎΠΊΡ€Ρ–ΠΏ, ΠΏΡ€ΠΈΠΉΡˆΠΎΠ² ΠŸΡ€ΠΎΠΊΡ–ΠΏ - ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€Ρ–ΠΏ, як ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΡ–, Ρ‚Π°ΠΊ Ρ– ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΡ– Ρ– ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΠ΅Π½ΡΡ‚Π°Ρ….
    Π‘ΠΈΠ΄ΠΈΡ‚ΡŒ ΠŸΡ€ΠΎΠΊΠΎΠΏ β€” ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ, ΠŸΡ–ΡˆΠΎΠ² ΠŸΡ€ΠΎΠΊΠΎΠΏ β€” ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ. Π―ΠΊ ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΠΎΠ²Ρ– ΠΊΠΈΠΏΡ–Π² ΠΎΠΊΡ€ΠΎΠΏ, Π’Π°ΠΊ Ρ– Π±Π΅Π· ΠŸΡ€ΠΎΠΊΠΎΠΏΠ° ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ.
    ```
    

    Result:

    
    ```
    Π‘ΠΎΠ±+Π΅Ρ€ Π½+Π° Π±Π΅Ρ€Π΅Π·Ρ– Π· Π±ΠΎΠ±Ρ€Π΅Π½+ятами Π±+ΡƒΠ±Π»ΠΈΠΊΠΈ ΠΏΡ–ΠΊ.
    Π‘ΠΎΡ€ΠΎΠ½+ΠΈΠ»Π° Π±ΠΎΡ€ΠΎΠ½+Π° ΠΏ+ΠΎ Π±ΠΎΡ€ΠΎΠ½+ΠΎΠ²Π°Π½ΠΎΠΌΡƒ ΠΏ+олю.
    Π†Ρˆ+ΠΎΠ² ΠŸΡ€+ΠΎΠΊΡ–ΠΏ, ΠΊΠΈΠΏ+Ρ–Π² ΠΎΠΊΡ€+Ρ–ΠΏ, ΠΏΡ€ΠΈΠΉΡˆ+ΠΎΠ² ΠŸΡ€+ΠΎΠΊΡ–ΠΏ - ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€+Ρ–ΠΏ, +як ΠΏΡ€+ΠΈ ΠŸΡ€+ΠΎΠΊΠΎΠΏΡ–, Ρ‚+Π°ΠΊ +Ρ– ΠΏΡ€+ΠΈ ΠŸΡ€+ΠΎΠΊΠΎΠΏΡ– +Ρ– ΠΏΡ€+ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΠ΅Π½ΡΡ‚Π°Ρ….
    Π‘ΠΈΠ΄+ΠΈΡ‚ΡŒ ΠŸΡ€ΠΎΠΊ+ΠΎΠΏ β€” ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ, ΠŸΡ–Ρˆ+ΠΎΠ² ΠŸΡ€ΠΎΠΊ+ΠΎΠΏ β€” ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ. +Π―ΠΊ ΠΏΡ€+ΠΈ ΠŸΡ€+ΠΎΠΊΠΎΠΏΠΎΠ²Ρ– ΠΊΠΈΠΏ+Ρ–Π² ΠΎΠΊΡ€ΠΎΠΏ, Π’+Π°ΠΊ +Ρ– Π±+Π΅Π· ΠŸΡ€+ΠΎΠΊΠΎΠΏΠ° ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ.
    ```
    opened by robinhad 0
  • Error import StressOption

    ```
    Traceback (most recent call last):
      File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
        from ukrainian_tts.tts import TTS, Voices, StressOption
    ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'
    ```

    opened by akirsoft 0
  • Vits improvements

    ```python
    vitsArgs = VitsArgs(
        # hifi V3
        resblock_type_decoder = '2',
        upsample_rates_decoder = [8,8,4],
        upsample_kernel_sizes_decoder = [16,16,8],
        upsample_initial_channel_decoder = 256,
        resblock_kernel_sizes_decoder = [3,5,7],
        resblock_dilation_sizes_decoder = [[1,2], [2,6], [3,12]],
    )
    ```
    opened by robinhad 0
  • Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from an existing checkpoint (e.g. VITS LJSpeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
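On the fft_size/hop_length item above: the mel-spectrogram frame rate equals sample_rate / hop_length, so these values have to be scaled together. A quick illustration with commonly used (hypothetical) values; the actual numbers live in this repo's config.json:

```python
# Hypothetical values for illustration; check config.json for the real ones.
sample_rate = 22050  # audio samples per second
hop_length = 256     # samples between successive STFT frames
fft_size = 1024      # STFT window size (must be >= hop_length)

frames_per_second = sample_rate / hop_length
print(f"{frames_per_second:.1f} spectrogram frames per second")  # -> 86.1
```

Raising sample_rate without raising hop_length shrinks the audio span each frame covers, which is why the checklist treats them as a pair.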
Releases (v4.0.0)
  • v4.0.0(Dec 10, 2022)

  • v3.0.0(Sep 14, 2022)

    This is a release of the Ukrainian TTS model and checkpoint. The model is licensed under the GNU GPL v3. This release supports stress marks, added with a + sign before the stressed vowel. The model was trained for 280,000 steps by @robinhad. Kudos to @egorsmkv for providing the dataset for this model, to @proger for providing the alignment scripts, and to @dchaplinsky for the Dmytro voice.

    Example:

    Test sentence:

    ```
    К+Π°ΠΌ'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΈΠΉ - ΠΌ+істо Π² Π₯мСльн+ΠΈΡ†ΡŒΠΊΡ–ΠΉ +області Π£ΠΊΡ€Π°+Ρ—Π½ΠΈ, Ρ†+Π΅Π½Ρ‚Ρ€ Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΡ— ΠΌΡ–ΡΡŒΠΊ+ΠΎΡ— ΠΎΠ±'+Ρ”Π΄Π½Π°Π½ΠΎΡ— Ρ‚Π΅Ρ€ΠΈΡ‚ΠΎΡ€Ρ–+Π°Π»ΡŒΠ½ΠΎΡ— Π³Ρ€ΠΎΠΌ+Π°Π΄ΠΈ +Ρ– Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΠ³ΠΎ Ρ€Π°ΠΉ+ΠΎΠ½Ρƒ.
    ```
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(12.07 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.97 MB)
    speakers.pth(495 bytes)
  • v2.0.0(Jul 10, 2022)

    This is a release of the Ukrainian TTS model and checkpoint, using a voice (7 hours) from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marks, added with a + sign before the stressed vowel. The model was trained for 140,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta and Olena datasets.

    Example:

    Test sentence:

    ```
    К+Π°ΠΌ'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΈΠΉ - ΠΌ+істо Π² Π₯мСльн+ΠΈΡ†ΡŒΠΊΡ–ΠΉ +області Π£ΠΊΡ€Π°+Ρ—Π½ΠΈ, Ρ†+Π΅Π½Ρ‚Ρ€ Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΡ— ΠΌΡ–ΡΡŒΠΊ+ΠΎΡ— ΠΎΠ±'+Ρ”Π΄Π½Π°Π½ΠΎΡ— Ρ‚Π΅Ρ€ΠΈΡ‚ΠΎΡ€Ρ–+Π°Π»ΡŒΠ½ΠΎΡ— Π³Ρ€ΠΎΠΌ+Π°Π΄ΠΈ +Ρ– Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΠ³ΠΎ Ρ€Π°ΠΉ+ΠΎΠ½Ρƒ.
    ```
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(9.97 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.72 MB)
    optimized.pth(329.95 MB)
    speakers.pth(431 bytes)
  • v2.0.0-beta(May 8, 2022)

    This is a beta release of the Ukrainian TTS model and checkpoint, using a voice (7 hours) from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marks, added with a + sign before the stressed vowel. The model was trained for 150,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(8.85 KB)
    LICENSE(34.32 KB)
    model-inference.pth(317.15 MB)
    model.pth(951.32 MB)
    tts_output.wav(1.11 MB)
  • v1.0.0(Jan 14, 2022)

  • v0.0.1(Oct 14, 2021)
