Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Overview

zhrtvc

zhrtvc

Chinese Real Time Voice Cloning

tips: 中文或汉语的语言缩写简称是zh

关注【啊啦嘻哈】微信公众号,回复一个字【】,小萝莉有话对你说哦^v^

啊啦嘻哈

语音合成工具箱

pip install -U ttskit
  • 快速使用:
import ttskit

# 合成语音
ttskit.tts('这是个样例', audio='24')
  • 网页界面:

index

# 执行python函数部署
from ttskit import http_server

http_server.start_sever()
# 打开网页:http://localhost:9000/ttskit
# 安装ttskit后,命令行部署网页界面

tkhttp

usage: tkhttp [-h] [--device DEVICE] [--host HOST] [--port PORT]

optional arguments:
  -h, --help       show this help message and exit
  --device DEVICE  设置预测时使用的显卡,使用CPU设置成-1即可
  --host HOST      IP地址
  --port PORT      端口号
  • 命令行:
tkcli

usage: tkcli [-h] [-i INTERACTION] [-t TEXT] [-s SPEAKER] [-a AUDIO]
             [-o OUTPUT] [-m MELLOTRON_PATH] [-w WAVEGLOW_PATH] [-g GE2E_PATH]
             [--mellotron_hparams_path MELLOTRON_HPARAMS_PATH]
             [--waveglow_kwargs_json WAVEGLOW_KWARGS_JSON]

版本

v1.5.6

使用说明和注意事项详见README

  • 注意事项:

    • 这个说明是新版GMW版本的语音克隆框架的说明,使用ge2e(encoder)-mellotron-waveglow的模块(简称GMW),运行更简单,效果更稳定和合成语音更加优质。

    • 基于项目Real-Time-Voice-Cloning改造为中文支持的版本ESV版本的说明见README-ESV,该版本用encoder-synthesizer-vocoder的模块(简称ESV),运行比较复杂。

    • 需要进入zhrtvc项目的代码子目录【zhrtvc】运行代码。

    • zhrtvc项目默认参数设置是适用于data目录中的样本数据,仅用于跑通整个流程。

    • 推荐使用mellotron的语音合成器和waveglow的声码器,mellotron设置多种模式适应多种任务使用。

  • 中文语料

中文语音语料zhvoice,语音更加清晰自然,包含8个开源数据集,3200个说话人,900小时语音,1300万字。

  • 中文模型

扫描上面的二维码,关注**【啊啦嘻哈】微信公众号**,回复:中文语音克隆模型走起,获取百度网盘的模型文件。

  • 合成样例

合成语音样例的目录

目录介绍

zhrtvc

代码相关的说明详见zhrtvc目录下的readme文件。

models

预训练的模型在百度网盘下载,下载后解压,替换models文件夹即可。

data

语料样例,包括语音和文本对齐语料。

注意:

该语料样例用于测试跑通模型,数据量太少,不可能使得模型收敛,即不会训练出可用模型。

在测试跑通模型情况下,处理自己的数据为语料样例的格式,用自己的数据训练模型即可。

学习交流

【AI解决方案交流群】QQ群:925294583

点击链接加入群聊:https://jq.qq.com/?_wv=1027&k=wlQzvT0N

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented yet (don't hesitate to make an issue for that too). Mostly I would recommend giving a quick look to the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that allows to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.

Owner
Kuang Dada
让世界多一点AI。 Move on, work hard, do what you do.
Kuang Dada
Need: Image Search With Python

Need: Image Search The problem is that a user needs to search for a specific ima

Surya Komandooru 1 Dec 30, 2021
Maha is a text processing library specially developed to deal with Arabic text.

An Arabic text processing library intended for use in NLP applications Maha is a text processing library specially developed to deal with Arabic text.

Mohammad Al-Fetyani 184 Nov 27, 2022
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P to portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

fluz 11 Nov 16, 2022
A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

List Of English Words A text file containing over 466k English words. While searching for a list of english words (for an auto-complete tutorial) I fo

dwyl 8.5k Jan 03, 2023
Dé op-de-vlucht Pieton vertaler. Wereldwijd gebruikt door meer dan 1.000+ succesvolle bedrijven!

Dé op-de-vlucht Pieton vertaler. Wereldwijd gebruikt door meer dan 1.000+ succesvolle bedrijven!

Lau 1 Dec 17, 2021
I can help you convert your images to pdf file.

IMAGE TO PDF CONVERTER BOT Configs TOKEN - Get bot token from @BotFather API_ID - From my.telegram.org API_HASH - From my.telegram.org Deploy to Herok

MADUSHANKA 10 Dec 14, 2022
End-to-end MLOps pipeline of a BERT model for emotion classification.

image source EmoBERT-MLOps The goal of this repository is to build an end-to-end MLOps pipeline based on the MLOps course from Made with ML, but this

Dimitre Oliveira 4 Nov 06, 2022
🧪 Cutting-edge experimental spaCy components and features

spacy-experimental: Cutting-edge experimental spaCy components and features This package includes experimental components and features for spaCy v3.x,

Explosion 65 Dec 30, 2022
chaii - hindi & tamil question answering

chaii - hindi & tamil question answering This is the solution for rank 5th in Kaggle competition: chaii - Hindi and Tamil Question Answering. The comp

abhishek thakur 33 Dec 18, 2022
English loanwords in the world's languages

Wiktionary as CLDF Content cldf1 and cldf2 contain cldf-conform data sets with a total of 2 377 756 entries about the vocabulary of all 1403 languages

Viktor Martinović 3 Jan 14, 2022
Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products.

Leah Pathan Khan 2 Jan 12, 2022
This project converts your human voice input to its text transcript and to an automated voice too.

Human Voice to Automated Voice & Text Introduction: In this project, whenever you'll speak, it will turn your voice into a robot voice and furthermore

Hassan Shahzad 3 Oct 15, 2021
Sequence model architectures from scratch in PyTorch

This repository implements a variety of sequence model architectures from scratch in PyTorch. Effort has been put to make the code well structured so that it can serve as learning material. The train

Brando Koch 11 Mar 28, 2022
NLP, before and after spaCy

textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig

Chartbeat Labs Projects 2k Jan 04, 2023
A list of NLP(Natural Language Processing) tutorials

NLP Tutorial A list of NLP(Natural Language Processing) tutorials built on PyTorch. Table of Contents A step-by-step tutorial on how to implement and

Allen Lee 1.3k Dec 25, 2022
Mycroft Core, the Mycroft Artificial Intelligence platform.

Mycroft Mycroft is a hackable open source voice assistant. Table of Contents Getting Started Running Mycroft Using Mycroft Home Device and Account Man

Mycroft 6.1k Jan 09, 2023
Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is trained (finetuned) on a curated list of approximately 45K Python (~470MB) files gathered from the

Galois Autocompleter 91 Sep 23, 2022
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Official code for our Interspeech 2021 - Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset [1]*. Visually-grounded spoken language datasets c

Ian Palmer 3 Jan 26, 2022
Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module.

Import Subtitles for Blender VSE Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module. Supported formats by py

4 Feb 27, 2022
基于GRU网络的句子判断程序/A program based on GRU network for judging sentences

SentencesJudger SentencesJudger 是一个基于GRU神经网络的句子判断程序,基本的功能是判断文章中的某一句话是否为一个优美的句子。 English 如何使用SentencesJudger 确认Python运行环境 安装pyTorch与LTP python3 -m pip

8 Mar 24, 2022