Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

Overview

MLP Singer

Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. Audio samples are available on our demo page.

Abstract

Recent developments in deep learning have significantly improved the quality of synthesized singing voice audio. However, prominent neural singing voice synthesis systems suffer from slow inference speed due to their autoregressive design. Inspired by MLP-Mixer, a novel architecture introduced in the vision literature for attention-free image classification, we propose MLP Singer, a parallel Korean singing voice synthesis system. To the best of our knowledge, this is the first work that uses an entirely MLP-based architecture for voice synthesis. Listening tests demonstrate that MLP Singer outperforms a larger autoregressive GAN-based system, both in terms of audio quality and synthesis speed. In particular, MLP Singer achieves a real-time factor of up to 200 and 3400 on CPUs and GPUs respectively, enabling order of magnitude faster generation on both environments.

Citation

Please cite this work as follows.

@misc{tae2021mlp,
      title={MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis}, 
      author={Jaesung Tae and Hyeongju Kim and Younggun Lee},
      year={2021},
}

Quickstart

  1. Clone the repository including the git submodule.

    git clone --recurse-submodules https://github.com/neosapience/mlp-singer.git
  2. Install package requirements.

cd mlp-singer
pip install -r requirements.txt
  1. To generate audio files with the trained model checkpoint, download the HiFi-GAN checkpoint along with its configuration file and place them in hifi-gan.

  2. Run inference using the following command. Generated audio samples are saved in the samples directory by default.

    python inference.py --checkpoint_path checkpoints/default/model.pt

Dataset

We used the Children Song Dataset, an open-source singing voice dataset comprised of 100 annotated Korean and English children songs sung by a single professional singer. We used only the Korean subset of the dataset to train the model.

You can train the model on any custom dataset of your choice, as long as it includes lyrics text, midi transcriptions, and monophonic a capella audio file triplets. These files should be titled identically, and should also be placed in specific directory locations as shown below.

├── data
│   └── raw
│       ├── mid
│       ├── txt
│       └── wav

The directory names correspond to file extensions. We have included a sample as reference.

Preprocessing

Once you have prepared the dataset, run

python -m data.serialize

from the root directory. This will create data/bin that contains binary files used for training. This repository already contains example binary files created from the sample in data/raw.

Training

To train the model, run

python train.py

This will read the default configuration file located in configs/model.json to initialize the model. Alternatively, you can also create a new configuration and train the model via

python train.py --config_path PATH/TO/CONFIG.json

Running this command will create a folder under the checkpoints directory according to the name field specified in the configuration file.

You can also continue training from a checkpoint. For example, to resume training from the provided pretrained model checkpoint, run

python train.py --checkpoint_path /checkpoints/default/model.pt

Unless a --config_path flag is explicitly provided, the script will read config.json in the checkpoint directory. In both cases, model checkpoints will be saved regularly according to the interval defined in the configuration file.

Inference

MLP Singer produces mel-spectrograms, which are then fed into a neural vocoder to generate raw waveforms. This repository uses HiFi-GAN as the vocoder backend, but you can also plug other vocoders like WaveGlow. To generate samples, run

python inference.py --checkpoint_path PATH/TO/CHECKPOINT.pt --song little_star

This will create .wav samples in the samples directory, and save mel-spectrogram files as .npy files in hifi-gan/test_mel_dirs.

You can also specify any song you want to perform inference on, as long as the song is present in data/raw. The argument to the --song flag should match the title of the song as it is saved in data/raw.

Note

For demo and internal experiments, we used a variant of HiFi-GAN that used different mel-spectrogram configurations. As such, the provided checkpoint for MLP Singer is different from the one referred to in the paper. Moreover, the vocoder used in the demo was further fine-tuned on the Children's Song Dataset.

Acknowledgements

This implementation was inspired by the following repositories.

License

Released under the MIT License.

Owner
Neosapience
Neosapience, an artificial being enabled by artificial intelligence, will soon be everywhere in our daily lives.
Neosapience
Repository to hold code for the cap-bot varient that is being presented at the SIIC Defence Hackathon 2021.

capbot-siic Repository to hold code for the cap-bot varient that is being presented at the SIIC Defence Hackathon 2021. Problem Inspiration A plethora

Aryan Kargwal 19 Feb 17, 2022
ADCS cert template modification and ACL enumeration

Purpose This tool is designed to aid an operator in modifying ADCS certificate templates so that a created vulnerable state can be leveraged for privi

Fortalice Solutions, LLC 78 Dec 12, 2022
pyupbit 라이브러리를 활용하여 upbit에서 비트코인을 자동매매하는 코드입니다. 조코딩 유튜브 채널에서 자세한 강의 영상을 보실 수 있습니다.

파이썬 비트코인 투자 자동화 강의 코드 by 유튜브 조코딩 채널 pyupbit 라이브러리를 활용하여 upbit 거래소에서 비트코인 자동매매를 하는 코드입니다. 파일 구성 test.py : 잔고 조회 (1강) backtest.py : 백테스팅 코드 (2강) bestK.p

조코딩 JoCoding 186 Dec 29, 2022
Extract city and country mentions from Text like GeoText without regex, but FlashText, a Aho-Corasick implementation.

flashgeotext ⚡ 🌍 Extract and count countries and cities (+their synonyms) from text, like GeoText on steroids using FlashText, a Aho-Corasick impleme

Ben 57 Dec 16, 2022
Continuously update some NLP practice based on different tasks.

NLP_practice We will continuously update some NLP practice based on different tasks. prerequisites Software pytorch = 1.10 torchtext = 0.11.0 sklear

0 Jan 05, 2022
Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

ESACL: Enhanced Seq2Seq Autoencoder via Contrastive Learning for AbstractiveText Summarization This repo is for our paper "Enhanced Seq2Seq Autoencode

Rachel Zheng 14 Nov 01, 2022
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [Project Page] [Paper] [Video] Wenlong Huang1, Pieter Abbee

Wenlong Huang 114 Dec 29, 2022
PG-19 Language Modelling Benchmark

PG-19 Language Modelling Benchmark This repository contains the PG-19 language modeling benchmark. It includes a set of books extracted from the Proje

DeepMind 161 Oct 30, 2022
Nested Named Entity Recognition for Chinese Biomedical Text

CBio-NAMER CBioNAMER (Nested nAMed Entity Recognition for Chinese Biomedical Text) is our method used in CBLUE (Chinese Biomedical Language Understand

8 Dec 25, 2022
Bnagla hand written document digiiztion

Bnagla hand written document digiiztion This repo addresses the problem of digiizing hand written documents in Bangla. Documents have definite fields

Mushfiqur Rahman 1 Dec 10, 2021
Persian Bert For Long-Range Sequences

ParsBigBird: Persian Bert For Long-Range Sequences The Bert and ParsBert algorithms can handle texts with token lengths of up to 512, however, many ta

Sajjad Ayoubi 63 Dec 14, 2022
Pytorch implementation of winner from VQA Chllange Workshop in CVPR'17

2017 VQA Challenge Winner (CVPR'17 Workshop) pytorch implementation of Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challeng

Mark Dong 166 Dec 11, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 902 Jan 06, 2023
This repository describes our reproducible framework for assessing self-supervised representation learning from speech

LeBenchmark: a reproducible framework for assessing SSL from speech Self-Supervised Learning (SSL) using huge unlabeled data has been successfully exp

49 Aug 24, 2022
A Python/Pytorch app for easily synthesising human voices

Voice Cloning App A Python/Pytorch app for easily synthesising human voices Documentation Discord Server Video guide Voice Sharing Hub FAQ's System Re

Ben Andrew 840 Jan 04, 2023
A Flask Sentiment Analysis API, with visual implementation

The Sentiment Analysis Api was created using python flask module,it allows users to parse a text or sentence throught the (?text) arguement, then view the sentiment analysis of that sentence. It can

Ifechukwudeni Oweh 10 Jul 17, 2022
Unsupervised Language Model Pre-training for French

FlauBERT and FLUE FlauBERT is a French BERT trained on a very large and heterogeneous French corpus. Models of different sizes are trained using the n

GETALP 212 Dec 10, 2022
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

MTFAA-Net Unofficial PyTorch implementation of Baidu's MTFAA-Net: "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speec

Shimin Zhang 87 Dec 19, 2022
Text Analysis & Topic Extraction on Android App user reviews

AndroidApp_TextAnalysis Hi, there! This is code archive for Text Analysis and Topic Extraction from user_reviews of Android App. Dataset Source : http

Fitrie Ratnasari 1 Feb 14, 2022
本项目是作者们根据个人面试和经验总结出的自然语言处理(NLP)面试准备的学习笔记与资料,该资料目前包含 自然语言处理各领域的 面试题积累。

【关于 NLP】那些你不知道的事 作者:杨夕、芙蕖、李玲、陈海顺、twilight、LeoLRH、JimmyDU、艾春辉、张永泰、金金金 介绍 本项目是作者们根据个人面试和经验总结出的自然语言处理(NLP)面试准备的学习笔记与资料,该资料目前包含 自然语言处理各领域的 面试题积累。 目录架构 一、【

1.4k Dec 30, 2022