Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

Overview

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis (ICASSP 2021) Paper | Demo

Block diagram of FCL-taco2, where the decoder generates mel-spectrograms in autoregressive (AR) mode within each phoneme and is shared across all phonemes.

💬 Huawei Noah's Ark Lab is recruiting interns in speech processing. If you're interested, you're welcome to contact Dr. Deng: [email protected]

Training and inference scripts for FCL-taco2

Environment

  • python 3.6.10
  • torch 1.3.1
  • chainer 6.0.0
  • espnet 0.8.0
  • apex 0.1
  • numpy 1.19.1
  • kaldiio 2.15.1
  • librosa 0.8.0
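
A minimal environment-setup sketch based on the version list above (assuming Python 3.6.10 is already active, e.g. in a conda environment; CUDA and toolchain details are left out):

     pip install torch==1.3.1 chainer==6.0.0 espnet==0.8.0 numpy==1.19.1 kaldiio==2.15.1 librosa==0.8.0
     # apex is built from source; a plain install gives the Python-only build.
     # See the NVIDIA/apex README if you need the CUDA/C++ extensions.
     git clone https://github.com/NVIDIA/apex
     pip install ./apex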

Training and inference:

  • Step1. Data preparation & preprocessing
  1. Download LJSpeech

  2. Unpack the downloaded LJSpeech-1.1.tar.bz2 into /xx/LJSpeech-1.1

  3. Obtain the forced-alignment information with the Montreal Forced Aligner (MFA) tool (a command sketch follows this list), or download our alignment results and unpack them to /xx/TextGrid

  4. Preprocess the dataset to extract mel-spectrograms, phoneme durations, pitch, energy and phoneme sequences by running:

     python preprocessing.py --data-root /xx/LJSpeech-1.1 --textgrid-root /xx/TextGrid
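
If you run the Montreal Forced Aligner yourself (item 3 above), a rough sketch using MFA 2.x and its ARPA English models is shown below. The paper-era setup used an older MFA release, so treat the command and model names as assumptions and check them against your installed version; MFA also expects a per-utterance transcript (.lab or .txt) next to each wav file:

     mfa model download acoustic english_us_arpa
     mfa model download dictionary english_us_arpa
     mfa align /xx/LJSpeech-1.1/wavs english_us_arpa english_us_arpa /xx/TextGrid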
    
  • Step2. Model training
  1. Training teacher model FCL-taco2-T:

     ./teacher_model_training.sh
    
  2. Training student model FCL-taco2-S:

     ./student_model_training.sh
    
  3. Parallel-WaveGAN (PWG) vocoder training: follow the instructions here. Alternatively, you can download the pre-trained PWG vocoder and put the PWG model under the directory "vocoder" (see the sketch below).
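
If you use the pre-trained PWG vocoder rather than training one, place its files under "vocoder". The file names below are only illustrative (pre-trained PWG packages typically contain a checkpoint, a config.yml and feature statistics); use whatever the downloaded package actually contains:

     mkdir -p vocoder
     # illustrative file names; adjust to the actual downloaded package
     cp /path/to/pwg/checkpoint-400000steps.pkl /path/to/pwg/config.yml /path/to/pwg/stats.h5 vocoder/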

  • Step3. Model evaluation
  1. FCL-taco2-T evaluation:

     ./inference_teacher.sh
    
  2. FCL-taco2-S evaluation:

     ./inference_student.sh
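
As a quick, coarse check of the speed gap between the teacher and the student, you can simply time the two evaluation scripts (this measures the whole script end to end and is not the per-utterance real-time factor):

     time ./inference_teacher.sh
     time ./inference_student.sh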
    

Citation

If you use this code in your research, please star our repo and cite our paper:

@inproceedings{wang2021fcl,
  title={Fcl-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis},
  author={Wang, Disong and Deng, Liqun and Zhang, Yang and Zheng, Nianzu and Yeung, Yu Ting and Chen, Xiao and Liu, Xunying and Meng, Helen},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5714--5718},
  year={2021},
  organization={IEEE}
}