A PyTorch implementation of MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network

Last update: Dec 28, 2022

Overview

Pytorch-MBNet

Training

To train a new model, run train.py. The input arguments are:

--data_path: The path to the directory containing all VCC-2018 .wav files and the train/dev/test split files (the files in ./data).
--save_dir: The path to the directory where trained models are saved. Create this directory before training.
--total_steps: The total number of training steps.
--valid_steps: Run validation every valid_steps training updates.
--log_steps: Log to TensorBoard every log_steps training updates.
--update_freq: The gradient accumulation frequency. The default value is 1 (no accumulation).
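
As a rough illustration, the flags above can be assembled into a command line as shown below. This is a sketch only: the paths and step counts are placeholder values chosen for illustration, not recommended settings, and build_train_command is a hypothetical helper that is not part of this repository.

```python
import shlex
import sys


def build_train_command(data_path, save_dir, total_steps, valid_steps,
                        log_steps, update_freq=1):
    """Return the argument list for running train.py with the documented flags."""
    return [
        sys.executable, "train.py",
        "--data_path", str(data_path),
        "--save_dir", str(save_dir),
        "--total_steps", str(total_steps),
        "--valid_steps", str(valid_steps),
        "--log_steps", str(log_steps),
        "--update_freq", str(update_freq),
    ]


# Placeholder values (assumptions for illustration only).
command = build_train_command(
    data_path="./data",
    save_dir="./checkpoints",
    total_steps=10000,
    valid_steps=1000,
    log_steps=500,
)

# Print a shell-ready command; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Since --save_dir must exist before training, a call such as os.makedirs(save_dir, exist_ok=True) can be run first.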

Testing

To test on VCC-2018, run test.py. The input arguments are:

--model_path: The path to the saved model.
--idtable_path: The path to the "judge id-number" mapping table file used during training.
--step: The step value under which results are logged to TensorBoard. It can be the same as the number of training steps.
--split: The data split (valid or test) to use for testing.

Inference

After training on the VCC data, the model can be used for inference on other data. The input arguments are --data_path, --model_path, and --save_dir, which are similar to those above. Note that the bias-net is not used, since this code assumes that the ground-truth judge ids are unavailable.
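
The reason the bias-net can be dropped at inference can be sketched as follows. In the MBNet design, one network predicts an utterance's mean score and a second predicts a per-judge offset from that mean. Without a judge id there is no offset to compute, so the mean prediction is returned alone. The functions below are toy stand-ins (hypothetical, not this repository's modules), and the sketch assumes the judge-specific score is the mean score plus the predicted bias.

```python
from typing import Callable, Optional, Sequence

Features = Sequence[float]


def predict_score(
    mean_net: Callable[[Features], float],
    bias_net: Callable[[Features, int], float],
    features: Features,
    judge_id: Optional[int] = None,
) -> float:
    """Return the mean score, plus a judge-specific bias when a judge id is known."""
    mean_score = mean_net(features)
    if judge_id is None:
        # Inference on new data: no judge id, so the bias-net is skipped.
        return mean_score
    return mean_score + bias_net(features, judge_id)


# Toy stand-ins for the two networks (assumptions, not trained models).
def toy_mean_net(features: Features) -> float:
    return sum(features) / len(features)


def toy_bias_net(features: Features, judge_id: int) -> float:
    return 0.5 if judge_id % 2 == 0 else -0.5


features = [3.0, 4.0, 5.0]
inference_score = predict_score(toy_mean_net, toy_bias_net, features)
judge_score = predict_score(toy_mean_net, toy_bias_net, features, judge_id=2)
print(inference_score, judge_score)
```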

The pre-trained model can be found in ./pre_trained.
