Pytorch implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

Overview

MOSNet

pytorch implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion" https://arxiv.org/abs/1904.08352

Dependency

Linux Ubuntu 20.04

  • GPU: GeForce RTX 2080 Ti
  • CUDA version: 10.0

Python 3.7

  • pytorch==1.4.0
  • numpy==1.19.5
  • tqdm
  • scipy==1.6.2
  • pandas==1.2.4
  • matplotlib
  • librosa==0.6.0

Usage

Reproducing results in the paper

  1. cd ./data and run bash download.sh to download the VCC2018 evaluation results and submitted speech. (downsample the submitted speech might take some times)
  2. Run python mos_results_preprocess.py to prepare the evaluation results. (Run python bootsrap_estimation.py to do the bootstrap experiment for intrinsic MOS calculation)
  3. Run python utils.py to extract .wav to .h5
  4. Run python train.py -c config.json to train a CNN-BLSTM version of MOSNet.
  5. Run python test.py -c config.json --epoch BEST_EPOCH --is_fp16 to test a CNN-BLSTM version of MOSNet.

Note

Thanks to the authors of the paper MOSNet and the code is based on their tensorflow implementation https://github.com/lochenchou/MOSNet. However, my workstation will show OOM errors even with BATCH_SIZE=4 under tensorflow2.0 and RTX 2080 Ti. Therefore I implement the code with pytorch. Currently only 7700MiB memory is used when BATCH_SIZE=64. If you find any problem with my code, you can write a issue.

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{mosnet,
  author={Lo, Chen-Chou and Fu, Szu-Wei and Huang, Wen-Chin and Wang, Xin and Yamagishi, Junichi and Tsao, Yu and Wang, Hsin-Min},
  title={MOSNet: Deep Learning based Objective Assessment for Voice Conversion},
  year=2019,
  booktitle={Proc. Interspeech 2019},
}

License

This work is released under MIT License (see LICENSE file for details).

VCC2018 Database & Results

The model is trained on the large listening evaluation results released by the Voice Conversion Challenge 2018.
The listening test results can be downloaded from here
The databases and results (submitted speech) can be downloaded from here

Paper: Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification

Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification T M Feroz Ali, Subhasis Chaudhuri, ICVGIP-20-21

T M Feroz Ali 3 Jun 17, 2022
Calibrated Hyperspectral Image Reconstruction via Graph-based Self-Tuning Network.

mask-uncertainty-in-HSI This repository contains the testing code and pre-trained models for the paper Calibrated Hyperspectral Image Reconstruction v

JIAMIAN WANG 9 Dec 29, 2022
On Evaluation Metrics for Graph Generative Models

On Evaluation Metrics for Graph Generative Models Authors: Rylee Thompson, Boris Knyazev, Elahe Ghalebi, Jungtaek Kim, Graham Taylor This is the offic

13 Jan 07, 2023
Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Relational Self-Attention: What's Missing in Attention for Video Understanding This repository is the official implementation of "Relational Self-Atte

mandos 43 Dec 07, 2022
OpenMMLab Computer Vision Foundation

English | 简体中文 Introduction MMCV is a foundational library for computer vision research and supports many research projects as below: MMCV: OpenMMLab

OpenMMLab 4.6k Jan 09, 2023
Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech This repository is the official implementation of "Meta-TTS: Meta-Learning for Few

Sung-Feng Huang 128 Dec 25, 2022
SatelliteSfM - A library for solving the satellite structure from motion problem

Satellite Structure from Motion Maintained by Kai Zhang. Overview This is a libr

Kai Zhang 190 Dec 08, 2022
Code release for "COTR: Correspondence Transformer for Matching Across Images"

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

UBC Computer Vision Group 360 Jan 06, 2023
Continual Learning of Long Topic Sequences in Neural Information Retrieval

ContinualPassageRanking Repository for the paper "Continual Learning of Long Topic Sequences in Neural Information Retrieval". In this repository you

0 Apr 12, 2022
Towards Representation Learning for Atmospheric Dynamics (AtmoDist)

Towards Representation Learning for Atmospheric Dynamics (AtmoDist) The prediction of future climate scenarios under anthropogenic forcing is critical

Sebastian Hoffmann 4 Dec 15, 2022
Code for "Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation" ICCV'21

Skeletal-GNN Code for "Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation" ICCV'21 Various deep learning techniques have been propose

37 Oct 23, 2022
X-modaler is a versatile and high-performance codebase for cross-modal analytics.

X-modaler X-modaler is a versatile and high-performance codebase for cross-modal analytics. This codebase unifies comprehensive high-quality modules i

910 Dec 28, 2022
Robust and Accurate Object Detection via Self-Knowledge Distillation

Robust and Accurate Object Detection via Self-Knowledge Distillation paper:https://arxiv.org/abs/2111.07239 Environments Python 3.7 Cuda 10.1 Prepare

Weipeng Xu 6 Jul 01, 2022
New approach to benchmark VQA models

VQA Benchmarking This repository contains the web application & the python interface to evaluate VQA models. Documentation Please see the documentatio

4 Jul 25, 2022
Official repository for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'21, Oral Presentation)

Official PyTorch Implementation for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'2021, Oral Presentation) HOTR: End-to-

Kakao Brain 114 Nov 28, 2022
Code release for General Greedy De-bias Learning

General Greedy De-bias for Dataset Biases This is an extention of "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). T

4 Mar 15, 2022
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"

Region_Learner The Pytorch implementation for "Video-Text Pre-training with Learned Regions" (arxiv) We are still cleaning up the code further and pre

Rui Yan 0 Mar 20, 2022
An example of Scatterbrain implementation (combining local attention and Performer)

An example of Scatterbrain implementation (combining local attention and Performer)

HazyResearch 97 Jan 02, 2023
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

Qiang Wang 49 Dec 18, 2022
Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

GLAT Implementation for the ACL2021 paper "Glancing Transformer for Non-Autoregressive Neural Machine Translation" Requirements Python = 3.7 Pytorch

117 Jan 09, 2023