UniSpeech - Large Scale Self-Supervised Learning for Speech

Overview

UniSpeech

The family of UniSpeech:

WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing

UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training

Update

Pre-trained models

We strongly suggest using our UniSpeech-SAT model for speaker related tasks, since it shows very powerful performance on various speaker related benchmarks.

Model Pretraining Dataset Finetuning Dataset Model
UniSpeech Large EN Labeled: 1350 hrs en - download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it - download
Unispeech Large+ Labeled: 1350 hrs en, Unlabeled: 353 hrs fr - download
UniSpeech Large+ Labeld: 1350 hrs en, Unlabeled: 168 hrs es - download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeld: 90 hrs it - download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it, Unlabeled: 17 hrs ky - download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeled: 353 hrs fr 1 hr fr download
UniSpeech Large+ Labeld: 1350 hrs en, Unlabeled: 168 hrs es 1 hr es download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeld: 90 hrs it 1 hr it download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it, Unlabeled: 17 hrs ky 1 hr ky download
UniSpeech-SAT Base 960 hrs LibriSpeech - download
UniSpeech-SAT Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - download
UniSpeech-SAT Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - download
WavLM Base 960 hrs LibriSpeech - Azure Storage
Google Drive
WavLM Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - Azure Storage
Google Drive
WavLM Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - Azure Storage
Google Drive

Universal Representation Evaluation on SUPERB

alt text

Downstream Task Performance

We also evaluate our models on typical speaker related benchmarks.

Speaker Verification

Model Fix pre-train Vox1-O Vox1-E Vox1-H
ECAPA-TDNN - 0.87 1.12 2.12
HuBERT large Yes 0.888 0.912 1.853
Wav2Vec2.0 (XLSR) Yes 0.915 0.945 1.895
UniSpeech-SAT large Yes 0.771 0.781 1.669
WavLM large Yes 0.638 0.687 1.457
HuBERT large No 0.585 0.654 1.342
Wav2Vec2.0 (XLSR) No 0.564 0.605 1.23
UniSpeech-SAT large No 0.564 0.561 1.23
WavLM large No 0.431 0.538 1.154

Our paper for verification

Speech Separation

Evaluation on LibriCSS

Model 0S 0L OV10 OV20 OV30 OV40
Conformer (SOTA) 4.5 4.4 6.2 8.5 11 12.6
UniSpeech-SAT base 4.4 4.4 5.4 7.2 9.2 10.5
UniSpeech-SAT large 4.3 4.2 5.0 6.3 8.2 8.8
WavLM base+ 4.5 4.4 5.6 7.5 9.4 10.9
WavLM large 4.2 4.1 4.8 5.8 7.4 8.5

Speaker Diarization

Evaluation on CALLHOME

Model spk_2 spk_3 spk_4 spk_5 spk_6 spk_all
EEND-vector clustering 7.96 11.93 16.38 21.21 23.1 12.49
EEND-EDA clustering (SOTA) 7.11 11.88 14.37 25.95 21.95 11.84
UniSpeech-SAT large 5.93 10.66 12.9 16.48 23.25 10.92
WavLM Base 6.99 11.12 15.20 16.48 21.61 11.75
WavLm large 6.46 10.69 11.84 12.89 20.70 10.35

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Microsoft Open Source Code of Conduct

Reference

If you find our work is useful in your research, please cite the following paper:

@inproceedings{Wang2021UniSpeech,
  author    = {Chengyi Wang and Yu Wu and Yao Qian and Kenichi Kumatani and Shujie Liu and Furu Wei and Michael Zeng and Xuedong Huang},
  editor    = {Marina Meila and Tong Zhang},
  title     = {UniSpeech: Unified Speech Representation Learning with Labeled and
               Unlabeled Data},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning,
               {ICML} 2021, 18-24 July 2021, Virtual Event},
  series    = {Proceedings of Machine Learning Research},
  volume    = {139},
  pages     = {10937--10947},
  publisher = {{PMLR}},
  year      = {2021},
  url       = {http://proceedings.mlr.press/v139/wang21y.html},
  timestamp = {Thu, 21 Oct 2021 16:06:12 +0200},
  biburl    = {https://dblp.org/rec/conf/icml/0002WQK0WZ021.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Chen2021WavLM,
  title   = {WavLM: Large-Scale Self-Supervised  Pre-training   for Full Stack Speech Processing},
  author  = {Sanyuan Chen and Chengyi Wang and Zhengyang Chen and Yu Wu and Shujie Liu and Zhuo Chen and Jinyu Li and Naoyuki Kanda and Takuya Yoshioka and Xiong Xiao and Jian Wu and Long Zhou and Shuo Ren and Yanmin Qian and Yao Qian and Jian Wu and Michael Zeng and Furu Wei},
  eprint={2110.13900},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  year={2021}
}
@article{Chen2021UniSpeechSAT,
  title   = {UniSpeech-SAT: Universal Speech Representation Learning with  Speaker Aware Pre-Training},
  author  = {Sanyuan Chen and Yu Wu and Chengyi Wang and Zhengyang Chen and Zhuo Chen and Shujie Liu and   Jian Wu and Yao Qian and Furu Wei and Jinyu Li and  Xiangzhan Yu},
  eprint={2110.05752},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  year={2021}
}

Contact Information

For help or issues using UniSpeech models, please submit a GitHub issue.

For other communications related to UniSpeech, please contact Yu Wu ([email protected]).

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
PyTorch implementation and pretrained models for XCiT models. See XCiT: Cross-Covariance Image Transformer

Cross-Covariance Image Transformer (XCiT) PyTorch implementation and pretrained models for XCiT models. See XCiT: Cross-Covariance Image Transformer L

Facebook Research 605 Jan 02, 2023
Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

smart-school-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
ReCoin - Restoring our environment and businesses in parallel

Shashank Ojha, Sabrina Button, Abdellah Ghassel, Joshua Gonzales "Reduce Reuse R

sabrina button 1 Mar 14, 2022
This converter will create the exact measure for your cappuccino recipe from the grandiose Rafaella Ballerini!

About CappuccinoJs This converter will create the exact measure for your cappuccino recipe from the grandiose Rafaella Ballerini! Este conversor criar

Arthur Ottoni Ribeiro 48 Nov 15, 2022
Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

TRICE: a task-agnostic transferring framework for multi-source sequence generation This is the source code of our work Transfer Learning for Sequence

THUNLP-MT 9 Jun 27, 2022
The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Graformer The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models Graformer (also named BridgeTransformer in t

22 Dec 14, 2022
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 408 Dec 29, 2022
NLP-Project - Used an API to scrape 2000 reddit posts, then used NLP analysis and created a classification model to mixed succcess

Project 3: Web APIs & NLP Problem Statement How do r/Libertarian and r/Neoliberal differ on Biden post-inaguration? The goal of the project is to see

Adam Muhammad Klesc 2 Mar 29, 2022
NL. The natural language programming language.

NL A Natural-Language programming language. Built using Codex. A few examples are inside the nl_projects directory. How it works Write any code in pur

2 Jan 17, 2022
A workshop with several modules to help learn Feast, an open-source feature store

Workshop: Learning Feast This workshop aims to teach users about Feast, an open-source feature store. We explain concepts & best practices by example,

Feast 52 Jan 05, 2023
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

GPT Neo 🎉 1T or bust my dudes 🎉 An implementation of model & data parallel GPT3-like models using the mesh-tensorflow library. If you're just here t

EleutherAI 6.7k Dec 28, 2022
pytorch implementation of Attention is all you need

A Pytorch Implementation of the Transformer: Attention Is All You Need Our implementation is largely based on Tensorflow implementation Requirements N

230 Dec 07, 2022
apple's universal binaries BUT MUCH WORSE (PRACTICAL SHITPOST) (NOT PRODUCTION READY)

hyperuniversality investment opportunity: what if we could run multiple architectures in a single file, again apple universal binaries, but worse how

luna 2 Oct 19, 2021
[NeurIPS 2021] Code for Learning Signal-Agnostic Manifolds of Neural Fields

Learning Signal-Agnostic Manifolds of Neural Fields This is the uncleaned code for the paper Learning Signal-Agnostic Manifolds of Neural Fields. The

60 Dec 12, 2022
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

Tencent 633 Dec 28, 2022
Telegram bot to auto post messages of one channel in another channel as soon as it is posted, without the forwarded tag.

Channel Auto-Post Bot This bot can send all new messages from one channel, directly to another channel (or group, just in case), without the forwarded

Aditya 128 Dec 29, 2022
Mysticbbs-rjam - rJAM splitscreen message reader for MysticBBS A46+

rJAM splitscreen message reader for MysticBBS A46+

Robbert Langezaal 4 Nov 22, 2022
Share constant definitions between programming languages and make your constants constant again

Introduction Reconstant lets you share constant and enum definitions between programming languages. Constants are defined in a yaml file and converted

Natan Yellin 47 Sep 10, 2022
Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

TextCortex - HemingwAI Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingw

TextCortex AI 27 Nov 28, 2022
Fake Shakespearean Text Generator

Fake Shakespearean Text Generator This project contains an impelementation of stateful Char-RNN model to generate fake shakespearean texts. Files and

Recep YILDIRIM 1 Feb 15, 2022