LeBenchmark: a reproducible framework for assessing SSL from speech

Self-Supervised Learning (SSL) on large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech and were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data when building efficient speech systems, their evaluation was mostly carried out on ASR, under multiple and heterogeneous experimental settings (most of them for English). This makes it difficult to objectively compare SSL approaches and to evaluate their impact on building speech systems.

In this repository, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR (high- and low-resource) tasks but also spoken language understanding, speech translation and emotion recognition. It also targets speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets.

The scripts for data preparation are available here.

Our pre-trained SSL models for French are available on HuggingFace: https://huggingface.co/LeBenchmark

Our benchmark tasks are available in the following directories:

ASR: Automatic Speech Recognition

SLU: Spoken Language Understanding

AER: Automatic Emotion Recognition

AST: Automatic Speech Translation

Detailed descriptions of the experiments and results are given in our paper: TBC!
