Code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition"

Last update: Dec 01, 2022

Overview

SEW (Squeezed and Efficient Wav2vec)

The repo contains the code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition" by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q Weinberger, and Yoav Artzi.

Model Checkpoints

Pre-trained without Supervision on LibriSpeech 960h

Model	Pre-training updates	Dataset	Checkpoint
W2V2-tiny	100K	LibriSpeech 960h	download
W2V2-small	100K	LibriSpeech 960h	download
W2V2-mid	100K	LibriSpeech 960h	download
W2V2-base	100K	LibriSpeech 960h	download
SEW-tiny	100K	LibriSpeech 960h	download
SEW-small	100K	LibriSpeech 960h	download
SEW-mid	100K	LibriSpeech 960h	download
SEW-D-tiny	100K	LibriSpeech 960h	download
SEW-D-small	100K	LibriSpeech 960h	download
SEW-D-mid	100K	LibriSpeech 960h	download
SEW-D-mid (k127)	100K	LibriSpeech 960h	download
SEW-D-base	100K	LibriSpeech 960h	download
SEW-D-base+	100K	LibriSpeech 960h	download
SEW-D-mid	400K	LibriSpeech 960h	download
SEW-D-mid (k127)	400K	LibriSpeech 960h	download
SEW-D-base+	400K	LibriSpeech 960h	download

Usage

Dependencies

The code is tested with fairseq commit 05255f9, deberta commit bf17ca4 and the following packages.

torch==1.8.0
torchaudio==0.8.0
tqdm==4.49.0
Hydra==2.5
hydra-core==1.0.4
fvcore==0.1.5.post20210330
omegaconf==2.0.5
einops==0.3.0
fire==0.2.1

Apex

Please install NVIDIA's apex with:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

wav2letter decoder

We currently decode with the wav2letter v0.2 Python bindings at commit 96f5f9d. Please install the Python bindings from https://github.com/flashlight/wav2letter/tree/96f5f9d3b41e01af0a031ee0d2604acd9ef3b1b0/bindings/python (this URL pins that commit). The newest commit in the v0.2 branch, d5a93f0, leads to worse WER for the wav2vec 2.0 baselines.

Installation

git clone https://github.com/asappresearch/sew.git
cd sew 
pip install -e .

Pre-training

Pre-training SEW models

Run the following command, where $model_size can be tiny, small, or mid, and $ngpu is the number of GPUs you want to use.

bash scripts/pt-sew.sh $model_size $ngpu

Pre-training SEW-D models

bash scripts/pt-sew-d.sh $model_size $ngpu

where $model_size can be tiny, small, mid, mid-k127, base, or base+.
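The valid combinations of script and model size listed above can be summarized in a small dry-run helper. This is only a sketch: pretrain_cmd is a hypothetical function that is not part of the repository, and it prints the command instead of running it.

```shell
#!/usr/bin/env bash
# pretrain_cmd is a hypothetical helper, not part of the SEW scripts.
# It prints the pre-training command for a model family ("sew" or "sew-d")
# and size, and fails for combinations the README does not list.
pretrain_cmd() {
  local family="$1" size="$2" ngpu="$3"
  case "$family:$size" in
    sew:tiny|sew:small|sew:mid)
      echo "bash scripts/pt-sew.sh $size $ngpu" ;;
    sew-d:tiny|sew-d:small|sew-d:mid|sew-d:mid-k127|sew-d:base|sew-d:base+)
      echo "bash scripts/pt-sew-d.sh $size $ngpu" ;;
    *)
      echo "unsupported combination: $family $size" >&2
      return 1 ;;
  esac
}

# Dry run: show the commands for two example configurations.
pretrain_cmd sew tiny 8
pretrain_cmd sew-d mid-k127 8
```

Copy a printed line into the shell to start the actual run.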

Fine-tuning

Run the following script to fine-tune a model with the hyperparameters from wav2vec 2.0.

bash scripts/ft-model.sh $pre_trained_model $split $ngpu

where $pre_trained_model can be a W2V2, SEW, or SEW-D model checkpoint and $split can be 10m, 1h, 10h, or 100h.

We also provide a second set of hyperparameters that keeps all dropout rates the same as in the pre-training stage; we found it to be more stable.

bash scripts/ft-model-stable.sh $pre_trained_model $split $ngpu

If you see an out-of-GPU-memory error, scale down dataset.max_tokens and scale up optimization.update_freq in scripts/ft-model.sh. For example, change these lines

  dataset.max_tokens=3200000 \
  optimization.update_freq="[$((8 / $ngpu))]" \

to

  dataset.max_tokens=1600000 \
  optimization.update_freq="[$((16 / $ngpu))]" \

which halves the per-GPU batch size and doubles the number of gradient accumulation steps in order to use less GPU memory.
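The arithmetic behind this change can be checked with a short sketch. It assumes the amount of audio per optimizer update is dataset.max_tokens × number of GPUs × update_freq, which is the usual fairseq convention but is stated here as an assumption. tokens_per_update is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# tokens_per_update is a hypothetical helper, not part of the SEW scripts.
# Assumption: effective batch per update = max_tokens * ngpu * update_freq.
tokens_per_update() {
  local max_tokens="$1" accum_total="$2" ngpu="$3"
  # Same integer division as the update_freq expression in ft-model.sh.
  local update_freq=$(( accum_total / ngpu ))
  echo $(( max_tokens * ngpu * update_freq ))
}

ngpu=2
echo "default settings:    $(tokens_per_update 3200000 8 "$ngpu")"
echo "low-memory settings: $(tokens_per_update 1600000 16 "$ngpu")"
```

Under that assumption both settings give the same total per update whenever $ngpu divides 8, so only the per-GPU memory footprint changes. For GPU counts that do not divide 8 or 16, the integer division rounds down and the totals can differ.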

Evaluation

Please run this script to prepare the official LibriSpeech 4-gram language model.

bash scripts/prepare_librispeech_lm.sh $kenlm_build_bin

where $kenlm_build_bin is the folder that contains the KenLM build_binary executable file (e.g. /home/user/kenlm/build/bin).
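Before running the preparation script, it can help to confirm that the folder really contains the build_binary executable. The following is a minimal sketch: has_build_binary and the KENLM_BUILD_BIN environment variable are made up for illustration and are not part of the repository.

```shell
#!/usr/bin/env bash
# has_build_binary is a hypothetical helper, not part of the SEW scripts.
# It succeeds only if the folder contains an executable file named build_binary.
has_build_binary() {
  local dir="$1"
  [ -f "$dir/build_binary" ] && [ -x "$dir/build_binary" ]
}

# KENLM_BUILD_BIN is an assumed variable; the default is the example path from the text.
kenlm_build_bin="${KENLM_BUILD_BIN:-/home/user/kenlm/build/bin}"

if has_build_binary "$kenlm_build_bin"; then
  echo "ready: bash scripts/prepare_librispeech_lm.sh $kenlm_build_bin"
else
  echo "build_binary not found in $kenlm_build_bin" >&2
fi
```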

Then run this script to evaluate a pre-trained ASR model:

python tools/eval_w2v.py tunelm --subsets '["dev-clean", "dev-other", "test-clean", "test-other"]' --model $asr_checkpoint

Releases (v0.0.1)

v0.0.1 (Sep 15, 2021)

First release.

Owner

ASAPP Research
