Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Last update: Jan 03, 2023

Overview

Espresso

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.

We provide state-of-the-art training recipes for the following speech datasets:

What's New:

April 2021: On-the-fly feature extraction from raw waveforms with torchaudio is supported. A LibriSpeech recipe is released here with no dependency on Kaldi and using YAML files (via Hydra) for configuring experiments.
June 2020: Transformer recipes released.
April 2020: Both E2E LF-MMI (using PyChain) and Cross-Entropy training for hybrid ASR are now supported. WSJ recipes are provided here and here as examples, respectively.
March 2020: SpecAugment is supported and relevant recipes are released.
September 2019: We are in an effort of isolating Espresso from fairseq, resulting in a standalone package that can be directly pip installed.

Requirements and Installation

PyTorch version >= 1.5.0
Python version >= 3.6
For training new models, you'll also need an NVIDIA GPU and NCCL
To install Espresso from source and develop locally:

git clone https://github.com/freewym/espresso
cd espresso
pip install --editable .

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
pip install kaldi_io sentencepiece soundfile
cd espresso/tools; make KALDI=<path/to/a/compiled/kaldi/directory>

add your Python path to PATH variable in examples/asr_<dataset>/path.sh, the current default is ~/anaconda3/bin.

kaldi_io is required for reading kaldi scp files. sentencepiece is required for subword pieces training/encoding. soundfile is required for reading raw waveform files. Kaldi is required for data preparation, feature extraction, scoring for some datasets (e.g., Switchboard), and decoding for all hybrid systems.

If you want to use PyChain for LF-MMI training, you also need to install PyChain (and OpenFst):

edit PYTHON_DIR variable in espresso/tools/Makefile (default: ~/anaconda3/bin), and then

cd espresso/tools; make openfst pychain

For faster training install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

License

Espresso is MIT-licensed.

Citation

Please cite Espresso as:

@inproceedings{wang2019espresso,
  title = {Espresso: A Fast End-to-end Neural Speech Recognition Toolkit},
  author = {Yiming Wang and Tongfei Chen and Hainan Xu 
            and Shuoyang Ding and Hang Lv and Yiwen Shao 
            and Nanyun Peng and Lei Xie and Shinji Watanabe 
            and Sanjeev Khudanpur},
  booktitle = {2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2019},
}

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Related tags

Overview

Espresso

What's New:

Requirements and Installation

License

Citation

Owner

Yiming Wang

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

spaCy plugin for Transformers , Udify, ELmo, etc.

Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment Analysis with Affective Knowledge. Proceedings of EMNLP 2021

profile tools for pytorch nn models

PyTorch original implementation of Cross-lingual Language Model Pretraining.

Code repository for "It's About Time: Analog clock Reading in the Wild"

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

An extensive UI tool built using new data scraped from BBC News

2021 2학기 데이터크롤링 기말프로젝트

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

jiant is an NLP toolkit

Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.

A sentence aligner for comparable corpora

Club chatbot

A retro text-to-speech bot for Discord

"Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021