Transformers Wav2Vec2 + Parlance's CTCDecodeTransformers Wav2Vec2 + Parlance's CTCDecode

Last update: Jul 21, 2022

Overview

🤗 Transformers Wav2Vec2 + Parlance's CTCDecode

Introduction

This repo shows how 🤗 Transformers can be used in combination with Parlance's ctcdecode & KenLM ngram as a simple way to boost word error rate (WER).

Included is a file to create an ngram with KenLM as well as a simple evaluation script to compare the results of using Wav2Vec2 with ctcdecode + KenLM vs. without using any language model.

Note: The scripts are written to be used on GPU. If you want to use a CPU instead, simply remove all .to("cuda") occurances in eval.py.

Installation

In a first step, one should install KenLM. For Ubuntu, it should be enough to follow the installation steps described here. The installed kenlm folder should be move into this repo for ./create_ngram.py to function correctly. Alternatively, one can also link the lmplz binary file to a lmplz bash command to directly run lmplz instead of ./kenlm/build/bin/lmplz.

Next, some Python dependencies should be installed. Assuming PyTorch is installed, it should be sufficient to run pip install -r requirements.txt.

Run evaluation

Create ngram

In a first step on should create a ngram. E.g. for polish the command would be:

./create_ngram.py --language polish --path_to_ngram polish.arpa

After the language model is created, one should open the file. one should add a The file should have a structure which looks more or less as follows:

\data\        
ngram 1=86586
ngram 2=546387
ngram 3=796581           
ngram 4=843999             
ngram 5=850874              
                                                  
\1-grams:
-5.7532206      
   
       0
0       
         -0.06677356                                                                            
-3.4645514      drugi   -0.2088903
...

~~Now it is very important also add a~~ token to the n-gram so that it can be correctly loaded. You can simple copy the line:

0 -0.06677356

and change to . When doing this you should also inclease ngram by 1. The new ngram should look as follows:

\data\ ngram 1=86587 ngram 2=546387 ngram 3=796581 ngram 4=843999 ngram 5=850874 \1-grams: -5.7532206 0 0 -0.06677356 0 -0.06677356 -3.4645514 drugi -0.2088903 ...

Now the ngram can be correctly used with pyctcdecode

Run eval

Having created the ngram, one can run:

./eval.py --language polish --path_to_ngram polish.arpa

To compare Wav2Vec2 + LM vs. Wav2Vec2 + No LM on polish.

Results

==================================================polish================================================== polish - No LM - | WER: 0.3069742867206763 | CER: 0.06054530156286364 | Time: 32.37423086166382 polish - With LM - | WER: 0.39526828695550076 | CER: 0.17596985266474516 | Time: 62.017329692840576

I didn't obtain any good results even when trying out a variety of different settings for alpha and beta. Sadly there aren't many examples, tutorials or docs on parlance/ctcdecode so it's hard to find the reason for the problem.

Also tried it out for other languages like Portuguese and Spanish, but no luck there either.

Transformers Wav2Vec2 + Parlance's CTCDecodeTransformers Wav2Vec2 + Parlance's CTCDecode

Related tags

Overview

🤗 Transformers Wav2Vec2 + Parlance's CTCDecode

Introduction

Installation

Run evaluation

Create ngram

Run eval

Results

Owner

Patrick von Platen

BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model

Two-stage text summarization with BERT and BART

Shared code for training sentence embeddings with Flax / JAX

CPC-big and k-means clustering for zero-resource speech processing

Code for Text Prior Guided Scene Text Image Super-Resolution

OceanScript is an Esoteric language used to encode and decode text into a formulation of characters

NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

This repository contains the code, models and datasets discussed in our paper "Few-Shot Question Answering by Pretraining Span Selection"

Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow

Athena is an open-source implementation of end-to-end speech processing engine.

Anomaly Detection 이상치 탐지 전처리 모듈

Yomichad - a Japanese pop-up dictionary that can display readings and English definitions of Japanese words

Journey is a NLP-Powered Developer assistant

Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP)

A Practitioner's Guide to Natural Language Processing

Repository for the paper "Optimal Subarchitecture Extraction for BERT"

숭실대학교 컴퓨터학부 전공종합설계프로젝트

Adversarial Examples for Extreme Multilabel Text Classification

RIDE automatically creates the package and boilerplate OOP Python node scripts as per your needs

Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization