Vad-sli-asr - Python scripts for a speech processing pipeline with Voice Activity Detection (VAD), Spoken Language Identification (SLI), and Automatic Speech Recognition (ASR)

Last update: Dec 09, 2022


Overview

VAD-SLI-ASR

Python scripts for a speech processing pipeline with Voice Activity Detection (VAD), Spoken Language Identification (SLI), and Automatic Speech Recognition (ASR). In our use case, VAD detects the time regions in a language documentation recording where someone is speaking, SLI classifies each region as either English (eng) or Muruwari (zmu), and an English ASR model transcribes the regions detected as English. The pipeline outputs an ELAN .eaf file with three tiers: _vad, _sli, and _asr.
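The .eaf output can be illustrated with the Python standard library alone. This is a minimal sketch, not the repo's implementation: it assumes each tier arrives as a list of (start_ms, end_ms, value) tuples and writes only the core ELAN elements (HEADER, TIME_ORDER, TIER, LINGUISTIC_TYPE). The media URL and the single linguistic type `default-lt` are placeholders.

```python
import xml.etree.ElementTree as ET


def build_eaf(tiers, media_url="myrecording.wav"):
    """Build a minimal ELAN document (illustrative sketch).

    tiers maps a tier ID to a list of (start_ms, end_ms, value) tuples.
    For simplicity, each annotation gets its own pair of time slots.
    """
    root = ET.Element("ANNOTATION_DOCUMENT",
                      {"AUTHOR": "", "DATE": "", "FORMAT": "3.0", "VERSION": "3.0"})
    header = ET.SubElement(root, "HEADER", {"MEDIA_FILE": "", "TIME_UNITS": "milliseconds"})
    ET.SubElement(header, "MEDIA_DESCRIPTOR",
                  {"MEDIA_URL": media_url, "MIME_TYPE": "audio/x-wav"})
    time_order = ET.SubElement(root, "TIME_ORDER")

    slot_n = ann_n = 0
    for tier_id, annotations in tiers.items():
        tier = ET.SubElement(root, "TIER",
                             {"TIER_ID": tier_id, "LINGUISTIC_TYPE_REF": "default-lt"})
        for start_ms, end_ms, value in annotations:
            refs = []
            for t in (start_ms, end_ms):
                slot_n += 1
                refs.append(f"ts{slot_n}")
                ET.SubElement(time_order, "TIME_SLOT",
                              {"TIME_SLOT_ID": refs[-1], "TIME_VALUE": str(int(t))})
            ann_n += 1
            alignable = ET.SubElement(
                ET.SubElement(tier, "ANNOTATION"), "ALIGNABLE_ANNOTATION",
                {"ANNOTATION_ID": f"a{ann_n}",
                 "TIME_SLOT_REF1": refs[0], "TIME_SLOT_REF2": refs[1]})
            ET.SubElement(alignable, "ANNOTATION_VALUE").text = value

    # Every tier above refers to this single time-alignable linguistic type
    ET.SubElement(root, "LINGUISTIC_TYPE",
                  {"LINGUISTIC_TYPE_ID": "default-lt", "TIME_ALIGNABLE": "true"})
    return ET.ElementTree(root)


example = {
    "_vad": [(0, 1500, ""), (2000, 3200, "")],
    "_sli": [(0, 1500, "eng"), (2000, 3200, "zmu")],
    "_asr": [(0, 1500, "good morning")],
}
tree = build_eaf(example)
# tree.write("myrecording.eaf", encoding="utf-8", xml_declaration=True) saves it to disk
```

Only the English region carries an _asr annotation, mirroring the pipeline described above.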

Set up

pip install -r requirements.txt

Data

├── data
│   ├── sli-train      <- Training data for SLI (one folder per language)
│   │   ├── eng/       <- .wav files (English utterances)
│   │   ├── zmu/       <- .wav files (Muruwari utterances)
│   ├── asr-train      <- Training data for ASR fine-tuning
│   │   ├── eng.tsv    <- transcriptions
│   │   ├── eng/       <- .wav files (English utterances)
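A quick way to check that `data/sli-train` follows this layout is to treat each sub-folder name as a language label. The sketch below is illustrative only (`list_sli_clips` is a hypothetical helper, not part of the repo) and runs on a throwaway directory.

```python
import tempfile
from pathlib import Path


def list_sli_clips(root):
    """Map each language folder under root (e.g. eng/, zmu/) to its sorted .wav files."""
    clips = {}
    for lang_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        wavs = sorted(lang_dir.glob("*.wav"))
        if wavs:  # skip folders that contain no .wav clips
            clips[lang_dir.name] = wavs
    return clips


# Demo on a temporary directory that mirrors data/sli-train
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("eng/clip1.wav", "eng/clip2.wav", "zmu/clip1.wav", "notes/readme.txt"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    summary = {lang: [p.name for p in wavs] for lang, wavs in list_sli_clips(tmp).items()}

print(summary)  # {'eng': ['clip1.wav', 'clip2.wav'], 'zmu': ['clip1.wav']}
```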

Usage

VAD

# Detect regions of speech in a recording
python scripts/run_vad-by-silero.py myrecording.wav
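Silero VAD reports speech regions as sample offsets, while ELAN time slots are in milliseconds. A minimal conversion sketch, assuming 16 kHz audio and Silero's `{'start': ..., 'end': ...}` output format; `samples_to_ms` is a hypothetical helper, and the repo's script may handle this step differently.

```python
def samples_to_ms(timestamps, sampling_rate=16000):
    """Convert [{'start': s, 'end': e}, ...] sample offsets into (start_ms, end_ms) tuples."""
    return [(ts["start"] * 1000 // sampling_rate, ts["end"] * 1000 // sampling_rate)
            for ts in timestamps]


# Per the Silero VAD README, timestamps like these can be obtained with the code
# below (it needs torch and network access, so it is not run here):
#   model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
#   get_speech_timestamps, _, read_audio, _, _ = utils
#   wav = read_audio("myrecording.wav", sampling_rate=16000)
#   timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
timestamps = [{"start": 8000, "end": 40000}, {"start": 56000, "end": 72000}]
regions = samples_to_ms(timestamps)
print(regions)  # [(500, 2500), (3500, 4500)]
```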

SLI

# To train a classifier using your own clips and then save it:
python scripts/train_sli-by-sblr.py data/sli-train models/zmu-eng_sli_k10.pkl
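The README does not say which classifier the training script fits. Purely as an illustration, here is a binary logistic regression over fixed-length feature vectors (for example, precomputed utterance embeddings, which are assumed here), trained by batch gradient descent in plain Python and serialized with `pickle`. All names are hypothetical, and the toy 2-D features stand in for real embeddings.

```python
import math
import pickle


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, classes=("eng", "zmu"), lr=0.1, epochs=500):
    """Fit a binary logistic regression by batch gradient descent.

    classes[1] is the positive class. The model is returned as a plain dict,
    so unpickling it does not require this module.
    """
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    targets = [1.0 if y == classes[1] else 0.0 for y in labels]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(features, targets):
            # Gradient of the log-loss with respect to the logit is (p - t)
            err = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return {"classes": classes, "weights": weights, "bias": bias}


def predict(model, x):
    z = model["bias"] + sum(w * xi for w, xi in zip(model["weights"], x))
    return model["classes"][1] if _sigmoid(z) >= 0.5 else model["classes"][0]


# Toy "embeddings"; real features would come from an audio model
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = ["eng", "eng", "zmu", "zmu"]
model = train_logistic(X, y)

# pickle.dump(model, f) with a file opened in "wb" mode saves it to disk;
# here the model is round-tripped in memory instead
restored = pickle.loads(pickle.dumps(model))
print(predict(restored, [0.85, 0.15]))  # eng
```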

# Use the trained model to classify VAD-detected regions as eng or zmu
python scripts/run_sli-by-sblr.py models/zmu-eng_sli_k10.pkl myrecording.wav
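Conceptually, the SLI and ASR steps route each VAD region by its predicted language. The sketch below assumes hypothetical `classify` and `transcribe` callables that take a (start_ms, end_ms) region; it shows the routing logic only, not how the repo's scripts are actually structured.

```python
def route_regions(regions, classify, transcribe, asr_language="eng"):
    """Run SLI on every region and ASR only on regions labelled asr_language.

    Returns the three tiers as lists of (start_ms, end_ms, value) tuples.
    """
    vad_tier, sli_tier, asr_tier = [], [], []
    for start, end in regions:
        vad_tier.append((start, end, ""))  # VAD tier only marks where speech is
        lang = classify((start, end))
        sli_tier.append((start, end, lang))
        if lang == asr_language:  # e.g. Muruwari regions are left untranscribed
            asr_tier.append((start, end, transcribe((start, end))))
    return {"_vad": vad_tier, "_sli": sli_tier, "_asr": asr_tier}


# Stand-ins for the real SLI and ASR models (hypothetical)
def fake_classify(region):
    return "zmu" if region[0] == 2000 else "eng"


def fake_transcribe(region):
    return f"<speech {region[0]}-{region[1]}>"


tiers = route_regions([(0, 1500), (2000, 3200), (4000, 5000)], fake_classify, fake_transcribe)
```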

ASR

# To fine-tune a wav2vec 2.0 model and save the checkpoint:
python scripts/train_asr-by-w2v2.py data/asr-train data/checkpoints/no-lm_b10
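Loading the ASR training data amounts to pairing each row of `eng.tsv` with its clip in `eng/`. The README does not document the TSV columns, so the column names `path` and `sentence` below are assumptions to adjust to the real header; `load_asr_pairs` is a hypothetical helper.

```python
import csv
import io
from pathlib import Path


def load_asr_pairs(tsv_file, audio_dir, path_col="path", text_col="sentence"):
    """Pair each transcription with its .wav file (column names are assumptions)."""
    # QUOTE_NONE keeps quote characters in transcripts as literal text
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(Path(audio_dir) / row[path_col], row[text_col].strip()) for row in reader]


demo_tsv = io.StringIO("path\tsentence\nutt1.wav\tgood morning\nutt2.wav\tthank you \n")
pairs = load_asr_pairs(demo_tsv, "data/asr-train/eng")
```

For a real run, pass a file opened with `open("data/asr-train/eng.tsv", newline="", encoding="utf-8")` instead of the in-memory demo.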

# Transcribe using the fine-tuned model
python scripts/run_asr-by-w2v2.py data/checkpoints/no-lm_b10 myrecording.wav
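A checkpoint named `no-lm` suggests decoding without a language model. wav2vec 2.0 models are fine-tuned with CTC, and the usual LM-free decoder is greedy: take the best token per frame, merge repeated tokens, then drop blanks. A sketch, assuming a Hugging Face style vocabulary where id 0 is the blank/padding token and `|` marks word boundaries; `ctc_greedy_decode` is hypothetical.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding.

    frame_scores: one list of scores (logits or probabilities) per frame,
    with one score per vocabulary entry. vocab maps token id -> string.
    """
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    chars, prev = [], None
    for idx in best:
        # Emit a token only when it differs from the previous frame and is not blank
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars).replace(word_delimiter, " ").strip()


vocab = ["<pad>", "|", "h", "i", "o"]
ids = [2, 2, 0, 3, 1, 1, 4, 0, 4]  # a blank between the two 4s keeps both "o"s
frames = [[1.0 if j == i else 0.0 for j in range(len(vocab))] for i in ids]
print(ctc_greedy_decode(frames, vocab))  # hi oo
```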


Releases

1.1.0 (Apr 23, 2022)
Switched to using the pre-existing vocabulary from the pre-trained model (see Appendix A in the paper).
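Reusing a pre-trained model's vocabulary means every character in the training transcripts must already exist in that vocabulary. A small pre-flight check, assuming spaces map to a `|` word-delimiter token; the uppercase vocabulary below is a toy stand-in, not the vocabulary of any particular checkpoint, and `out_of_vocab_chars` is hypothetical.

```python
import string


def out_of_vocab_chars(transcripts, vocab, word_delimiter="|"):
    """Return the sorted characters in transcripts that vocab cannot represent."""
    allowed = set(vocab)
    missing = set()
    for text in transcripts:
        for ch in text:
            token = word_delimiter if ch == " " else ch  # spaces become the delimiter
            if token not in allowed:
                missing.add(ch)
    return sorted(missing)


# Toy vocabulary: special tokens, word delimiter, uppercase letters, apostrophe
vocab = ["<pad>", "<unk>", "|"] + list(string.ascii_uppercase) + ["'"]
print(out_of_vocab_chars(["HELLO THERE", "IT'S 5 O'CLOCK"], vocab))  # ['5']
```

Characters reported here would need normalizing (e.g. spelling out digits) before fine-tuning.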

1.0.0 (Apr 18, 2022)

0.9.0 (Apr 14, 2022)

Pre-release to check Zenodo sync


Owner

Dynamics of Language
