A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

Overview

bbc-speech-segmenter: Voice Activity Detection & Speaker Diarization

A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

The x-vector-vad system is described in the paper; Ogura, M. & Haynes, M. (2021) X-vector-vad for Multi-genre Broadcast Speech-to-text. The paper has been submitted to 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) and is currently under review as of June 2021.

Quickstart

$ docker pull bbcrd/bbc-speech-segmenter

# Test

$ docker run -w /wrk -v `pwd`:/wrk bbcrd/bbc-speech-segmenter ./test.sh

# Segmentation help

$ docker run bbcrd/bbc-speech-segmenter ./run-segmentation.sh --help
usage: run-segmentation.sh [options] input.wav input.stm output-dir

options:
  --nj NUM                 Maximum number of CPU cores to use
  --stage STAGE            Start from this stage
  --cluster-threshold THR  Cluster stopping criteria. Default: -0.3
  --vad-threshold THR      Xvector classifier threshold. Lower the number the
                           more speech segments shall be returned at the
                           expense of accuracy. Default: 0.2
  --vad-method             Filter segments on an individual or segment basis.
                           Default: individual
  --no-vad                 Skip xvector vad stages. Default: false
  --help                   Print this message

# Run segmentation (VAD + diarisation), results are in output-dir/diarize.stm

$ docker run -v `pwd`:/data bbcrd/bbc-speech-segmenter \
  ./run-segmentation.sh /data/audio.wav /data/audio.stm /data/output-dir

$ cat output-dir/diarize.stm
audio 0 audio_S00004 3.750 10.125 <speech>
audio 0 audio_S00003 10.125 13.687 <speech>
audio 0 audio_S00004 13.688 16.313 <speech>
...

# Train x-vector classifier

$ docker run -w /wrk/recipe -v `pwd`:/wrk bbcrd/bbc-speech-segmenter \
  local/xvector_utils.py data/bbc-vad-train/reference.stm            \
  data/bbc-vad-train/xvectors.ark new_model.pkl

# Evaluate x-vector classifier

$ docker run -w /wrk/recipe -v `pwd`:/wrk bbcrd/bbc-speech-segmenter \
  local/xvector_utils.py evaluate data/bbc-vad-eval/reference.stm    \
  data/bbc-vad-eval/xvectors.ark model/xvector-classifier.pkl

Audio & STM file format

In order to run the segmentation script you need your audio in 16Khz Mono WAV format. You also need an STM file describing the segments you want to apply voice activity detection and speaker diarization to.

For more information on the STM file format see XVECTOR_UTILS.md.

# Convert audio file to 16Khz mono wav

$ ffmpeg audio.mp3 -vn -ac 1 -ar 16000 audio.wav

# Create STM file for input

$ DURATION=$(ffprobe -i audio.wav -show_entries format=duration -v quiet -of csv="p=0")
$ DURATION=$(printf "%0.2f\n" $DURATION)

$ FILENAME=$(basename audio.wav)

$ echo "${FILENAME%.*} 0 ${FILENAME%.*} 0.00 $DURATION <label> _" > audio.stm

$ cat audio.stm
audio 0 audio 0.00 60.00 <label> _

Use Docker image to run code in local checkout

# Bulid Docker image

$ docker build -t bbc-speech-segmenter .

# Spin up a Docker container in an interactive mode

$ docker run -it -v `pwd`:/wrk bbc-speech-segmenter /bin/bash

# Inside a Docker container

$ cd /wrk/

# Run test

$ ./test.sh
All checks passed

Training and evaluation

X-vector utility

xvector_utils.py can be used to train and evaluate x-vector classifier, as well as o extract and visualize x-vectors. For more detailed information, see XVECTOR_UTILS.md.

The documentation also gives details on file formats such as ARK, SCP or STM, which are required to use this tool.

Run x-vector VAD training

Two files are required for x-vector-vad training:

  • Reference STM file
  • X-vectors ARK file

For example, from inside the Docker container:

$ cd /wrk/recipe

$ python3 local/xvector_utils.py train \
  data/bbc-vad-train/reference.stm     \
  data/bbc-vad-train/xvectors.ark      \
  new_model.pkl

The model will be saved as new_model.pkl.

Run x-vector VAD evaluation

Three files are needed in order to run VAD evaluation:

  • Reference STM file
  • X-vectors ARK file
  • x-vector-vad classifier model

For example, from inside the Docker container:

$ cd /wrk/recipe

$ python3 local/xvector_utils.py evaluate \
  data/bbc-vad-eval/reference.stm        \
  data/bbc-vad-eval/xvectors.ark         \
  model/xvector-classifier.pkl

WebRTC baseline

The code for the baseline WebRTC system referenced in the paper is available in the directory recipe/baselines/denoising_DIHARD18_webrtc.

Request access to bbc-vad-train

Due to size restriction, only bbc-vad-eval is included in the repository. If you'd like access to bbc-vad-train, please contact Matt Haynes.

Authors

Owner
BBC
Open source code used on public facing services, internal services and educational resources.
BBC
StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking

StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking Datasets You can download datasets that have been pre-pr

25 May 29, 2022
Banglore House Prediction Using Flask Server (Python)

Banglore House Prediction Using Flask Server (Python) 🌐 Links 🌐 šŸ“‚ Repo In this repository, I've implemented a Machine Learning-based Bangalore Hous

Dhyan Shah 1 Jan 24, 2022
In this project we use both Resnet and Self-attention layer for cat, dog and flower classification.

cdf_att_classification classes = {0: 'cat', 1: 'dog', 2: 'flower'} In this project we use both Resnet and Self-attention layer for cdf-Classification.

3 Nov 23, 2022
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Softlearning Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is

Robotic AI & Learning Lab Berkeley 997 Dec 30, 2022
Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model

Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model SWAGAN: A Style-based Wavelet-driven Generative Model Rinon Gal, Dana

55 Dec 06, 2022
Zeyuan Chen, Yangchao Wang, Yang Yang and Dong Liu.

Principled S2R Dehazing This repository contains the official implementation for PSD Framework introduced in the following paper: PSD: Principled Synt

zychen 78 Dec 30, 2022
Python utility to generate filesystem content for Obsidian.

Security Vault Generator Quickly parse, format, and output common frameworks/content for Obsidian.md. There is a strong focus on MITRE ATT&CK because

Justin Angel 73 Dec 02, 2022
DiSECt: Differentiable Simulator for Robotic Cutting

DiSECt: Differentiable Simulator for Robotic Cutting Website | Paper | Dataset | Video | Blog post DiSECt is a simulator for the cutting of deformable

NVIDIA Research Projects 73 Oct 29, 2022
PyTorch implementations of the beta divergence loss.

Beta Divergence Loss - PyTorch Implementation This repository contains code for a PyTorch implementation of the beta divergence loss. Dependencies Thi

Billy Carson 7 Nov 09, 2022
Safe Control for Black-box Dynamical Systems via Neural Barrier Certificates

Safe Control for Black-box Dynamical Systems via Neural Barrier Certificates Installation Clone the repository: git clone https://github.com/Zengyi-Qi

Zengyi Qin 3 Oct 18, 2022
A convolutional recurrent neural network for classifying A/B phases in EEG signals recorded for sleep analysis.

CAP-Classification-CRNN A deep learning model based on Inception modules paired with gated recurrent units (GRU) for the classification of CAP phases

Apurva R. Umredkar 2 Nov 25, 2022
Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)

Cross View Transformers This repository contains the source code and data for our paper: Cross-view Transformers for real-time Map-view Semantic Segme

Brady Zhou 363 Dec 25, 2022
A system for quickly generating training data with weak supervision

Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat

Snorkel Team 5.4k Jan 02, 2023
Official tensorflow implementation for CVPR2020 paper ā€œLearning to Cartoonize Using White-box Cartoon Representationsā€

Tensorflow implementation for CVPR2020 paper ā€œLearning to Cartoonize Using White-box Cartoon Representationsā€.

3.7k Dec 31, 2022
UNAVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring

UNAVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring Code Summary aggregate.py: this script aggr

1 Dec 28, 2021
PyTorch Code for "Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning"

Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning [Project Page] [Paper] Wenlong Huang1, Igor Mordatch2, Pieter Abbeel1,

Wenlong Huang 40 Nov 22, 2022
Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Understanding the Generalization Benefit of Model Invariance from a Data Perspective This is the code for our NeurIPS2021 paper "Understanding the Gen

1 Jan 15, 2022
Code for Overinterpretation paper Overinterpretation reveals image classification model pathologies

Overinterpretation This repository contains the code for the paper: Overinterpretation reveals image classification model pathologies Authors: Brandon

Gifford Lab, MIT CSAIL 17 Dec 10, 2022
Official code for "InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization" (ICLR 2020, spotlight)

InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization Authors: Fan-yun Sun, Jordan Hoffm

Fan-Yun Sun 232 Dec 28, 2022
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

LancoPKU 105 Jan 03, 2023