Dataset and baseline code for the VocalSound dataset (ICASSP2022).

Overview

VocalSound: A Dataset for Improving Human Vocal Sounds Recognition

Introduction

VocalSound Poster

VocalSound is a free dataset consisting of 21,024 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. The VocalSound dataset also contains meta information such as speaker age, gender, native language, country, and health condition.

This repository contains the official code of the data preparation and baseline experiment in the ICASSP paper VocalSound: A Dataset for Improving Human Vocal Sounds Recognition (Yuan Gong, Jin Yu, and James Glass; MIT & Signify). Specifically, we provide an extremely simple one-click Google Colab script Open In Colab for the baseline experiment, no GPU / local data downloading is needed.

The dataset is ideal for:

  • Build vocal sound recognizer.
  • Research on removing model bias on various speaker groups.
  • Evaluate pretrained models (e.g., those trained with AudioSet) on vocal sound classification to check their generalization ability.
  • Combine with existing large-scale general audio dataset to improve the vocal sound recognition performance.

Citing

Please cite our paper(s) if you find the VocalSound dataset and code useful. The first paper proposes introduces the VocalSound dataset and the second paper describes the training pipeline and model we used for the baseline experiment.

@INPROCEEDINGS{gong_vocalsound,
  author={Gong, Yuan and Yu, Jin and Glass, James},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition}, 
  year={2022},
  pages={151-155},
  doi={10.1109/ICASSP43922.2022.9746828}}
@ARTICLE{gong_psla, 
    author={Gong, Yuan and Chung, Yu-An and Glass, James},
    title={PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation}, 
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},  
    year={2021}, 
    doi={10.1109/TASLP.2021.3120633}
}

Download VocalSound

The VocalSound dataset can be downloaded as a single .zip file:

Sample Recordings (Listen to it without downloading)

VocalSound 44.1kHz Version (4.5 GB)

VocalSound 16kHz Version (1.7 GB, used in our baseline experiment)

(Mirror Links) 腾讯微云下载链接: 试听24个样本16kHz版本44.1kHz版本

If you plan to reproduce our baseline experiments using our Google Colab script, you do NOT need to download it manually, our script will download and process the 16kHz version automatically.

Creative Commons License
The VocalSound dataset is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Dataset Details

data
├──readme.txt
├──class_labels_indices_vs.csv # include label code and name information
├──audio_16k
│  ├──f0003_0_cough.wav # female speaker, id=0003, 0=first collection (most spks only record once, but there are exceptions), cough
│  ├──f0003_0_laughter.wav
│  ├──f0003_0_sigh.wav
│  ├──f0003_0_sneeze.wav
│  ├──f0003_0_sniff.wav
│  ├──f0003_0_throatclearing.wav
│  ├──f0004_0_cough.wav # data from another female speaker 0004
│   ... (21024 files in total)
│   
├──audio_44k
│    # same recordings with those in data/data_16k, but are no downsampled
│   ├──f0003_0_cough.wav
│    ... (21024 files in total)
│
├──datafiles  # json datafiles that we use in our baseline experiment, you can ignore it if you don't use our training pipeline
│  ├──all.json  # all data
│  ├──te.json  # test data
│  ├──tr.json  # training data
│  ├──val.json  # validation data
│  └──subtest # subset of the test set, for fine-grained evaluation
│     ├──te_age1.json  # age [18-25]
│     ├──te_age2.json  # age [26-48]
│     ├──te_age3.json  # age [49-80]
│     ├──te_female.json
│     └──te_male.json
│
└──meta  # Meta information of the speakers [spk_id, gender, age, country, native language, health condition (no=no problem)]
   ├──all_meta.json  # all data
   ├──te_meta.json  # test data
   ├──tr_meta.json  # training data
   └──val_meta.json  # validation data

Baseline Experiment

Option 1. One-Click Google Colab Experiment Open In Colab

We provide an extremely simple one-click Google Colab script for the baseline experiment.

What you need:

  • A free google account with Google Drive free space > 5Gb
    • A (paid) Google Colab Pro plan could speed up training, but is not necessary. Free version can run the script, just a bit slower.

What you don't need:

  • Download VocalSound manually (The Colab script download it to your Google Drive automatically)
  • GPU or any other hardware (Google Colab provides free GPUs)
  • Any enviroment setting and package installation (Google Colab provides a ready-to-use environment)
  • A specific operating system (You only need a web browser, e.g., Chrome)

Please Note

  • This script is slightly different with our local code, but the performance is not impacted.
  • Free Google Colab might be slow and unstable. In our test, it takes ~5 minutes to train the model for one epoch with a free Colab account.

To run the baseline experiment

  • Make sure your Google Drive is mounted. You don't need to do it by yourself, but Google Colab will ask permission to acess your Google Drive when you run the script, please allow it if you want to use Google Drive.
  • Make sure GPU is enabled for Colab. To do so, go to the top menu > Edit > Notebook settings and select GPU as Hardware accelerator.
  • Run the script. Just press Ctrl+F9 or go to runtime menu on top and click "run all" option. That's it.

Option 2. Run Experiment Locally

We also provide a recipe for local experiments.

Compared with the Google Colab online script, it has following advantages:

  • It can be faster and more stable than online Google Colab (free version) if you have fast GPUs.
  • It is basically the original code we used for our paper, so it should reproduce the exact numbers in the paper.

Step 1. Clone or download this repository and set it as the working directory, create a virtual environment and install the dependencies.

cd vocalsound/ 
python3 -m venv venv-vs
source venv-vs/bin/activate
pip install -r requirements.txt 

Step 2. Download the VocalSound dataset and process it.

cd data/
wget https://www.dropbox.com/s/c5ace70qh1vbyzb/vs_release_16k.zip?dl=0 -O vs_release_16k.zip
unzip vs_release_16k.zip
cd ../src
python prep_data.py

# you can provide a --data_dir augment if you download the data somewhere else
# python prep_data.py --data_dir absolute_path/data

Step 3. Run the baseline experiment

chmod 777 run.sh
./run.sh

# or slurm user
#sbatch run.sh

We test both options before this release, you should get similar accuracies.

Accuracy (%) Colab Script Open In Colab Local Script ICASSP Paper
Validation Set 91.1 90.2 90.1±0.2
All Test Set 91.6 90.6 90.5±0.2
Test Age 18-25 93.4 92.3 91.5±0.3
Test Age 26-48 90.8 90.0 90.1±0.2
Test Age 49-80 92.2 90.2 90.9±1.6
Test Male 89.8 89.6 89.2±0.5
Test Female 93.4 91.6 91.9±0.1
Model Implementation torchvision EfficientNet PSLA EfficientNet PSLA EfficientNet
Batch Size 80 100 100
GPU Google Colab Free 4X Titan 4X Titan
Training Time (30 Epochs) ~2.5 Hours ~1 Hour ~1 Hour

Contact

If you have a question, please bring up an issue (preferred) or send me an email [email protected].

Owner
Yuan Gong
Postdoc, MIT CSAIL
Yuan Gong
Minimal command-line music player written in Python

pyms Minimal command-line music player written in Python. Designed with elegance and minimalism. Resizes dynamically with your terminal. Dependencies

12 Sep 23, 2022
Extract the songs from your osu! libary into proper mp3 form, complete with metadata and album art!

osu-Extract Extract the songs from your osu! libary into proper mp3 form, complete with metadata and album art! Requirements python3 mutagen pillow Us

William Carter 2 Mar 09, 2022
This is a python package that turns any images into MIDI files that views the same as them

image_to_midi This is a python package that turns any images into MIDI files that views the same as them. This package firstly convert the image to AS

Rainbow Dreamer 4 Mar 10, 2022
Linear Prediction Coefficients estimation from mel-spectrogram implemented in Python based on Levinson-Durbin algorithm.

LPC_for_TTS Linear Prediction Coefficients estimation from mel-spectrogram implemented in Python based on Levinson-Durbin algorithm. 基于Levinson-Durbin

Zewang ZHANG 58 Nov 17, 2022
Voice to Text using Raspberry Pi

This module will help to convert your voice (speech) into text using Speech Recognition Library. You can control the devices or you can perform the desired tasks by the word recognition

Raspberry_Pi Pakistan 2 Dec 15, 2021
This is my voice assistant Patric!

voice-assistant This is my voice assistant Patric! You can add can add commands and even modify his name Indice How to use Installation guide How to u

Norbert Gabos 1 Jun 28, 2022
Use android as mic/speaker for ubuntu

Pulse Audio Control Panel Platforms Requirements sudo apt install ffmpeg pactl (already installed) Download Download the AppImage from release page ch

19 Dec 01, 2022
Mina - A Telegram Music Bot 5 mandatory Assistant written in Python using Pyrogram and Py-Tgcalls

Mina - A Telegram Music Bot 5 mandatory Assistant written in Python using Pyrogram and Py-Tgcalls

3 Feb 07, 2022
Small Python application that links a Digico console and Reaper, handling automatic marker insertion and tracking.

Digico-Reaper-Link This is a small GUI based helper application designed to help with using Digico's Copy Audio function with a Reaper DAW used for re

Justin Stasiw 10 Oct 24, 2022
Speech Algorithms Collections

Speech Algorithms Collections

Ryuk 498 Jan 06, 2023
commonfate 📦commonfate 📦 - Common Fate Model and Transform.

Common Fate Transform and Model for Python This package is a python implementation of the Common Fate Transform and Model to be used for audio source

Fabian-Robert Stöter 18 Jan 08, 2022
Noinoi music is smoothly playing music on voice chat of telegram.

NOINOI MUSIC BOT ✨ Features Music & Video stream support MultiChat support Playlist & Queue support Skip, Pause, Resume, Stop feature Music & Video do

2 Feb 13, 2022
The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases.

The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases. The cross-section of the throat is less than the cross-section of the inlet pi

Shankar Mahadevan L 1 Dec 03, 2021
Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files according to their common names

Batch Sorting Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files accord

David Mainoo 1 Oct 29, 2021
convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format.

convert-to-opus-cli convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format. Installation Must have installed ffmp

4 Dec 21, 2022
Codes for "Efficient Long-Range Attention Network for Image Super-resolution"

ELAN Codes for "Efficient Long-Range Attention Network for Image Super-resolution", arxiv link. Dependencies & Installation Please refer to the follow

xindong zhang 124 Dec 22, 2022
All-In-One Digital Audio Workstation and Plugin Suite

How to install Windows Mac OS X Fedora Ubuntu How to Build Debian and Ubuntu Fedora All Other Linux Distros Mac OS X Windows What is MusiKernel? MusiK

j3ffhubb 111 Sep 21, 2021
Converting UGG files from Rode Wireless Go II transmitters (unsompressed recordings) to WAV format

Rode_WirelessGoII_UGG2wav Converting UGG files from Rode Wireless Go II transmitters (uncompressed recordings) to WAV format Story I backuped the .ugg

Ján Mazanec 31 Dec 22, 2022
An 8D music player made to enjoy Halloween this year!🤘

HAPPY HALLOWEEN buddy! Split Player Hello There! Welcome to SplitPlayer... Supposed To Be A 8DPlayer.... You Decide.... It can play the ordinary audio

Akshat Kumar Singh 1 Nov 04, 2021
digital audio workstation, instrument and effect plugins, wave editor

digital audio workstation, instrument and effect plugins, wave editor

306 Jan 05, 2023