Audio augmentations library for PyTorch for audio in the time-domain

Overview

Audio Augmentations

DOI

Audio augmentations library for PyTorch for audio in the time-domain, with support for stochastic data augmentations as used often in self-supervised / contrastive learning.

Usage

We can define several audio augmentations, which will be applied sequentially to a raw audio waveform:

from audio_augmentations import *

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    RandomApply([HighLowPass(sample_rate=sr)], p=0.8),
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio =  transform(audio)
>> transformed_audio.shape[0] = 1
> transformed_audio.shape[0] = 4 ">
audio = torchaudio.load("testing/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
>> transformed_audio.shape[0] = 4

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to a torchaudio dataloaders that accept transform=.

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}
Comments
  • Delay augmentation on cuda

    Delay augmentation on cuda

    Hi. Currently the delay augmentation doesn't work on gpu since part of the signal is on cpu. I think making thebeginning tensor same as the audio tensor device should fix it. Thanks. https://github.com/Spijkervet/torchaudio-augmentations/blob/d044f9d020e12032ab9280acf5f34a337e72d212/torchaudio_augmentations/augmentations/delay.py#L31

    opened by sidml 2
  • Correctness unit test would be great

    Correctness unit test would be great

    For some transforms, we can test if the values are actually correct by manually computing the expected value. For example, PolarityInversion could be test with some tiny tensors like [[0.1, 0.5, -1.0]]. Reverse as well. Probably only those two? Still, it'd be better than not having any.

    opened by keunwoochoi 2
  • Default value of `max_snr` in `Noise`

    Default value of `max_snr` in `Noise`

    1.0 of SNR with signal and white noise would be a really heavily corrupted signal. Could we set it to be a little more reasonable value?

    Related; it'd be great if one can hear some examples of the augmented result.

    opened by keunwoochoi 2
  • End-to-end PitchShift transform tests

    End-to-end PitchShift transform tests

    This merge requests adds end-to-end pitch transformation detection with librosa's pYIN pitch detection, to test if the applied transformation yields the expected pitch transposition.

    opened by Spijkervet 0
  • Unittests

    Unittests

    This adds various unittests and fixes to multi-channel input for Reverb, Pitch, Reverse and HighLowPass filter augmentations. It also removes Essentia as a dependency, and instead uses julius for IRR filtering.

    opened by Spijkervet 0
  • import error

    import error

    when i import torchaudio_augmentation

    I got the error

    RuntimeError : torchaudio.sox_effects.sox_effects.effect_names requires module: torchaudio._torchaudio

    how can I deal with it?

    opened by EavnJeong 0
  • Snr db

    Snr db

    Hi, Thanks for the interesting work. Allow me to suggest this change for two reasons:

    • Expressing SNR in dB cancels the doubt there might be between power SNR and RMS SNR.
    • When sampling an SNR, it feels to me like it makes more sense to uniformly sample from the log scale of the dB than on the linear range. This way you ensure that your low SNR have as much chances as your high SNR.

    I hope it makes sens. I'd be glad to discuss further about that.

    opened by wesbz 0
  • sanity check for duration

    sanity check for duration

    In transforms where the duration may change, if the input audio is shorter than n_samples, the error message is not intuitive. I forgot but in some case, the multiprocessing-based dataloader silently died. Maybe it's worth checking it somewhere?

    opened by keunwoochoi 0
  • Shapes are still a bit confusing

    Shapes are still a bit confusing

    From ComposeMany.__call__(), is x also a ch, time shape 2-dim tensor? And I'm sure what would be the expected behavior by this function, especially the shape of the output.

    opened by keunwoochoi 3
Releases(v0.2.3)
Owner
Janne
Music producer, machine learning in MIR & occasional ethical hacker
Janne
Inner ear models for Python

cochlea cochlea is a collection of inner ear models. All models are easily accessible as Python functions. They take sound signal as input and return

98 Jan 05, 2023
We built this fully functioning Music player in Python. The music player allows you to play/pause and switch to different songs easily.

We built this fully functioning Music player in Python. The music player allows you to play/pause and switch to different songs easily.

1 Nov 19, 2021
kapre: Keras Audio Preprocessors

Kapre Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time. Tested on Python 3.6 and 3.7 Why Kapre? vs. Pre-co

Keunwoo Choi 867 Dec 29, 2022
L-SpEx: Localized Target Speaker Extraction

L-SpEx: Localized Target Speaker Extraction The data configuration and simulation of L-SpEx. The code scripts will be released in the future. Data Gen

Meng Ge 20 Jan 02, 2023
Converting UGG files from Rode Wireless Go II transmitters (unsompressed recordings) to WAV format

Rode_WirelessGoII_UGG2wav Converting UGG files from Rode Wireless Go II transmitters (uncompressed recordings) to WAV format Story I backuped the .ugg

Jรกn Mazanec 31 Dec 22, 2022
Open-Source Tools & Data for Music Source Separation: A Pragmatic Guide for the MIR Practitioner

Open-Source Tools & Data for Music Source Separation: A Pragmatic Guide for the MIR Practitioner

IELab@ Korea University 0 Nov 12, 2021
In this project we can see how we can generate automatic music using character RNN.

Automatic Music Genaration Table of Contents Project Description Approach towards the problem Limitations Libraries Used Summary Applications Referenc

Pronay Ghosh 2 May 27, 2022
TwitterMusicBot - A Twitter bot with Spotify integration.

A Twitter Music Bot ๐Ÿค– ๐ŸŽต ๐ŸŽถ I created this project to learn more about APIs, so it only works for student purposes. Initially, delving into the Spoti

Gustavo Oliveira 2 Jan 02, 2022
An Amazon Music client for Linux (unpretentious)

Amusiz An Amazon Music client for Linux (unpretentious) โ†—๏ธ Install You can install Amusiz in multiple ways, choose your favorite. ๐Ÿš€ AppImage Here you

Mirko Brombin 25 Nov 08, 2022
Hide Your Secret Message in any Wave Audio File.

HiddenWave Embedding secret messages in wave audio file What is HiddenWave Hiddenwave is a python based program for simple audio steganography. You ca

TechChip 99 Dec 28, 2022
๐ŸŽต A music bot for discord servers!

music bot A music bot for Discord Servers Features Play songs in your discord server Get the lyrics without going on a web explorer Commands Command P

1 Jul 25, 2022
A rofi-blocks script that searches youtube and plays the selected audio on mpv.

rofi-ytm A rofi-blocks script that searches youtube and plays the selected audio on mpv. To use the script, run the following command rofi -modi block

Cliford 26 Dec 21, 2022
DCL - An easy to use diacritic library used for diacritic and accent manipulation.

Diacritics Library This library is used for adding, and removing diacritics from strings. Getting started Start by importing the module: import dcl DC

Kreus Amredes 6 Jun 03, 2022
Library for Python 3 to communicate with the Google Chromecast.

pychromecast Library for Python 3.6+ to communicate with the Google Chromecast. It currently supports: Auto discovering connected Chromecasts on the n

Home Assistant Libraries 2.4k Jan 02, 2023
Improved Python UI to convert Youtube URL to .mp3 file.

YT-MP3 Improved Python UI to convert Youtube URL to .mp3 file. How to use? Just run python3 main.py Enter the URL of the video Enter the PATH of where

8 Jun 19, 2022
Audio features extraction

Yaafe Yet Another Audio Feature Extractor Build status Branch master : Branch dev : Anaconda : Install Conda Yaafe can be easily install with conda. T

Yaafe 231 Dec 26, 2022
Music bot of # Owner

Pokimane-Music Music bot of # Owner How To Host The easiest way to deploy this Bot Support Channel :- TeamDlt Support Group :- TeamDlt Please fork thi

5 Dec 23, 2022
F.R.I.D.A.Y. ----- Female Replacement Intelligent Digital Assistant Youth

F.R.I.D.A.Y. Female Replacement Intelligent Digital Assistant Youth--Jarvis-- the virtual assistant made by python Overview This is a virtual assistan

JIB - Just Innovative Bro 4 Feb 26, 2022
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Audiomentations A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio a

Iver Jordal 1.2k Jan 07, 2023
Synthesia but open source, made in python and free

PyPiano Synthesia but open source, made in python and free Requirements are in requirements.txt If you struggle with installation of pyaudio, run : pi

DaCapo 11 Nov 06, 2022